APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Picotti, Paola; Clement-Ziza, Mathieu; Lam, Henry; Campbell, David S.; Schmidt, Alexander; Deutsch, Eric W.; Röst, Hannes; Sun, Zhi; Rinner, Oliver; Reiter, Lukas; Shen, Qin; Michaelson, Jacob J.; Frei, Andreas; Alberti, Simon; Kusebauch, Ulrike; Wollscheid, Bernd; Moritz, Robert; Beyer, Andreas; Aebersold, Ruedi
2013-01-01
Complete reference maps or datasets, like the genomic map of an organism, are highly beneficial tools for biological and biomedical research. Attempts to generate such reference datasets for a proteome have so far failed to reach complete proteome coverage, with saturation apparent at approximately two thirds of the proteomes tested, even for the most thoroughly characterized proteomes. Here, we used a strategy based on high-throughput peptide synthesis and mass spectrometry to generate a close to complete reference map (97% of the genome-predicted proteins) of the S. cerevisiae proteome. We generated two versions of this mass spectrometric map, one supporting discovery-driven (shotgun) and the other hypothesis-driven (targeted) proteomic measurements. The two versions of the map, therefore, constitute a complete set of proteomic assays to support most studies performed with contemporary proteomic technologies. The reference libraries can be browsed via a web-based repository and associated navigation tools. To demonstrate the utility of the reference libraries, we applied them to a protein quantitative trait locus (pQTL) analysis, which requires measurement of the same peptides over a large number of samples with high precision. Protein measurements over a set of 78 S. cerevisiae strains revealed a complex relationship between independent genetic loci, impacting on the levels of related proteins. Our results suggest that selective pressure favors the acquisition of sets of polymorphisms that maintain the stoichiometry of protein complexes and pathways. PMID:23334424
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
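As a rough illustration of the phylogenetic profiling idea mentioned in this abstract, the sketch below scores protein pairs by the Pearson correlation of their presence/absence profiles across a set of reference genomes. The protein names and profiles are invented for illustration, and Pearson correlation is only one of several similarity measures that such methods use; it is not claimed to be the scoring used in the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(vx * vy)

# Hypothetical profiles: 1 = homolog present in that reference genome, 0 = absent.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0, 1, 1],
    "protB": [1, 1, 0, 0, 1, 0, 1, 1],  # same pattern as protA
    "protC": [0, 0, 1, 1, 0, 1, 0, 0],  # opposite pattern
}

score_ab = pearson(profiles["protA"], profiles["protB"])
score_ac = pearson(profiles["protA"], profiles["protC"])
```

Changing the reference genome set changes the length and content of every profile, which is why the choice of genomes can affect the scores.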
Breast Reference Set Application: Karen Anderson-ASU (2014) — EDRN Public Portal
In order to increase the predictive value of tumor-specific antibodies for use as immunodiagnostics, our EDRN BDL has developed a novel protein microarray technology, termed Nucleic Acid Protein Programmable Array (NAPPA), which circumvents many of the limitations of traditional protein microarrays. NAPPA arrays are generated by printing full-length cDNAs encoding the target proteins at each feature of the array. The proteins are then transcribed and translated by a cell-free system and immobilized in situ using epitope tags fused to the proteins. Sera are added, and bound IgG is detected by standard secondary reagents. Using a sequential screening strategy to select AAb from 4,988 candidate tumor antigens, we have identified 28 potential AAb biomarkers for the early detection of breast cancer, and here we propose to evaluate these biomarkers using the EDRN Breast Cancer Reference Set.
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
Direct and Absolute Quantification of over 1800 Yeast Proteins via Selected Reaction Monitoring*
Lawless, Craig; Holman, Stephen W.; Brownridge, Philip; Lanthaler, Karin; Harman, Victoria M.; Watkins, Rachel; Hammond, Dean E.; Miller, Rebecca L.; Sims, Paul F. G.; Grant, Christopher M.; Eyers, Claire E.; Beynon, Robert J.
2016-01-01
Defining intracellular protein concentration is critical in molecular systems biology. Although strategies for determining relative protein changes are available, defining robust absolute values in copies per cell has proven significantly more challenging. Here we present a reference data set quantifying over 1800 Saccharomyces cerevisiae proteins by direct means using protein-specific stable-isotope labeled internal standards and selected reaction monitoring (SRM) mass spectrometry, far exceeding any previous study. This was achieved by careful design of over 100 QconCAT recombinant proteins as standards, defining 1167 proteins in terms of copies per cell and upper limits on a further 668, with robust CVs routinely less than 20%. The selected reaction monitoring-derived proteome is compared with existing quantitative data sets, highlighting the disparities between methodologies. Coupled with a quantification of the transcriptome by RNA-seq taken from the same cells, these data support revised estimates of several fundamental molecular parameters: a total protein count of ∼100 million molecules-per-cell, a median of ∼1000 proteins-per-transcript, and a linear model of protein translation explaining 70% of the variance in translation rate. This work contributes a “gold-standard” reference yeast proteome (including 532 values based on high quality, dual peptide quantification) that can be widely used in systems models and for other comparative studies. PMID:26750110
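The arithmetic behind a copies-per-cell estimate can be sketched as follows, assuming a known amount of stable-isotope labeled standard is spiked into a lysate from a known number of cells. The function name and all numbers are hypothetical and are not taken from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_cell(light_heavy_ratio, standard_fmol, n_cells):
    """Estimate analyte copies per cell from an SRM light/heavy peak-area ratio.

    light_heavy_ratio: analyte (light) peak area divided by standard (heavy) peak area
    standard_fmol:     amount of labeled standard spiked in, in femtomoles
    n_cells:           number of cells the measured sample corresponds to
    """
    analyte_mol = light_heavy_ratio * standard_fmol * 1e-15
    return analyte_mol * AVOGADRO / n_cells

# Hypothetical example: ratio 0.5, 100 fmol of standard, 10 million cells.
estimate = copies_per_cell(0.5, 100.0, 1e7)
```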
USDA-ARS's Scientific Manuscript database
A set of fatal neurological diseases that includes scrapie and chronic wasting disease (CWD) are caused by a pathological protein referred to as a prion (PrPSc). A prion propagates an infection by converting a normal cellular protein (PrPC) into a prion. Unlike viral, bacterial, or fungal pathogens,...
Assessment of protein set coherence using functional annotations
Chagoyen, Monica; Carazo, Jose M; Pascual-Montano, Alberto
2008-01-01
Background Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. Results In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. Conclusion We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at PMID:18937846
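A minimal sketch of the general idea, under stated assumptions: the abstract does not specify the similarity measure or the test, so this example uses the mean pairwise Jaccard similarity of annotation sets as the coherence score and an empirical p-value from random sets drawn from the reference set. The protein identifiers and annotation terms are invented.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def coherence(proteins, annotations):
    """Mean pairwise annotation similarity within a protein set."""
    pairs = list(combinations(proteins, 2))
    return sum(jaccard(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)

def empirical_p(query, reference, annotations, n_random=1000, seed=0):
    """Fraction of same-size random sets at least as coherent as the query."""
    rng = random.Random(seed)
    observed = coherence(query, annotations)
    hits = sum(
        coherence(rng.sample(reference, len(query)), annotations) >= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)

# Hypothetical annotations: P1-P3 share terms, the rest are unrelated.
annotations = {
    "P1": {"termA", "termB"},
    "P2": {"termA", "termB"},
    "P3": {"termA", "termB", "termC"},
    "P4": {"termX"}, "P5": {"termY"}, "P6": {"termZ"},
    "P7": {"termW"}, "P8": {"termV"},
}
reference = sorted(annotations)
observed, p_value = empirical_p(["P1", "P2", "P3"], reference, annotations)
```

The add-one correction in the p-value avoids reporting exactly zero when no random set reaches the observed score.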
Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B
2015-08-01
The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
MSA Bladder Reference Set Application: Charles Rosser-Hawaii (2014) — EDRN Public Portal
The goal of this proposal is straightforward. We wish to assay both the PAI-1 and ANG promoters and genes for mutations in a discovery set, the reference set from EDRN. The results will then be confirmed in a test cohort composed of DNA extracted from fresh frozen tissue (n = 80 BCa patients). DNA from matching buffy coat from these 80 patients will serve as control. Extracted RNA can be assessed for differences in transcription. Furthermore, matched voided urine samples from these 80 patients are available to assess protein levels of PAI-1 and ANG by ELISA, in addition to assessing the activity of PAI-1 and ANG. At the end, we will link any genetic alteration with changes in RNA, protein and protein activity level as well as clinical features (e.g., age, race, tobacco history, grade, stage and outcomes). This comprehensive study will allow us to state with certainty whether there are mutations in the promoters and genes of PAI-1 and ANG that are functional and thus may lead to the growth advantage that we previously demonstrated in our experiments.
Laskowski, Roman A
2009-01-01
PDBsum (http://www.ebi.ac.uk/pdbsum) provides summary information about each experimentally determined structural model in the Protein Data Bank (PDB). Here we describe some of its most recent features, including figures from the structure's key reference, citation data, Pfam domain diagrams, topology diagrams and protein-protein interactions. Furthermore, it now accepts users' own PDB format files and generates a private set of analyses for each uploaded structure.
Automated Gene Ontology annotation for anonymous sequence data.
Hennig, Steffen; Groth, Detlef; Lehrach, Hans
2003-07-01
Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.
[Near-infrared reflectance spectroscopy predicts protein, moisture and ash in beans].
Gao, Huiyu; Wang, Guodong; Men, Jianhua; Wang, Zhu
2017-05-01
To explore the potential of near-infrared reflectance (NIR) spectroscopy to determine macronutrient contents in beans. NIR spectra and analytical measurements of protein, moisture and ash were collected from 70 kinds of beans. Reference methods were used to analyze all the ground bean samples. NIR spectra of intact and ground bean samples were recorded. Partial least-squares (PLS) regression models were developed, with principal components analysis (PCA) used to assign 49 bean accessions to a calibration data set and 21 accessions to an external validation set. For intact beans, the relative predictive determinant (RPD) values for protein and ash (3.67 and 3.97, respectively) were good for screening. The RPD value for moisture was only 1.39, which was not recommended. For ground beans, the RPD values for protein, moisture and ash (6.63, 5.25 and 3.57, respectively) were good enough for screening. The protein, moisture and ash levels for intact and ground beans were all significantly correlated (P < 0.001) between the NIR and reference methods, and there was no statistically significant difference in the means for these three traits. This research demonstrates that NIR is a promising technique for simultaneous sorting of multiple traits in beans with no or easy sample preparation.
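The abstract does not give the formula for RPD. The sketch below assumes the common chemometric definition, the standard deviation of the reference values in the validation set divided by the root-mean-square error of prediction, and uses invented protein values.

```python
from math import sqrt
from statistics import stdev

def rpd(reference, predicted):
    """RPD under the assumed definition: SD of reference values / RMSEP."""
    n = len(reference)
    rmsep = sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return stdev(reference) / rmsep

# Hypothetical validation set: reference-method vs NIR-predicted protein (%).
reference_values = [20.0, 22.0, 24.0, 26.0, 28.0]
predicted_values = [21.0, 21.0, 25.0, 25.0, 29.0]
value = rpd(reference_values, predicted_values)
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the trait, which is why higher values are read as better suited to screening.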
Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K
2014-01-01
Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.
Serum proteins by capillary zone electrophoresis: approaches to the definition of reference values.
Petrini, C; Alessio, M G; Scapellato, L; Brambilla, S; Franzini, C
1999-10-01
The Paragon CZE 2000 (Beckman Analytical, Milan, Italy) is an automatic dedicated capillary zone electrophoresis (CZE) system, producing a five-zone serum protein pattern with quantitative estimation of the zones. With the view of substituting this instrument for two previously used serum protein electrophoresis techniques, we planned to produce reference values for the "new" systems leading to compatible interpretation of the results. High resolution cellulose acetate electrophoresis with visual inspection and descriptive reporting (HR-CAE) and five-zone cellulose acetate electrophoresis with densitometry (CAE-D) were the previously used techniques. Serum samples (n = 167) giving "normal pattern" with HR-CAE were assayed with the CZE system, and the results were statistically assessed to yield 0.95 reference intervals. One thousand normal and pathological serum samples were then assayed with the CAE-D and the CZE techniques, and the regression equations of the CAE-D values over the CZE values for the five zones were used to transform the CAE-D reference limits into the CZE reference limits. The two sets of reference values thereby produced were in good agreement with each other and also with reference values previously reported for the CZE system. Thus, reference values for the CZE techniques permit interpretation of results coherent with the previously used techniques and reporting modes.
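The transfer of reference limits through a regression equation can be sketched as follows, assuming a simple least-squares line of CAE-D values on CZE values for one protein zone. The paired values and the CAE-D limits are invented; the abstract does not state which regression model was actually fitted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def transfer_limit(cae_limit, intercept, slope):
    """Invert CAE-D = intercept + slope * CZE to express a CAE-D limit in CZE units."""
    return (cae_limit - intercept) / slope

# Hypothetical paired results for one zone (g/L); here CAE-D is exactly 1.1 * CZE.
cze = [30.0, 35.0, 40.0, 45.0, 50.0]
cae_d = [33.0, 38.5, 44.0, 49.5, 55.0]
intercept, slope = fit_line(cze, cae_d)

# Hypothetical CAE-D reference limits transformed to the CZE scale.
cze_lower = transfer_limit(38.5, intercept, slope)
cze_upper = transfer_limit(52.8, intercept, slope)
```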
Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling.
Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta
2017-09-05
Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods. Here, we propose an improvement of phylogenetic profiling that enables handling of large genomic datasets and inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms phylogenetic profiling methods previously described based on the mutual information and Pearson's correlation as measures of profile similarity. In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
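For readers unfamiliar with the measure, the following is a plain, non-parallel sketch of the sample distance correlation between two one-dimensional profiles, computed from double-centered pairwise distance matrices. It is not the Phylo-dCor implementation (which the abstract describes as parallelized R scripts), and the example vectors are invented.

```python
from math import sqrt

def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant profile has no measurable dependence
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))

# Illustrative vectors: y depends on x nonlinearly, so Pearson's r is 0 here
# while the distance correlation is clearly positive.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
dcor_xy = distance_correlation(x, y)
dcor_xx = distance_correlation(x, x)
```

The naive version needs memory and time quadratic in the profile length per pair, which gives some intuition for why a parallelized implementation is useful on large genomic data.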
Liver Rapid Reference Set Application: Hemken - Abbott (2015) — EDRN Public Portal
The aim of this testing is to find a small panel of biomarkers (n=2-5) that can be tested on the Abbott ARCHITECT automated immunoassay platform for the early detection of hepatocellular carcinoma (HCC). This panel of biomarkers should perform significantly better than alpha-fetoprotein (AFP) alone based on multivariate statistical analysis. This testing of the EDRN reference set will help expedite the selection of a small panel of ARCHITECT biomarkers for the early detection of HCC. The panel of ARCHITECT biomarkers Abbott plans to test includes: AFP, protein induced by vitamin K absence or antagonist-II (PIVKA-II), golgi protein 73 (GP73), hepatocyte growth factor (HGF), dipeptidyl peptidase 4 (DPP4) and DPP4/seprase (surface expressed protease) heterodimer hybrid. PIVKA-II is abnormal des-carboxylated prothrombin (DCP) present in vitamin K deficiency.
Gioutlakis, Aris; Klapa, Maria I.
2017-01-01
It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. 
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571
Crystallizing Membrane Proteins Using Lipidic Mesophases
Caffrey, Martin; Cherezov, Vadim
2009-01-01
A detailed protocol for crystallizing membrane proteins that makes use of lipidic mesophases is described. This has variously been referred to as the lipid cubic phase or in meso method. The method has been shown to be quite general in that it has been used to solve X-ray crystallographic structures of prokaryotic and eukaryotic proteins, proteins that are monomeric, homo- and hetero-multimeric, chromophore-containing and chromophore-free, and α-helical and β-barrel proteins. Its most recent successes are the human engineered β2-adrenergic and adenosine A2A G protein-coupled receptors. Protocols are provided for preparing and characterizing the lipidic mesophase, for reconstituting the protein into the monoolein-based mesophase, for functional assay of the protein in the mesophase, and for setting up crystallizations in manual mode. Methods for harvesting micro-crystals are also described. The time required to prepare the protein-loaded mesophase and to set up a crystallization plate manually is about one hour. PMID:19390528
Kim, Jong-Seo; Fillmore, Thomas L; Liu, Tao; Robinson, Errol; Hossain, Mahmud; Champion, Boyd L; Moore, Ronald J; Camp, David G; Smith, Richard D; Qian, Wei-Jun
2011-12-01
Selected reaction monitoring (SRM)-MS is an emerging technology for high throughput targeted protein quantification and verification in biomarker discovery studies; however, the cost associated with the application of stable isotope-labeled synthetic peptides as internal standards can be prohibitive for screening a large number of candidate proteins as often required in the preverification phase of discovery studies. Herein we present a proof of concept study using an (18)O-labeled proteome reference as global internal standards (GIS) for SRM-based relative quantification. The (18)O-labeled proteome reference (or GIS) can be readily prepared and contains a heavy isotope ((18)O)-labeled internal standard for every possible tryptic peptide. Our results showed that the percentage of heavy isotope ((18)O) incorporation applying an improved protocol was >99.5% for most peptides investigated. The accuracy, reproducibility, and linear dynamic range of quantification were further assessed based on known ratios of standard proteins spiked into the labeled mouse plasma reference. Reliable quantification was observed with high reproducibility (i.e. coefficient of variance <10%) for analyte concentrations that were set at 100-fold higher or lower than those of the GIS based on the light ((16)O)/heavy ((18)O) peak area ratios. The utility of (18)O-labeled GIS was further illustrated by accurate relative quantification of 45 major human plasma proteins. Moreover, quantification of the concentrations of C-reactive protein and prostate-specific antigen was illustrated by coupling the GIS with standard additions of purified protein standards. Collectively, our results demonstrated that the use of (18)O-labeled proteome reference as GIS provides a convenient, low cost, and effective strategy for relative quantification of a large number of candidate proteins in biological or clinical samples using SRM.
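The relative quantification described here reduces to comparing light/heavy peak-area ratios between samples that share the same labeled reference. A minimal sketch with invented peak areas follows; the helper names are hypothetical and the code does not reproduce the authors' data processing.

```python
from statistics import mean, stdev

def light_heavy_ratios(light_areas, heavy_areas):
    """Per-replicate ratio of analyte (16O) to reference (18O) peak area."""
    return [l / h for l, h in zip(light_areas, heavy_areas)]

def cv_percent(values):
    """Coefficient of variation across replicates, in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak areas for one peptide, three replicates per condition.
control = light_heavy_ratios([1000.0, 1100.0, 900.0], [2000.0, 2000.0, 2000.0])
treated = light_heavy_ratios([2000.0, 2200.0, 1800.0], [2000.0, 2000.0, 2000.0])

# Because both conditions are measured against the same 18O-labeled reference,
# the heavy signal cancels and the ratio of ratios is the relative abundance.
fold_change = mean(treated) / mean(control)
control_cv = cv_percent(control)
```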
Wildhaber, M.L.; Papoulias, D.M.; DeLonay, A.J.; Tillitt, D.E.; Bryan, J.L.; Annis, M.L.
2007-01-01
From May 2001 to June 2002 Wildhaber et al. (2005) conducted monthly sampling of Lower Missouri River shovelnose sturgeon (Scaphirhynchus platorynchus) to develop methods for determination of sex and the reproductive stage of sturgeons in the field. Shovelnose sturgeon were collected from the Missouri River and ultrasonic and endoscopic imagery and blood and gonadal tissue samples were taken. The full set of data was used to develop monthly reproductive stage profiles for S. platorynchus that could be compared to data collected on pallid sturgeon (Scaphirhynchus albus). This paper presents a comprehensive reference set of images, sex steroids, and vitellogenin (VTG, an egg protein precursor) data for assessing shovelnose sturgeon sex and reproductive stage. This reference set includes ultrasonic, endoscopic, histologic, and internal images of male and female gonads of shovelnose sturgeon at each reproductive stage along with complementary data on average 17-β estradiol, 11-ketotestosterone, VTG, gonadosomatic index, and polarization index.
Mafra, Valéria; Kubo, Karen S.; Alves-Ferreira, Marcio; Ribeiro-Alves, Marcelo; Stuart, Rodrigo M.; Boava, Leonardo P.; Rodrigues, Carolina M.; Machado, Marcos A.
2012-01-01
Real-time reverse transcription PCR (RT-qPCR) has emerged as an accurate and widely used technique for expression profiling of selected genes. However, obtaining reliable measurements depends on the selection of appropriate reference genes for gene expression normalization. The aim of this work was to assess the expression stability of 15 candidate genes to determine which set of reference genes is best suited for transcript normalization in citrus in different tissues and organs and leaves challenged with five pathogens (Alternaria alternata, Phytophthora parasitica, Xylella fastidiosa and Candidatus Liberibacter asiaticus). We tested traditional genes used for transcript normalization in citrus and orthologs of Arabidopsis thaliana genes described as superior reference genes based on transcriptome data. geNorm and NormFinder algorithms were used to find the best reference genes to normalize all samples and conditions tested. Additionally, each biotic stress was individually analyzed by geNorm. In general, FBOX (encoding a member of the F-box family) and GAPC2 (GAPDH) was the most stable candidate gene set assessed under the different conditions and subsets tested, while CYP (cyclophilin), TUB (tubulin) and CtP (cathepsin) were the least stably expressed genes found. Validation of the best suitable reference genes for normalizing the expression level of the WRKY70 transcription factor in leaves infected with Candidatus Liberibacter asiaticus showed that arbitrary use of reference genes without previous testing could lead to misinterpretation of data. Our results revealed FBOX, SAND (a SAND family protein), GAPC2 and UPL7 (ubiquitin protein ligase 7) to be superior reference genes, and we recommend their use in studies of gene expression in citrus species and relatives. This work constitutes the first systematic analysis for the selection of superior reference genes for transcript normalization in different citrus organs and under biotic stress. PMID:22347455
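As an illustration of how an expression stability ranking of this kind can be computed, the sketch below follows the commonly cited definition of the geNorm stability measure M: for each gene, the average standard deviation of its log2 expression ratios against every other candidate across samples. It is not the geNorm software itself, and the gene names and relative quantities are hypothetical.

```python
from math import log2
from statistics import stdev


def genorm_m(expression):
    """Return a stability value M per gene (lower means more stable).

    `expression` maps gene name -> list of relative quantities, one per
    sample, with all lists in the same sample order.
    """
    m_values = {}
    for gene, quantities in expression.items():
        pairwise_sd = []
        for other, other_quantities in expression.items():
            if other == gene:
                continue
            log_ratios = [log2(a / b) for a, b in zip(quantities, other_quantities)]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three candidate genes in four samples.
toy_expression = {
    "GENE_A": [1.0, 2.0, 4.0, 8.0],
    "GENE_B": [2.0, 4.0, 8.0, 16.0],  # always twice GENE_A: constant ratio
    "GENE_C": [1.0, 8.0, 2.0, 4.0],   # varies independently of the others
}
m_values = genorm_m(toy_expression)
for gene, m in sorted(m_values.items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

In this toy input GENE_A and GENE_B keep a constant ratio and therefore share the lowest M, while GENE_C ranks as the least stable candidate.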
Computational clustering for viral reference proteomes
Chen, Chuming; Huang, Hongzhan; Mazumder, Raja; Natale, Darren A.; McGarvey, Peter B.; Zhang, Jian; Polson, Shawn W.; Wang, Yuqi; Wu, Cathy H.
2016-01-01
Motivation: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. Results: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt’s curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. Availability and implementation: http://proteininformationresource.org/rps/viruses/ Contact: chenc@udel.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153712
Human liver proteome project: plan, progress, and perspectives.
He, Fuchu
2005-12-01
The Human Liver Proteome Project is the first initiative of the human proteome project for human organs/tissues and aims at writing a modern Prometheus myth. Its global scientific objectives are to reveal the "solar system" of the human liver proteome, expression profiles, modification profiles, a protein linkage (protein-protein interaction) map, and a proteome localization map, and to define an ORFeome, physiome, and pathome. Since it was first proposed in April 2002, the Human Liver Proteome Project has attracted more than 100 laboratories from all over the world. In the ensuing 3 years, we set up a management infrastructure, identified reference laboratories, confirmed standard operating procedures, initiated international research collaborations, and finally achieved the first set of expression profile data.
Schwaighofer, Andreas; Kuligowski, Julia; Quintás, Guillermo; Mayer, Helmut K; Lendl, Bernhard
2018-06-30
Analysis of proteins in bovine milk is usually tackled by time-consuming analytical approaches involving wet-chemical, multi-step sample clean-up procedures. The use of external cavity-quantum cascade laser (EC-QCL) based IR spectroscopy was evaluated as an alternative screening tool for direct and simultaneous quantification of individual proteins (i.e. casein and β-lactoglobulin) and total protein content in commercial bovine milk samples. Mid-IR spectra of protein standard mixtures were used for building partial least squares (PLS) regression models. A sample set comprising different milk types (pasteurized; differently processed extended shelf life, ESL; ultra-high temperature, UHT) was analysed and results were compared to reference methods. Concentration values of the QCL-IR spectroscopy approach obtained within several minutes are in good agreement with reference methods involving multiple sample preparation steps. The potential application as a fast screening method for estimating the heat load applied to liquid milk is demonstrated. Copyright © 2018 Elsevier Ltd. All rights reserved.
Roca, Alberto I
2014-01-01
The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
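The core idea of a residue frequency matrix per alignment column can be sketched in a few lines. This is only an illustration of the data structure described in the abstract, not the JProfileGrid implementation; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def residue_frequencies(alignment):
    """Return one Counter of residue counts per alignment column."""
    length = len(alignment[0])
    if any(len(sequence) != length for sequence in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return [Counter(column) for column in zip(*alignment)]


# Hypothetical three-sequence alignment; '-' marks a gap.
toy_alignment = ["MKV-", "MRVA", "MKIA"]
profile = residue_frequencies(toy_alignment)
for position, counts in enumerate(profile, start=1):
    print(position, dict(counts))
```

A viewer could then color each matrix cell by its count or fraction, which is the step that turns this table into a grid-style visualization.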
Mapping of protein- and chromatin-interactions at the nuclear lamina.
Kubben, Nard; Voncken, Jan Willem; Misteli, Tom
2010-01-01
The nuclear envelope and the lamina define the nuclear periphery and are implicated in many nuclear processes including chromatin organization, transcription and DNA replication. Mutations in lamin A proteins, major components of the lamina, interfere with these functions and cause a set of phenotypically diverse diseases referred to as laminopathies. The phenotypic diversity of laminopathies is thought to be the result of alterations in specific protein- and chromatin-interactions due to lamin A mutations. Systematic identification of lamin A-protein and -chromatin interactions will be critical to uncover the molecular etiology of laminopathies. Here we summarize and critically discuss recent technology to analyze lamina-protein and -chromatin interactions.
Mapping transcription factor interactome networks using HaloTag protein arrays.
Yazaki, Junshi; Galli, Mary; Kim, Alice Y; Nito, Kazumasa; Aleman, Fernando; Chang, Katherine N; Carvunis, Anne-Ruxandra; Quan, Rosa; Nguyen, Hien; Song, Liang; Alvarez, José M; Huang, Shao-Shan Carol; Chen, Huaming; Ramachandran, Niroshan; Altmann, Stefan; Gutiérrez, Rodrigo A; Hill, David E; Schroeder, Julian I; Chory, Joanne; LaBaer, Joshua; Vidal, Marc; Braun, Pascal; Ecker, Joseph R
2016-07-19
Protein microarrays enable investigation of diverse biochemical properties for thousands of proteins in a single experiment, an unparalleled capacity. Using a high-density system called HaloTag nucleic acid programmable protein array (HaloTag-NAPPA), we created high-density protein arrays comprising 12,000 Arabidopsis ORFs. We used these arrays to query protein-protein interactions for a set of 38 transcription factors and transcriptional regulators (TFs) that function in diverse plant hormone regulatory pathways. The resulting transcription factor interactome network, TF-NAPPA, contains thousands of novel interactions. Validation in a benchmarked in vitro pull-down assay revealed that a random subset of TF-NAPPA validated at the same rate of 64% as a positive reference set of literature-curated interactions. Moreover, using a bimolecular fluorescence complementation (BiFC) assay, we confirmed in planta several interactions of biological interest and determined the interaction localizations for seven pairs. The application of HaloTag-NAPPA technology to plant hormone signaling pathways allowed the identification of many novel transcription factor-protein interactions and led to the development of a proteome-wide plant hormone TF interactome network.
Simpson, Deborah M; Beynon, Robert J
2012-09-01
Systems biology requires knowledge of the absolute amounts of proteins in order to model biological processes and simulate the effects of changes in specific model parameters. Quantification concatamers (QconCATs) are established as a method to provide multiplexed absolute peptide standards for a set of target proteins in isotope dilution standard experiments. Two or more quantotypic peptides representing each of the target proteins are concatenated into a designer gene that is metabolically labelled with stable isotopes in Escherichia coli or other cellular or cell-free systems. Co-digestion of a known amount of QconCAT with the target proteins generates a set of labelled reference peptide standards for the unlabelled analyte counterparts, and by using an appropriate mass spectrometry platform, comparison of the intensities of the peptide ratios delivers absolute quantification of the encoded peptides and in turn the target proteins for which they are surrogates. In this review, we discuss the criteria and difficulties associated with surrogate peptide selection and provide examples in the design of QconCATs for quantification of the proteins of the nuclear factor κB pathway.
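The isotope dilution arithmetic behind this approach can be sketched as follows. The sketch assumes that each labelled standard peptide is released in a 1:1 ratio with the spiked QconCAT and that digestion is complete; the function name, intensities and spiked amount are hypothetical.

```python
from statistics import mean


def analyte_amount_fmol(light_intensity, heavy_intensity, standard_fmol):
    """Estimate the unlabelled analyte amount from a light/heavy ratio.

    Assumes (for illustration) that the heavy peptide is present at exactly
    `standard_fmol`, i.e. equimolar release from the spiked QconCAT.
    """
    return (light_intensity / heavy_intensity) * standard_fmol


# Hypothetical data: two quantotypic peptides for the same target protein,
# each given as (light intensity, heavy intensity); 50 fmol QconCAT spiked.
peptides = [(2.0e6, 1.0e6), (1.8e6, 1.0e6)]
estimates = [analyte_amount_fmol(light, heavy, 50.0) for light, heavy in peptides]
protein_fmol = mean(estimates)
print(estimates, protein_fmol)
```

Averaging the two peptide estimates is one simple way to combine them; disagreement between the peptides is itself a useful warning sign about surrogate peptide choice.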
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is software, available to the scientific community as a free web service, designed to provide predictions of secondary structure starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures both in large but non-homogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by using propensities evaluated for the correct protein class. PreSSAPro predicts the secondary structure according to the correct protein class, if known, or gives multiple predictions with reference to the different structural classes. Comparing these predictions provides a novel tool for evaluating which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
Paulovich, Amanda G.; Billheimer, Dean; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Tabb, David L.; Wang, Pei; Blackman, Ronald K.; Bunk, David M.; Cardasis, Helene L.; Clauser, Karl R.; Kinsinger, Christopher R.; Schilling, Birgit; Tegeler, Tony J.; Variyath, Asokan Mulayath; Wang, Mu; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fenyo, David; Carr, Steven A.; Fisher, Susan J.; Gibson, Bradford W.; Mesri, Mehdi; Neubert, Thomas A.; Regnier, Fred E.; Rodriguez, Henry; Spiegelman, Cliff; Stein, Stephen E.; Tempst, Paul; Liebler, Daniel C.
2010-01-01
Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments. PMID:19858499
[Pathology of basal ganglia in neurodegenerative diseases].
Wakabayashi, Koichi; Tanji, Kunikazu; Mori, Fumiaki
2009-04-01
Intra- and/or extracellular proteinaceous inclusions in the brain tissue are characteristic pathological markers of many neurodegenerative diseases. Tau protein in neurofibrillary tangles and beta-amyloid in senile plaques are associated with Alzheimer's disease. Tau is associated with various neurological conditions, which are collectively referred to as tauopathies. Alpha-synucleinopathy is a term that collectively refers to a set of diseases in which neurodegeneration is accompanied by intracellular accumulation of alpha-synuclein in neurons or glial cells. Recently, TDP-43 has been identified as a major disease protein in the ubiquitinated inclusions in diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration with tau-negative, ubiquitin-positive inclusions. Thus, these neurodegenerative disorders comprise a new disease class, namely, TDP-43 proteinopathy. In this article, we review the present understanding of histopathological features of basal ganglia lesions in protein conformation disorders, including tauopathy, alpha-synucleinopathy, and TDP-43 proteinopathy.
Prokaryotic histone-like protein interacting with RNA polymerase.
Lathe, R; Buc, H; Lecocq, J P; Bautz, E K
1980-01-01
firA mutation of Escherichia coli can render RNA synthesis thermosensitive and confer abnormal sensitivity to rifampicin, an antibiotic that specifically inhibits the activity of RNA polymerase. We previously described the cloning of a chromosomal HindIII fragment containing the firA gene, and we now present strong evidence that the product of this gene is a 17,000-dalton polypeptide which, by various criteria, closely resembles the eukaryotic histones. This protein forms the largest of a unique set of three abundant histone-like proteins (HLP) found in E. coli and is hence referred to as HLPI. We discuss possible routes by which these proteins might affect transcription. PMID:6447875
Liver Full Reference Set Application: Timothy Block - Drexel Univ (2010) — EDRN Public Portal
The goal of this application is to determine if the levels of serum GP73 and fucosylated kininogen/acute phase proteins can be used to detect hepatocellular carcinoma (HCC) in the background of liver cirrhosis. The use of the validation set would allow us to directly compare GP73 and fucosylated markers against AFP, AFP-L3 and DCP as well as test them in combination with these markers.
Breast Reference Set Application: Richard Zangar-PNNL (2012) — EDRN Public Portal
Our immediate goal is to define a set of biomarkers (composed of circulating plasma proteins) that can be used to distinguish between true and false screens, primarily mammograms. Our preliminary data suggest that different breast cancer subtypes need to be considered when developing this panel. Our long-term goal is to develop a panel of biomarkers that can accurately detect the presence of breast cancer regardless of subtype.
Liver Rapid Reference Set Application: Timothy Block - Drexel Univ (2008) — EDRN Public Portal
The goal of this application is to determine if the levels of serum GP73 and fucosylated kininogen/acute phase proteins can be used to detect hepatocellular carcinoma (HCC) in the background of liver cirrhosis. The use of the validation set would allow us to directly compare GP73 and fucosylated markers against AFP, AFP-L3 and DCP as well as test them in combination with these markers.
Defining an essence of structure determining residue contacts in proteins.
Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-12-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
Defining an Essence of Structure Determining Residue Contacts in Proteins
Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-01-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489
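The random reference subsets mentioned in the abstract can be illustrated with a short sketch: derive a contact list from coordinates, then draw a random fraction of it. The 8 Å cutoff, the minimum sequence separation, the function names and the toy coordinates are all assumptions made for illustration; the abstract does not state the contact definition used, and the cone-peeling selection itself is not reproduced here.

```python
import random
from itertools import combinations
from math import dist


def native_contacts(coords, cutoff=8.0, min_separation=2):
    """Return residue index pairs whose atoms lie within `cutoff` angstroms.

    `cutoff` and `min_separation` are illustrative assumptions.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and dist(coords[i], coords[j]) <= cutoff
    ]


def random_subset(contacts, fraction, rng):
    """Draw a random subset containing `fraction` of the contacts."""
    size = round(fraction * len(contacts))
    return rng.sample(contacts, size)


# Toy "chain": six atoms on a straight line, 3.8 angstroms apart.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
contacts = native_contacts(toy_coords)
subset = random_subset(contacts, 0.5, random.Random(0))
print(contacts, subset)
```

Repeating the draw for fractions from 0.1 to 0.9 and reconstructing a model from each subset would give the kind of random baseline against which a selection strategy can be compared.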
Protein Loop Structure Prediction Using Conformational Space Annealing.
Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung
2017-05-22
We have developed a protein loop structure prediction method by combining a new energy function, which we call E_PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E_PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a native-like structure from a decoy set, E_PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more native-like loop structures, E_PLM outperformed E_DFIRE for both decoy sets. This new approach equipped with E_PLM and CSA can serve as the state-of-the-art de novo loop modeling method.
2014-01-01
Background: The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results: The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions: The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
Muegge, I; Martin, Y C
1999-03-11
A fast, simplified potential-based approach is presented that estimates the protein-ligand binding affinity based on the given 3D structure of a protein-ligand complex. This general, knowledge-based approach exploits structural information of known protein-ligand complexes extracted from the Brookhaven Protein Data Bank and converts it into distance-dependent Helmholtz free interaction energies of protein-ligand atom pairs (potentials of mean force, PMF). The definition of an appropriate reference state and the introduction of a correction term accounting for the volume taken by the ligand were found to be crucial for deriving the relevant interaction potentials that treat solvation and entropic contributions implicitly. A significant correlation between experimental binding affinities and computed score was found for sets of diverse protein-ligand complexes and for sets of different ligands bound to the same target. For 77 protein-ligand complexes taken from the Brookhaven Protein Data Bank, the calculated score showed a standard deviation from observed binding affinities of 1.8 log Ki units and an R² value of 0.61. The best results were obtained for the subset of 16 serine protease complexes with a standard deviation of 1.0 log Ki unit and an R² value of 0.86. A set of 33 inhibitors modeled into a crystal structure of HIV-1 protease yielded a standard deviation of 0.8 log Ki units from measured inhibition constants and an R² value of 0.74. In contrast to empirical scoring functions that show similar or sometimes better correlation with observed binding affinities, our method does not involve deriving specific parameters that fit the observed binding affinities of protein-ligand complexes of a given training set. We compared the performance of the PMF score, Böhm's score (LUDI), and the SMOG score for eight different test sets of protein-ligand complexes. It was found that for the majority of test sets the PMF score performs best. The strength of the new approach presented here lies in its generality as no knowledge about measured binding affinities is needed to derive atomic interaction potentials. The use of the new scoring function in docking studies is outlined.
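The general idea of turning observed pair densities into a distance-dependent potential can be sketched with a generic inverse Boltzmann relation, A(r) = -k_B T ln(ρ(r)/ρ_bulk). This is a textbook form used here as an assumption; it omits the ligand volume correction and the specific reference state that the abstract identifies as crucial, and the densities below are hypothetical.

```python
from math import log

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)
TEMPERATURE = 298.15      # K, assumed for illustration
K_B_T = GAS_CONSTANT * TEMPERATURE


def pmf_from_densities(pair_density, bulk_density):
    """Convert per-distance-bin pair densities into free energies (kcal/mol).

    Bins with zero observed density receive +infinity (never observed).
    """
    return [
        -K_B_T * log(rho / bulk_density) if rho > 0 else float("inf")
        for rho in pair_density
    ]


# Hypothetical densities for one protein-ligand atom-type pair, by distance bin.
toy_density = [0.0, 0.5, 2.0, 1.0]
toy_pmf = pmf_from_densities(toy_density, bulk_density=1.0)
print(toy_pmf)
```

Bins that are enriched relative to the bulk density come out favourable (negative), depleted bins unfavourable (positive), and a complex would be scored by summing such terms over its protein-ligand atom pairs.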
Gene encoding herbicide safener binding protein
Walton, Jonathan D.; Scott-Craig, John S.
1999-01-01
The cDNA encoding safener binding protein (SafBP), also referred to as SBP1, is set forth in FIG. 5 and SEQ ID No. 1. The deduced amino acid sequence is provided in FIG. 5 and SEQ ID No. 2. Methods of making and using SBP1 and SafBP to alter a plant's sensitivity to certain herbicides or a plant's responsiveness to certain safeners are also provided, as well as expression vectors, transgenic plants or other organisms transfected with said vectors and seeds from said plants.
Liver Rapid Reference Set Application: Gary Norman-INOVA (2012) — EDRN Public Portal
We have developed a novel assay for the detection of Golgi protein 73 (GP73), also known as Golgi membrane protein 1 (Golm1) or Golgi phosphoprotein 2 (Golph2), in serum/plasma. The clinical question is to determine the clinical utility of GP73 antigen detection by the new assay for early hepatocellular carcinoma (HCC) diagnosis, for risk-assessment of patients at high risk for progression of their liver disease, and for prognosis.
Pancreatic Reference Set Application: Brian Haab-Van Andel (2012) — EDRN Public Portal
New markers are greatly needed for the detection and diagnosis of pancreatic cancer. Patients at high risk for developing pancreatic cancer (for example, because of genetic predisposition or health status) can be screened by endoscopy or a related imaging procedure, but these methods are expensive and burdensome to the patient. Blood-based markers would facilitate regular screening. In addition, patients with known abnormalities of the pancreas (for example, as observed incidentally from an abdominal scan) need to determine whether they have cancer or not. The great majority of patients with pancreatic findings by CT do not have conditions that require treatment, yet nearly all patients undergo invasive and burdensome procedures as a consequence of the CT. Again, a blood-based marker could alleviate this situation and potentially add accuracy to the diagnosis. In preliminary work we showed the potential for highly accurate discrimination of pancreatic cancer from pancreatitis and healthy control subjects using a panel of protein and glycan markers in the serum. We used an antibody array platform in which we can obtain sensitive, reproducible measurements of protein abundance and glycosylation status in low sample volumes. The detection of the glycosylation status is important for the high accuracy of the test because the glycans attached to the marker proteins are altered in cancer patients. Based on the good performance in these early studies, we now want to validate the performance in rigorously controlled, blinded sample sets. The reference set developed by the EDRN will enable a definitive characterization of our marker performance. In addition, we can make an accurate comparison to other markers that will be applied to the same set and determine whether disparate markers could be used together for added benefit.
Ionescu, Crina-Maria; Geidl, Stanislav; Svobodová Vařeková, Radka; Koča, Jaroslav
2013-10-28
We focused on the parametrization and evaluation of empirical models for fast and accurate calculation of conformationally dependent atomic charges in proteins. The models were based on the electronegativity equalization method (EEM), and the parametrization procedure was tailored to proteins. We used large protein fragments as reference structures and fitted the EEM model parameters using atomic charges computed by three population analyses (Mulliken, Natural, iterative Hirshfeld), at the Hartree-Fock level with two basis sets (6-31G*, 6-31G**) and in two environments (gas phase, implicit solvation). We parametrized and successfully validated 24 EEM models. When tested on insulin and ubiquitin, all models reproduced quantum mechanics level charges well and were consistent with respect to population analysis and basis set. Specifically, the models showed on average a correlation of 0.961, RMSD 0.097 e, and average absolute error per atom 0.072 e. The EEM models can be used with the freely available EEM implementation EEM_SOLVER.
Reconstruction of the experimentally supported human protein interactome: what can we learn?
Klapa, Maria I; Tsafou, Kalliopi; Theodoridis, Evangelos; Tsakalidis, Athanasios; Moschonas, Nicholas K
2013-10-02
Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research, therefore its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain insights of biological relevance. First, we defined the UniProtKB manually reviewed human "complete" proteome as the reference protein-node set and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly-connected interactors. Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human "complete" proteome, and discuss the role of the proteins within the network topology with respect to their function. 
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.
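The node-degree bookkeeping behind statements such as "highest-degree node" and "up to four connections" can be sketched as follows. This is a minimal illustration that assumes interactions are given as undirected identifier pairs; the function name `degree_summary` and the identifiers `P1` to `P5` are hypothetical and do not come from the study.

```python
from collections import Counter

def degree_summary(interactions):
    """Return (degree per protein, number of proteins per degree) for undirected PPIs."""
    # Normalize each pair so (A, B) and (B, A) count once; drop self-interactions.
    pairs = {tuple(sorted(pair)) for pair in interactions if pair[0] != pair[1]}
    degree = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    # Degree distribution: how many proteins have exactly k interaction partners.
    distribution = Counter(degree.values())
    return degree, distribution

# Hypothetical interaction list with one duplicate and one self-interaction.
ppis = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P1"), ("P4", "P5"), ("P5", "P5")]
degree, distribution = degree_summary(ppis)
hub, hub_degree = degree.most_common(1)[0]
print(hub, hub_degree, dict(distribution))
```

For a scale-free network, plotting the distribution on log-log axes would be expected to give an approximately straight line; that fitting step is left out of this sketch.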
Dogra, Vivek; Bagler, Ganesh; Sreenivasulu, Yelam
2015-01-01
Podophyllum hexandrum Royle is an important high-altitude plant of the Himalayas with immense medicinal value. Earlier, it was reported that cell wall hydrolases were up-accumulated during the radicle protrusion step of Podophyllum seed germination. In the present study, a Podophyllum seed Germination protein interaction Network (PGN) was constructed by using the differentially accumulated protein (DAP) data set of Podophyllum during the radicle protrusion step of seed germination, with reference to the Arabidopsis protein–protein interaction network (AtPIN). The developed PGN comprises a giant cluster with 1028 proteins having 10,519 interactions and a few small clusters with relevant gene ontological signatures. In this analysis, a germination pathway related cluster, which is also central to the topology and information dynamics of the PGN, was obtained with a set of 60 key proteins. Among these, eight proteins known to be involved in signaling, metabolism, protein modification, cell wall modification, and cell cycle regulation processes were highlighted in both the proteomic and interactome analyses. The systems-level analysis of the PGN identified the key proteins involved in the radicle protrusion step of seed germination in Podophyllum. PMID:26579141
Yeast proteome map (last update).
Perrot, Michel; Moes, Suzette; Massoni, Aurélie; Jenoe, Paul; Boucherie, Hélian
2009-10-01
The identification of proteins separated on 2-D gels is essential to exploit the full potential of 2-D gel electrophoresis for proteomic investigations. For this purpose we have undertaken the systematic identification of Saccharomyces cerevisiae proteins separated on 2-D gels. We report here the identification by mass spectrometry of 100 novel yeast protein spots that have so far not been tackled due to their scarcity on our standard 2-D gels. These identifications extend the number of protein spots identified on our yeast 2-D proteome map to 716. They correspond to 485 unique proteins. Among these, 154 were resolved into several isoforms. The present data set can now be expanded to report for the first time a map of 363 protein isoforms that significantly deepens our knowledge of the yeast proteome. The reference map and a list of all identified proteins can be accessed on the Yeast Protein Map server (www.ibgc.u-bordeaux2.fr/YPM).
2011-01-01
Background Internal control genes with highly uniform expression throughout the experimental conditions are required for accurate gene expression analysis, as no universal reference gene exists. In this study, the expression stability of 24 candidate genes from Triticum aestivum cv. Cubus flag leaves grown under organic and conventional farming systems was evaluated in two locations in order to select suitable genes that can be used for normalization of real-time quantitative reverse-transcription PCR (RT-qPCR) reactions. The genes were selected from among the most commonly used reference genes as well as genes encoding proteins involved in several metabolic pathways. Findings Individual genes displayed different expression rates across all samples assayed. Applying geNorm, a set of three potential reference genes was found suitable for normalization of RT-qPCR reactions in winter wheat flag leaves cv. Cubus: TaFNRII (ferredoxin-NADP(H) oxidoreductase; AJ457980.1), ACT2 (actin 2; TC234027), and rrn26 (a putative homologue to RNA 26S gene; AL827977.1). In addition to these three genes, which were also top-ranked by NormFinder, two extra genes, CYP18-2 (Cyclophilin A, AY456122.1) and TaWIN1 (14-3-3 like protein, AB042193), were most consistently stably expressed. Furthermore, we showed that TaFNRII, ACT2, and CYP18-2 are suitable for gene expression normalization in two other winter wheat varieties (Tommi and Centenaire) grown under three treatments (organic, conventional and no nitrogen) and in a different environment from the one tested with cv. Cubus. Conclusions This study provides a new set of reference genes which should improve the accuracy of gene expression analyses when using wheat flag leaves, such as those related to the improvement of nitrogen use efficiency for cereal production. PMID:21951810
Reference intervals of citrated-native whole blood thromboelastography in premature neonates.
Motta, Mario; Guaragni, Brunetta; Pezzotti, Elena; Rodriguez-Perez, Carmen; Chirico, Gaetano
2017-12-01
Bleeding due to acquired coagulation disorders is a common complication in premature neonates. In this clinical setting, standard coagulation laboratory tests might be unsuitable to investigate the hemostatic function as they reflect the concentration of pro-coagulant proteins but not of anti-coagulant proteins. Thromboelastography (TEG), providing a more complete assessment of hemostasis, may be able to overcome some of these limitations. Unfortunately, experience with the use of TEG in premature neonates is very limited and, in this population in particular, reference ranges of TEG parameters have not yet been evaluated. The aim of this study was to evaluate TEG in preterm neonates and to assess their reference ranges. One hundred and eighteen preterm neonates were analyzed for TEG in a retrospective cohort study. Double-sided 95% reference intervals were calculated using a bootstrap method after Box-Cox transformation. TEG parameters were compared between early-preterm and moderate-/late-preterm neonates and between bleeding and non-bleeding preterm neonates. Comparing early-preterm with moderate-/late-preterm neonates, TEG parameters were not statistically different, except for fibrinolysis, which was significantly higher in early-preterm neonates. Platelet count significantly correlated with α angle and MA parameters. Bleeding and non-bleeding neonates had similar TEG values. These results reinforce the concept that in stable preterm neonates, in spite of lower concentrations of pro- and anti-coagulant proteins, hemostasis is normally balanced and well functioning. Copyright © 2017 Elsevier B.V. All rights reserved.
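One way a two-sided 95% reference interval could be obtained by bootstrapping after a Box-Cox transformation is sketched below. The abstract does not specify the procedure, so several choices here are assumptions: a fixed Box-Cox parameter `lam` (0 means a log transform), limits taken as mean ± 1.96 SD on the transformed scale, averaging of the limits over resamples, and simulated input values. None of the names or numbers come from the study.

```python
import math
import random
import statistics

def box_cox(x, lam):
    """Box-Cox transform of a positive value; lam == 0 is the log transform."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_box_cox(y, lam):
    """Inverse of box_cox for the same lam."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def bootstrap_reference_interval(values, lam=0.0, n_boot=500, seed=0):
    """Average bootstrap estimates of mean +/- 1.96 SD on the transformed scale."""
    rng = random.Random(seed)
    transformed = [box_cox(v, lam) for v in values]
    lows, highs = [], []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(transformed, k=len(transformed))
        mean = statistics.mean(resample)
        sd = statistics.stdev(resample)
        lows.append(mean - 1.96 * sd)
        highs.append(mean + 1.96 * sd)
    # Back-transform the averaged limits to the original measurement scale.
    return inv_box_cox(statistics.mean(lows), lam), inv_box_cox(statistics.mean(highs), lam)

# Simulated, hypothetical measurements for 118 subjects (not study data).
data_rng = random.Random(42)
measurements = [data_rng.lognormvariate(math.log(60.0), 0.1) for _ in range(118)]
low, high = bootstrap_reference_interval(measurements)
print(f"95% reference interval: {low:.1f} to {high:.1f}")
```

In a real analysis the Box-Cox parameter would normally be estimated from the data, and confidence limits around each reference limit would usually be reported as well.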
Czechowski, Tomasz; Stitt, Mark; Altmann, Thomas; Udvardi, Michael K.; Scheible, Wolf-Rüdiger
2005-01-01
Gene transcripts with invariant abundance during development and in the face of environmental stimuli are essential reference points for accurate gene expression analyses, such as RNA gel-blot analysis or quantitative reverse transcription-polymerase chain reaction (PCR). An exceptionally large set of data from Affymetrix ATH1 whole-genome GeneChip studies provided the means to identify a new generation of reference genes with very stable expression levels in the model plant species Arabidopsis (Arabidopsis thaliana). Hundreds of Arabidopsis genes were found that outperform traditional reference genes in terms of expression stability throughout development and under a range of environmental conditions. Most of these were expressed at much lower levels than traditional reference genes, making them very suitable for normalization of gene expression over a wide range of transcript levels. Specific and efficient primers were developed for 22 genes and tested on a diverse set of 20 cDNA samples. Quantitative reverse transcription-PCR confirmed superior expression stability and lower absolute expression levels for many of these genes, including genes encoding a protein phosphatase 2A subunit, a coatomer subunit, and a ubiquitin-conjugating enzyme. The developed PCR primers or hybridization probes for the novel reference genes will enable better normalization and quantification of transcript levels in Arabidopsis in the future. PMID:16166256
BRSCW Reference Set Application: Joe Buechler - Biosite Inc (2009) — EDRN Public Portal
Over 40 marker assays are available to run on the samples. These include markers such as Osteopontin, Mesothelin, Periostin, Endoglin, intestinal Fatty Acid Binding Protein, and FAS-Ligand, some of which have been previously described in the literature. Other proprietary markers are derived from internal discovery efforts and from collaborator programs.
Chen, Weixin; Chen, Jianye; Lu, Wangjin; Chen, Lei; Fu, Danwen
2012-01-01
Real-time reverse transcription PCR (RT-qPCR) is a preferred method for rapid and accurate quantification of gene expression. Appropriate application of RT-qPCR requires accurate normalization through the use of reference genes. As no single reference gene is universally suitable for all experiments, validation of reference gene(s) under different experimental conditions is crucial for RT-qPCR analysis. To date, only a few studies on reference genes have been done in other plants but none in papaya. In the present work, we selected 21 candidate reference genes and evaluated their expression stability in 246 papaya fruit samples using three algorithms: geNorm, NormFinder and RefFinder. The samples consisted of 13 sets collected under different experimental conditions, including various tissues, different storage temperatures, different cultivars, developmental stages, postharvest ripening, modified atmosphere packaging, 1-methylcyclopropene (1-MCP) treatment, hot water treatment, biotic stress and hormone treatment. Our results demonstrated that expression stability varied greatly between reference genes and that suitable reference gene(s) or combinations of reference genes for normalization should be validated according to the experimental conditions. In general, the internal reference genes EIF (Eukaryotic initiation factor 4A), TBP1 (TATA binding protein 1) and TBP2 (TATA binding protein 2) performed well under most experimental conditions, whereas the reference genes most widely used at present, ACTIN (Actin 2), 18S rRNA (18S ribosomal RNA) and GAPDH (Glyceraldehyde-3-phosphate dehydrogenase), were not suitable in many experimental conditions. In addition, two commonly used programs, geNorm and NormFinder, proved sufficient for the validation. This work provides the first systematic analysis for the selection of superior reference genes for accurate transcript normalization in papaya under different experimental conditions.
PMID:22952972
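The kind of expression-stability ranking that geNorm performs can be sketched with its M value as commonly described: for each candidate gene, the average, over all other candidates, of the standard deviation across samples of the log2 expression ratio. This is a simplified illustration and not the geNorm software itself; the gene names and relative quantities are hypothetical.

```python
import math
import statistics

def genorm_m(expression):
    """Return the stability measure M per gene (lower M means more stable).

    expression maps gene name -> list of relative quantities, with every
    list in the same sample order.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in each sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expression[j], expression[k])]
            # Pairwise variation: SD of those log ratios across samples.
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.mean(pairwise_sd)
    return m_values

# Hypothetical relative quantities for three candidate genes in four samples.
quantities = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],  # constant ratio to geneA
    "geneC": [1.0, 8.0, 2.0, 4.0],   # varies independently of the others
}
m = genorm_m(quantities)
ranking = sorted(m, key=m.get)  # most stable candidates first
print(ranking, {gene: round(value, 3) for gene, value in m.items()})
```

The published procedure additionally removes the least stable gene step by step and recomputes M; that iteration is omitted here for brevity.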
Rocha-Martins, Maurício; Njaine, Brian; Silveira, Mariana S
2012-01-01
Housekeeping genes have been commonly used as references to normalize gene expression and protein content data because of their presumed constitutive expression. In this paper, we challenge the consensual idea that housekeeping genes are reliable controls for expression studies in the retina through the investigation of a panel of reference genes potentially suitable for analysis of different stages of retinal development. We applied statistical tools on combinations of retinal developmental stages to assess the most stable internal controls for quantitative RT-PCR (qRT-PCR). The stability of expression of seven putative reference genes (Actb, B2m, Gapdh, Hprt1, Mapk1, Ppia and Rn18s) was analyzed using geNorm, BestKeeper and Normfinder software. In addition, several housekeeping genes were tested as loading controls for Western blot in the same sample panel, using Image J. Overall, for qRT-PCR the combination of Gapdh and Mapk1 showed the highest stability for most experimental sets. Actb was downregulated in more mature stages, while Rn18s and Hprt1 showed the highest variability. We normalized the expression of cyclin D1 using various reference genes and demonstrated that spurious results may result from blind selection of internal controls. For Western blot, significant variation could be seen among four putative internal controls (β-actin, cyclophilin b, α-tubulin and lamin A/C), while MAPK1 was stably expressed. Putative housekeeping genes exhibit significant variation in both mRNA and protein content during retinal development. Our results showed that distinct combinations of internal controls fit each experimental set in the case of qRT-PCR and that MAPK1 is a reliable loading control for Western blot. The results indicate that biased study outcomes may follow the use of reference genes without prior validation for qRT-PCR and Western blot.
Grobei, Monica A.; Qeli, Ermir; Brunner, Erich; Rehrauer, Hubert; Zhang, Runxuan; Roschitzki, Bernd; Basler, Konrad; Ahrens, Christian H.; Grossniklaus, Ueli
2009-01-01
Pollen, the male gametophyte of flowering plants, represents an ideal biological system to study developmental processes, such as cell polarity, tip growth, and morphogenesis. Upon hydration, the metabolically quiescent pollen rapidly switches to an active state, exhibiting extremely fast growth. This rapid switch requires relevant proteins to be stored in the mature pollen, where they have to retain functionality in a desiccated environment. Using a shotgun proteomics approach, we unambiguously identified ∼3500 proteins in Arabidopsis pollen, including 537 proteins that were not identified in genetic or transcriptomic studies. To generate this comprehensive reference data set, which extends the previously reported pollen proteome by a factor of 13, we developed a novel deterministic peptide classification scheme for protein inference. This generally applicable approach considers the gene model–protein sequence–protein accession relationships. It allowed us to classify and eliminate ambiguities inherently associated with any shotgun proteomics data set, to report a conservative list of protein identifications, and to seamlessly integrate data from previous transcriptomics studies. Manual validation of proteins unambiguously identified by a single, information-rich peptide enabled us to significantly reduce the false discovery rate, while keeping valuable identifications of shorter and lower abundant proteins. Bioinformatic analyses revealed a higher stability of pollen proteins compared to those of other tissues and implied a protein family of previously unknown function in vesicle trafficking. Interestingly, the pollen proteome is most similar to that of seeds, indicating physiological similarities between these developmentally distinct tissues. PMID:19546170
The use of experimental structures to model protein dynamics.
Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L
2015-01-01
The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are well-established, simple biophysical methods for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
The Use of Experimental Structures to Model Protein Dynamics
Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.
2014-01-01
Summary The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are well-established, simple biophysical methods for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965
Reconstruction of the experimentally supported human protein interactome: what can we learn?
2013-01-01
Background Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research, therefore its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain insights of biological relevance. Results First, we defined the UniProtKB manually reviewed human “complete” proteome as the reference protein-node set and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly-connected interactors. Conclusions Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human “complete” proteome, and discuss the role of the proteins within the network topology with respect to their function. 
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms. PMID:24088582
Distance-Based Configurational Entropy of Proteins from Molecular Dynamics Simulations
Fogolari, Federico; Corazza, Alessandra; Fortuna, Sara; Soler, Miguel Angel; VanSchouwen, Bryan; Brancolini, Giorgia; Corni, Stefano; Melacini, Giuseppe; Esposito, Gennaro
2015-01-01
Estimation of configurational entropy from molecular dynamics trajectories is a difficult task which is often performed using quasi-harmonic or histogram analysis. An entirely different approach, proposed recently, estimates the local density distribution around each conformational sample by measuring the distance from its nearest neighbors. In this work we show that this theoretically well-grounded method can be easily applied to estimate the entropy from conformational sampling. We consider a set of systems that are representative of important biomolecular processes. In particular: reference entropies for amino acids in unfolded proteins are obtained from a database of residues not participating in secondary structure elements; the conformational entropy of folding of β2-microglobulin is computed from molecular dynamics simulations using reference entropies for the unfolded state; backbone conformational entropy is computed from molecular dynamics simulations of four different states of the EPAC protein and compared with order parameters (often used as a measure of entropy); the conformational and rototranslational entropy of binding is computed from simulations of 20 tripeptides bound to the peptide binding protein OppA and of β2-microglobulin bound to a citrate-coated gold surface. This work shows the potential of the method in the most representative biological processes involving proteins, and provides a valuable alternative, principally in the cases shown, where other approaches are problematic. PMID:26177039
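The nearest-neighbor idea described in this abstract can be illustrated with the classical Kozachenko-Leonenko estimator for one-dimensional samples. This is a deliberately reduced sketch: the study works with distances between protein conformations, whereas the example below assumes plain real-valued samples, uses only the first nearest neighbor, and tests itself on simulated uniform data. The function name `nn_entropy_1d` is hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def nn_entropy_1d(samples):
    """First-nearest-neighbor entropy estimate (in nats) for 1-D samples.

    Assumes all sample values are distinct, so every nearest-neighbor
    distance is strictly positive.
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    log_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted order the nearest neighbor is one of the two adjacent points.
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < n - 1 else math.inf
        log_sum += math.log(min(left, right))
    # ln(2) is the log-volume of the 1-D unit ball (an interval of length 2).
    return log_sum / n + math.log(2.0) + math.log(n - 1) + EULER_GAMMA

# Simulated check: a uniform density on [0, 4] has entropy ln(4) nats.
rng = random.Random(1)
uniform_samples = [rng.uniform(0.0, 4.0) for _ in range(5000)]
estimate = nn_entropy_1d(uniform_samples)
print(f"estimated {estimate:.3f} nats, exact {math.log(4.0):.3f} nats")
```

Extending the sketch to many dimensions requires a distance function between conformations and the volume of the corresponding unit ball, which is where the choice of metric becomes important.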
Distance-Based Configurational Entropy of Proteins from Molecular Dynamics Simulations.
Fogolari, Federico; Corazza, Alessandra; Fortuna, Sara; Soler, Miguel Angel; VanSchouwen, Bryan; Brancolini, Giorgia; Corni, Stefano; Melacini, Giuseppe; Esposito, Gennaro
2015-01-01
Estimation of configurational entropy from molecular dynamics trajectories is a difficult task which is often performed using quasi-harmonic or histogram analysis. An entirely different approach, proposed recently, estimates the local density distribution around each conformational sample by measuring the distance from its nearest neighbors. In this work we show that this theoretically well-grounded method can be easily applied to estimate the entropy from conformational sampling. We consider a set of systems that are representative of important biomolecular processes. In particular: reference entropies for amino acids in unfolded proteins are obtained from a database of residues not participating in secondary structure elements; the conformational entropy of folding of β2-microglobulin is computed from molecular dynamics simulations using reference entropies for the unfolded state; backbone conformational entropy is computed from molecular dynamics simulations of four different states of the EPAC protein and compared with order parameters (often used as a measure of entropy); the conformational and rototranslational entropy of binding is computed from simulations of 20 tripeptides bound to the peptide binding protein OppA and of β2-microglobulin bound to a citrate-coated gold surface. This work shows the potential of the method in the most representative biological processes involving proteins, and provides a valuable alternative, principally in the cases shown, where other approaches are problematic.
Clustering evolving proteins into homologous families.
Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A
2013-04-08
Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
Promoting gene expression in plants by permissive histone lysine methylation
Millar, Tony; Finnegan, E Jean
2009-01-01
Plants utilize sophisticated epigenetic regulatory mechanisms to coordinate changes in gene expression during development and in response to environmental stimuli. Epigenetics refers to modifications of DNA and chromatin-associated proteins that affect gene expression and cell function without changing the DNA sequence. Such modifications are inherited through mitosis, and in rare instances through meiosis, although they can be reversible and thus regulatory. Epigenetic modifications are controlled by groups of proteins, such as the family of histone lysine methyltransferases (HKMTs). The catalytic core known as the SET domain encodes HKMT activity and either promotes or represses gene expression. A large family of SET domain proteins is present in Arabidopsis, where there is growing evidence that two classes of these genes are involved in promoting gene expression in a diverse range of developmental processes. This review will focus on the function of these two classes and the processes that they control, highlighting the huge potential this regulatory mechanism has in plants. PMID:19816124
Interactome of the hepatitis C virus: Literature mining with ANDSystem.
Saik, Olga V; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2016-06-15
A study of the molecular genetic mechanisms of host-pathogen interactions is of paramount importance in developing drugs against viral diseases. Currently, the literature contains a huge amount of information that describes interactions between HCV and human proteins. In addition, there are many factual databases that contain experimentally verified data on HCV-host interactions. The sources of such data are the original data along with the data manually extracted from the literature. However, the manual analysis of scientific publications is time consuming and, because of this, databases created with such an approach often do not have complete information. One of the most promising methods to keep such information up to date and complete is text mining. Here, using ANDSystem, a method previously developed by the authors, an automated extraction of information on the interactions between HCV and human proteins was conducted. As a data source for the text mining approach, PubMed abstracts and full text articles were used. Additionally, external factual databases were analyzed. On the basis of this analysis, a special version of ANDSystem, extended with the HCV interactome, was created. The HCV interactome contains information about the interactions between 969 human and 11 HCV proteins. Among the 969 proteins, 153 'new' proteins were found that had not previously been referred to in any external databases of protein-protein interactions for HCV-host interactions. Thus, the extended ANDSystem provides more comprehensive detail on HCV-host interactions than other existing databases. Interestingly, HCV proteins preferentially interact with human proteins that are already involved in a large number of protein-protein interactions, as well as with those associated with many diseases. Among the human proteins of the HCV interactome, there were a large number of proteins regulated by microRNAs.
It turned out that the results obtained for protein-protein interactions and microRNA-regulation did not depend on how well the proteins were studied, while protein-disease interactions appeared to be dependent on the level of study. In particular, the mean number of diseases linked to well-studied proteins (proteins were considered well-studied if they were mentioned in 50 or more PubMed publications) from the HCV interactome was 20.8, significantly exceeding the mean number of associations with diseases (10.1) for the total set of well-studied human proteins present in ANDSystem. For poorly-studied proteins from the HCV interactome (each protein was referred to in fewer than 50 publications), the distribution of the number of associated diseases had no statistically significant differences from the distribution of the number of diseases associated with poorly-studied proteins based on the total set of human proteins stored in ANDSystem. Here, the average numbers of associations with diseases for the HCV interactome and the total set of human proteins were 0.3 and 0.2, respectively. Thus, ANDSystem, extended with the HCV interactome, can be helpful in a wide range of issues related to analyzing HCV-host interactions in the search for anti-HCV drug targets. The demo version of the extended ANDSystem covered here, containing only interactions between human proteins, genes, metabolites, diseases, miRNAs and molecular-genetic pathways, as well as interactions between human proteins/genes and HCV proteins, is freely available at the following web address: http://www-bionet.sscc.ru/psd/andhcv/. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
ORCAN-a web-based meta-server for real-time detection and functional annotation of orthologs.
Zielezinski, Andrzej; Dziubek, Michal; Sliski, Jan; Karlowski, Wojciech M
2017-04-15
ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation utilizes five additional comparisons between the query and identified homologs, including: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases the sensitivity and precision by 1-2 percentage points. The service is available for free at http://www.combio.pl/orcan/. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
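The plurality-based rating mentioned in the abstract above can be illustrated with a minimal sketch. This is not ORCAN's actual scoring scheme, which the abstract does not specify; it only assumes that each prediction source contributes at most one vote per candidate protein. The function name `plurality_rank` and the source and protein identifiers are hypothetical.

```python
from collections import Counter


def plurality_rank(predictions):
    """Rank candidate orthologs by the number of sources that report them.

    `predictions` maps a source name (tool or database) to the list of
    candidate proteins it reports for one query. This is an illustrative
    stand-in, not ORCAN's published rating system.
    """
    votes = Counter()
    for candidates in predictions.values():
        # A source votes at most once per protein, even if it lists it twice.
        for protein in set(candidates):
            votes[protein] += 1
    # Most votes first; ties are broken alphabetically for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical output of four sources for a single query protein.
predictions = {
    "tool_a": ["P1", "P2"],
    "tool_b": ["P1"],
    "db_c": ["P1", "P3"],
    "db_d": ["P2"],
}
ranking = plurality_rank(predictions)
```

In this toy input the protein reported by three of the four sources ranks first, which is the intuition behind combining several orthology-prediction resources.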
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haab, Brian B.; Geierstanger, Bernhard H.; Michailidis, George
2005-08-01
Four different immunoassay and antibody microarray methods performed at four different sites were used to measure the levels of a broad range of proteins (N = 323 assays; 39, 88, 168, and 28 assays at the respective sites; 237 unique analytes) in the human serum and plasma reference specimens distributed by the Plasma Proteome Project (PPP) of the HUPO. The methods provided a means to (1) assess the level of systematic variation in protein abundances associated with blood preparation methods (serum, citrate-anticoagulated-plasma, EDTA-anticoagulated-plasma, or heparin-anticoagulated-plasma) and (2) evaluate the dependence on concentration of MS-based protein identifications from data sets using the HUPO specimens. Some proteins, particularly cytokines, had highly variable concentrations between the different sample preparations, suggesting specific effects of certain anticoagulants on the stability or availability of these proteins. The linkage of antibody-based measurements from 66 different analytes with the combined MS/MS data from 18 different laboratories showed that protein detection and the quality of MS data increased with analyte concentration. The conclusions from these initial analyses are that the optimal blood preparation method is variable between analytes and that the discovery of blood proteins by MS can be extended to concentrations below the ng/mL range under certain circumstances. Continued developments in antibody-based methods will further advance the scientific goals of the PPP.
Wimmer, Helge; Gundacker, Nina C; Griss, Johannes; Haudek, Verena J; Stättner, Stefan; Mohr, Thomas; Zwickl, Hannes; Paulitschke, Verena; Baron, David M; Trittner, Wolfgang; Kubicek, Markus; Bayer, Editha; Slany, Astrid; Gerner, Christopher
2009-06-01
Interpretation of proteome data with a focus on biomarker discovery largely relies on comparative proteome analyses. Here, we introduce a database-assisted interpretation strategy based on proteome profiles of primary cells. Both 2-D-PAGE and shotgun proteomics are applied. We obtain high data concordance with these two different techniques. When applying mass analysis of tryptic spot digests from 2-D gels of cytoplasmic fractions, we typically identify several hundred proteins. Using the same protein fractions, we usually identify more than a thousand proteins by shotgun proteomics. The data consistency obtained when comparing these independent data sets exceeds 99% of the proteins identified in the 2-D gels. Many characteristic differences in protein expression of different cells can thus be independently confirmed. Our self-designed SQL database (CPL/MUW - database of the Clinical Proteomics Laboratories at the Medical University of Vienna accessible via www.meduniwien.ac.at/proteomics/database) facilitates (i) quality management of protein identification data, which are based on MS, (ii) the detection of cell type-specific proteins and (iii) the detection of molecular signatures of specific functional cell states. Here, we demonstrate how the interpretation of proteome profiles obtained from human liver tissue and hepatocellular carcinoma tissue is assisted by the Clinical Proteomics Laboratories at the Medical University of Vienna-database. We therefore suggest that the use of reference experiments supported by a tailored database may substantially facilitate data interpretation of proteome profiling experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willard, K.E.; Thorsrud, A.K.; Munthe, E.
Human leukocyte proteins from more than 150 patients with rheumatoid arthritis, together with age- and sex-matched controls, were analyzed by use of the ISO-DALT technique of two-dimensional polyacrylamide gel electrophoresis. Patients with ankylosing spondylitis, polymyalgia rheumatica, psoriatic arthritis, calcium tendinitis, post-infectious arthritis, and asymmetrical seronegative arthritis were also included as positive controls. Synthesis of several proteins, referred to by number as members of the Rheuma set, is shown to increase in the leukocyte preparations from patients with classical rheumatoid arthritis. Several of these proteins are specific to monocytes or granulocytes; others are of unknown cellular origin, but appear to be unique to rheumatoid arthritis. The Rheuma proteins appear to be indicators of disease activity, because their increased synthesis can be correlated with sedimentation rate and other clinical indices of rheumatoid disease activity.
Umar, Arzu; Kang, Hyuk; Timmermans, Annemieke M; Look, Maxime P; Meijer-van Gelder, Marion E; den Bakker, Michael A; Jaitly, Navdeep; Martens, John W M; Luider, Theo M; Foekens, John A; Pasa-Tolić, Ljiljana
2009-06-01
Tamoxifen resistance is a major cause of death in patients with recurrent breast cancer. Current clinical factors can correctly predict therapy response in only half of the treated patients. Identification of proteins that are associated with tamoxifen resistance is a first step toward better response prediction and tailored treatment of patients. In the present study we intended to identify putative protein biomarkers indicative of tamoxifen therapy resistance in breast cancer using nano-LC coupled with FTICR MS. Comparative proteome analysis was performed on approximately 5,500 pooled tumor cells (corresponding to approximately 550 ng of protein lysate/analysis) obtained through laser capture microdissection (LCM) from two independently processed data sets (n = 24 and n = 27) containing both tamoxifen therapy-sensitive and therapy-resistant tumors. Peptides and proteins were identified by matching mass and elution time of newly acquired LC-MS features to information in previously generated accurate mass and time tag reference databases. A total of 17,263 unique peptides were identified that corresponded to 2,556 non-redundant proteins identified with ≥ 2 peptides. 1,713 overlapping proteins between the two data sets were used for further analysis. Comparative proteome analysis revealed 100 putatively differentially abundant proteins between tamoxifen-sensitive and tamoxifen-resistant tumors. The presence and relative abundance for 47 differentially abundant proteins were verified by targeted nano-LC-MS/MS in a selection of unpooled, non-microdissected discovery set tumor tissue extracts. ENPP1, EIF3E, and GNB4 were significantly associated with progression-free survival upon tamoxifen treatment for recurrent disease. Differential abundance of our top discriminating protein, extracellular matrix metalloproteinase inducer, was validated by tissue microarray in an independent patient cohort (n = 156).
Extracellular matrix metalloproteinase inducer levels were higher in therapy-resistant tumors and significantly associated with an earlier tumor progression following first line tamoxifen treatment (hazard ratio, 1.87; 95% confidence interval, 1.25-2.80; p = 0.002). In summary, comparative proteomics performed on laser capture microdissection-derived breast tumor cells using nano-LC-FTICR MS technology revealed a set of putative biomarkers associated with tamoxifen therapy resistance in recurrent breast cancer.
Using RNA-seq data to select reference genes for normalizing gene expression in apple roots
Zhou, Zhe; Cong, Peihua; Tian, Yi; Zhu, Yanmin
2017-01-01
Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, and a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization. PMID:28934340
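The pre-selection step described above (picking candidates with low expression variance in an RNA-seq data set) can be sketched as follows. The abstract does not state which variance statistic was used, so this sketch swaps in the coefficient of variation (standard deviation divided by mean) as a plainly labeled assumption; it does not reproduce Delta Ct, geNorm, NormFinder or BestKeeper. The gene names and expression values are invented for illustration.

```python
import statistics


def rank_by_cv(expression):
    """Return gene names ordered from most to least stable.

    `expression` maps a gene name to its normalized expression values
    across samples. Stability is approximated by the coefficient of
    variation (assumption: not necessarily the statistic the authors used).
    """
    cv = {}
    for gene, values in expression.items():
        mean = statistics.mean(values)
        if mean <= 0:
            # Skip unexpressed genes; the CV is undefined for them.
            continue
        cv[gene] = statistics.stdev(values) / mean
    return sorted(cv, key=cv.get)


# Hypothetical expression values for three genes in four root samples.
expression = {
    "gene_a": [100, 102, 98, 101],
    "gene_b": [50, 150, 20, 90],
    "gene_c": [10, 11, 9, 10],
}
stable_first = rank_by_cv(expression)
```

Candidates at the top of such a ranking would still need validation by RT-qPCR under the conditions of interest, as the abstract stresses.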
Fang, Yi-Kai; Chien, Kun-Yi; Huang, Kuo-Yang; Cheng, Wei-Hung; Ku, Fu-Mann; Lin, Rose; Chen, Ting-Wen; Huang, Po-Jung; Chiu, Cheng-Hsun; Tang, Petrus
2016-11-01
Pentatrichomonas hominis is an anaerobic flagellated protist that colonizes the large intestine of a number of mammals, including cats, dogs, nonhuman primates, and humans. The wide host range of this organism is alarming and suggests a rising zoonotic emergency. However, knowledge on in-depth biology of this protist is still limited. Similar to the human pathogen, Trichomonas vaginalis, P. hominis possesses hydrogenosomes instead of mitochondria. Studies in T. vaginalis indicated that hydrogenosome is essential for cell survival and associated with numerous pivotal biological functions, including drug resistance. To further decipher the biology of this important organelle, we undertook proteomic research in P. hominis hydrogenosomes. Lacking a decoded P. hominis genome, we utilized an RNA sequencing (RNA-seq) data set generated from P. hominis axenic culture as the reference for proteome analysis. Using this in-house reference data set and mass spectrometry (MS), we identified 442 putative hydrogenosomal proteins. Interestingly, the composition of the P. hominis hydrogenosomal proteins is very similar to that of T. vaginalis, but proteins such as Hmp36, Pam16, Pam18, and Isd11 are absent based on both MS and the RNA-seq. Our data underscore that P. hominis expresses different homologs of multiple gene families from T. vaginalis. To the best of our knowledge, we present here the first hydrogenosome proteome in a protist other than T. vaginalis that offers crucial new scholarship for global health, therapeutics, diagnostics, and veterinary medicine research. In addition, the research strategy used here using RNA sequencing and proteomics might inform future multi-omics research in other understudied organisms without decoded genomes.
Dietary protein for athletes: from requirements to metabolic advantage.
Phillips, Stuart M
2006-12-01
The Dietary Reference Intakes (DRI) specify that the requirement for dietary protein for all individuals aged 19 y and older is 0.8 g protein·kg⁻¹·d⁻¹. This Recommended Dietary Allowance (RDA) is cited as adequate for all persons. This amount of protein would be considered by many athletes as the amount to be consumed in a single meal, particularly for strength-training athletes. There do exist, however, published data to suggest that individuals habitually performing resistance and (or) endurance exercise require more protein than their sedentary counterparts. The RDA values for protein are clearly set at "...the level of protein judged to be adequate... to meet the known nutrient needs for practically all healthy people...". The RDA covers protein losses with margins for inter-individual variability and protein quality; the notion of consumption of excess protein above these levels to cover increased needs owing to physical activity is not, however, given any credence. Notwithstanding, diet programs (i.e., energy restriction) espousing the virtue of high protein enjoy continued popularity. A number of well-controlled studies are now published in which "higher" protein diets have been shown to be effective in promoting weight reduction, particularly fat loss. The term "higher" refers to a diet that has people consuming more than the general population's average intake of approximately 15% of energy from protein, e.g., as much as 30%-35%, which is within an Acceptable Macronutrient Distribution Range (AMDR) as laid out in the DRIs. Of relevance to athletes and those in clinical practice is the fact that higher protein diets have quite consistently been shown to result in greater weight loss, greater fat loss, and preservation of lean mass as compared with "lower" protein diets. A framework for understanding dietary protein intake within the context of weight loss and athletic performance is laid out.
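The figures quoted above can be turned into a short worked calculation. The RDA value (0.8 g per kg body mass per day) and the 15% and 30% energy shares come from the abstract; the 70 kg body mass, the 2,500 kcal daily intake and the conversion factor of 4 kcal per gram of protein are assumptions introduced only for illustration.

```python
RDA_G_PER_KG_PER_DAY = 0.8   # from the abstract
KCAL_PER_G_PROTEIN = 4.0     # assumed general energy conversion factor


def rda_protein_g(body_mass_kg):
    """Daily protein (g) implied by the RDA for a given body mass."""
    return RDA_G_PER_KG_PER_DAY * body_mass_kg


def protein_g_from_energy_share(total_kcal, share):
    """Daily protein (g) when `share` of total energy comes from protein."""
    return total_kcal * share / KCAL_PER_G_PROTEIN


# Hypothetical person: 70 kg, eating 2,500 kcal per day.
rda = rda_protein_g(70)                                 # about 56 g
average = protein_g_from_energy_share(2500, 0.15)       # about 94 g
higher = protein_g_from_energy_share(2500, 0.30)        # about 188 g
```

Under these assumptions, even the average intake of roughly 15% of energy already exceeds the RDA in absolute terms, and a "higher" protein diet at 30% of energy is roughly three times the RDA.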
Petyuk, Vladislav A.; Qian, Wei-Jun; Hinault, Charlotte; Gritsenko, Marina A.; Singhal, Mudita; Monroe, Matthew E.; Camp, David G.; Kulkarni, Rohit N.; Smith, Richard D.
2009-01-01
The pancreatic islets of Langerhans, and especially the insulin-producing beta cells, play a central role in the maintenance of glucose homeostasis. Alterations in the expression of multiple proteins in the islets that contribute to the maintenance of islet function are likely to underlie the pathogenesis of type 2 diabetes. To identify proteins that constitute the islet proteome, we provide the first comprehensive proteomic characterization of pancreatic islets for mouse, the most commonly used animal model in diabetes research. Using strong cation exchange fractionation coupled with reversed phase LC-MS/MS we report the confident identification of 17,350 different tryptic peptides covering 2,612 proteins having at least two unique peptides per protein. The dataset also identified ~60 post-translationally modified peptides including oxidative modifications and phosphorylation. While many of the identified phosphorylation sites corroborate those previously known, the oxidative modifications observed on cysteinyl residues reveal potentially novel information suggesting a role for oxidative stress in islet function. Comparative analysis with 15 available proteomic datasets from other mouse tissues and cells revealed a set of 133 proteins predominantly expressed in pancreatic islets. This unique set of proteins, in addition to those with known functions such as peptide hormones secreted from the islets, contains several proteins with as yet unknown functions. The mouse islet protein and peptide database accessible at http://ncrr.pnl.gov, provides an important reference resource for the research community to facilitate research in the diabetes and metabolism fields. PMID:18570455
Gamir, Jordi; Darwiche, Rabih; Van't Hof, Pieter; Choudhary, Vineet; Stumpe, Michael; Schneiter, Roger; Mauch, Felix
2017-02-01
Pathogenesis-related proteins played a pioneering role 50 years ago in the discovery of plant innate immunity as a set of proteins that accumulated upon pathogen challenge. The most abundant of these proteins, PATHOGENESIS-RELATED 1 (PR-1) encodes a small antimicrobial protein that has become, as a marker of plant immune signaling, one of the most referred to plant proteins. The biochemical activity and mode of action of PR-1 proteins have remained elusive, however. Here, we provide genetic and biochemical evidence for the capacity of PR-1 proteins to bind sterols, and demonstrate that the inhibitory effect on pathogen growth is caused by the sequestration of sterol from pathogens. In support of our findings, sterol-auxotroph pathogens such as the oomycete Phytophthora are particularly sensitive to PR-1, whereas sterol-prototroph fungal pathogens become highly sensitive only when sterol biosynthesis is compromised. Our results are in line with previous findings showing that plants with enhanced PR-1 expression are particularly well protected against oomycete pathogens. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Lung Reference Set A Application: Dawn Coverley- University of York (2011) — EDRN Public Portal
A variant of the nuclear matrix factor Ciz1 is prevalent in lung cancer cell lines and tumours, but not in adjacent lung tissue, giving rise to a protein that is stable enough to be detected in just one ul of plasma. This project evaluates the potential of variant Ciz1 as an early detection tool for lung cancer, using variant-selective antibodies.
Wang, Zihao; Lorin, Clarisse; Koutsoukos, Marguerite; Franco, David; Bayat, Babak; Zhang, Ying; Carfi, Andrea; Barnett, Susan W.; Porter, Frederick
2016-01-01
Two HIV-1 subtype C gp120 protein candidates were the selected antigens for several experimental vaccine regimens now under evaluation in HVTN 100 Phase I/II clinical trial aiming to support the start of the HVTN 702 Phase IIb/III trial in southern Africa, which is designed to confirm and extend the partial protection seen against HIV-1 infection in the RV144 Thai trial. Here, we report the comprehensive physicochemical characterization of the gp120 reference materials that are representative of the clinical trial materials. Gp120 proteins were stably expressed in Chinese Hamster Ovary (CHO) cells and subsequently purified and formulated. A panel of analytical techniques was used to characterize the physicochemical properties of the two protein molecules. When formulated in the AS01 Adjuvant System, the bivalent subtype C gp120 antigens elicited 1086.C- and TV1.C-specific binding antibody and CD4+ T cell responses in mice. All the characteristics were highly representative of the Clinical Trial Materials (CTM). Data from this report demonstrate the immunogenicity of the gp120 antigens, provide comprehensive characterization of the molecules, set the benchmark for assessment of current and future CTM lots, and lay the physicochemical groundwork for interpretation of future clinical trial data. PMID:27187483
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...
2015-10-09
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
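The k-NN classification idea described above can be sketched with a much simpler distance than max-CMO. Computing max-CMO requires optimizing over alignments between two contact maps, which is not attempted here; as a clearly labeled stand-in, the sketch uses the Jaccard distance between contact sets under a fixed residue numbering, which is also a true metric. All structures, contacts and fold labels are invented.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two sets of residue-residue contacts.

    Stand-in for max-CMO (assumption): it compares contacts under a fixed
    residue numbering instead of searching for the best alignment.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_classify(query, labelled, k=3):
    """Assign the majority label among the k nearest labelled structures."""
    nearest = sorted(
        labelled, key=lambda item: jaccard_distance(query, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical contact maps: each contact is a pair of residue indices.
labelled = [
    ({(1, 5), (2, 6), (3, 7)}, "fold_A"),
    ({(1, 5), (2, 6), (4, 8)}, "fold_A"),
    ({(1, 9), (2, 10), (3, 11)}, "fold_B"),
    ({(1, 9), (2, 10), (5, 12)}, "fold_B"),
]
query = {(1, 5), (2, 6), (3, 7), (4, 8)}
predicted = knn_classify(query, labelled, k=3)
```

Because the distance satisfies the triangle inequality, a real implementation could use known distances to a few reference structures to bound, and often skip, comparisons against the rest of the database; this sketch compares against every entry for simplicity.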
Experimental Protein Structure Verification by Scoring with a Single, Unassigned NMR Spectrum.
Courtney, Joseph M; Ye, Qing; Nesbitt, Anna E; Tang, Ming; Tuttle, Marcus D; Watt, Eric D; Nuzzio, Kristin M; Sperling, Lindsay J; Comellas, Gemma; Peterson, Joseph R; Morrissey, James H; Rienstra, Chad M
2015-10-06
Standard methods for de novo protein structure determination by nuclear magnetic resonance (NMR) require time-consuming data collection and interpretation efforts. Here we present a qualitatively distinct and novel approach, called Comparative, Objective Measurement of Protein Architectures by Scoring Shifts (COMPASS), which identifies the best structures from a set of structural models by numerical comparison with a single, unassigned 2D ¹³C-¹³C NMR spectrum containing backbone and side-chain aliphatic signals. COMPASS does not require resonance assignments. It is particularly well suited for interpretation of magic-angle spinning solid-state NMR spectra, but also applicable to solution NMR spectra. We demonstrate COMPASS with experimental data from four proteins (GB1, ubiquitin, DsbA, and the extracellular domain of human tissue factor) and with reconstructed spectra from 11 additional proteins. For all these proteins, with molecular mass up to 25 kDa, COMPASS distinguished the correct fold, most often within 1.5 Å root-mean-square deviation of the reference structure. Copyright © 2015 Elsevier Ltd. All rights reserved.
Calus, M P L; de Haas, Y; Veerkamp, R F
2013-10-01
Genomic selection holds the promise to be particularly beneficial for traits that are difficult or expensive to measure, such that access to phenotypes on large daughter groups of bulls is limited. Instead, cow reference populations can be generated, potentially supplemented with existing information from the same or (highly) correlated traits available on bull reference populations. The objective of this study, therefore, was to develop a model to perform genomic predictions and genome-wide association studies based on a combined cow and bull reference data set, with the accuracy of the phenotypes differing between the cow and bull genomic selection reference populations. The developed bivariate Bayesian stochastic search variable selection model allowed for an unbalanced design by imputing residuals in the residual updating scheme for all missing records. The performance of this model is demonstrated on a real data example, where the analyzed trait, being milk fat or protein yield, was either measured only on a cow or a bull reference population, or recorded on both. Our results were that the developed bivariate Bayesian stochastic search variable selection model was able to analyze 2 traits, even though animals had measurements on only 1 of 2 traits. The Bayesian stochastic search variable selection model yielded consistently higher accuracy for fat yield compared with a model without variable selection, both for the univariate and bivariate analyses, whereas the accuracy of both models was very similar for protein yield. The bivariate model identified several additional quantitative trait loci peaks compared with the single-trait models on either trait. In addition, the bivariate models showed a marginal increase in accuracy of genomic predictions for the cow traits (0.01-0.05), although a greater increase in accuracy is expected as the size of the bull population increases. 
Our results emphasize that the chosen values of priors in Bayesian genomic prediction models are especially important in small data sets. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Bladder Cancer-associated Protein, a Potential Prognostic Biomarker in Human Bladder Cancer
Moreira, José M. A.; Ohlsson, Gita; Gromov, Pavel; Simon, Ronald; Sauter, Guido; Celis, Julio E.; Gromova, Irina
2010-01-01
It is becoming increasingly clear that no single marker will have the sensitivity and specificity necessary to be used on its own for diagnosis/prognosis of tumors. Interpatient and intratumor heterogeneity provides overwhelming odds against the existence of such an ideal marker. With this in mind, our laboratory has been applying a long term systematic approach to identify multiple biomarkers that can be used for clinical purposes. As a result of these studies, we have identified and reported several candidate biomarker proteins that are deregulated in bladder cancer. Following the conceptual biomarker development phases proposed by the Early Detection Research Network, we have taken some of the most promising candidate proteins into postdiscovery validation studies, and here we report on the characterization of one such biomarker, the bladder cancer-associated protein (BLCAP), formerly termed Bc10. To characterize BLCAP protein expression and cellular localization patterns in benign bladder urothelium and urothelial carcinomas (UCs), we used two independent sets of samples from different patient cohorts: a reference set consisting of 120 bladder specimens (formalin-fixed as well as frozen biopsies) and a validation set consisting of 2,108 retrospectively collected UCs with long term clinical follow-up. We could categorize the UCs examined into four groups based on levels of expression and subcellular localization of BLCAP protein and showed that loss of BLCAP expression is associated with tumor progression. The results indicated that increased expression of this protein confers an adverse patient outcome, suggesting that categorization of staining patterns for this protein may have prognostic value. 
Finally, we applied a combinatorial two-marker discriminator using BLCAP and adipocyte-type fatty acid-binding protein, another UC biomarker previously reported by us, and found that the combination of the two markers correlated more closely with grade and/or stage of disease than the individual markers. The implications of these results in biomarker discovery are discussed. PMID:19783793
Pritchard, Caroline; O'Connor, Gavin; Ashcroft, Alison E
2013-08-06
To achieve comparability of measurement results of protein amount of substance content between clinical laboratories, suitable reference materials are required. The impact on measurement comparability of potential differences in the tertiary and quaternary structure of protein reference standards is as yet not well understood. With the use of human growth hormone as a model protein, the potential of ion mobility spectrometry-mass spectrometry as a tool to assess differences in the structure of protein reference materials and their interactions with antibodies has been investigated here.
Ensembl comparative genomics resources.
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.
Phommasone, Koukeo; Althaus, Thomas; Souvanthong, Phonesavanh; Phakhounthong, Khansoudaphone; Soyvienvong, Laxoy; Malapheth, Phatthaphone; Mayxay, Mayfong; Pavlicek, Rebecca L; Paris, Daniel H; Dance, David; Newton, Paul; Lubell, Yoel
2016-02-04
C-Reactive Protein (CRP) has been shown to be an accurate biomarker for discriminating bacterial from viral infections in febrile patients in Southeast Asia. Here we investigate the accuracy of existing rapid qualitative and semi-quantitative tests as compared with a quantitative reference test to assess their potential for use in remote tropical settings. Blood samples were obtained from consecutive patients recruited to a prospective fever study at three sites in rural Laos. At each site, one of three rapid qualitative or semi-quantitative tests was performed, alongside the quantitative NycoCard Reader II as a reference test. We estimate the sensitivity and specificity of the three tests against a threshold of 10 mg/L and kappa values for the agreement of the two semi-quantitative tests with the results of the reference test. All three tests showed high sensitivity, specificity and kappa values as compared with the NycoCard Reader II. With a threshold of 10 mg/L the sensitivity of the tests ranged from 87-98% and the specificity from 91-98%. The weighted kappa values for the semi-quantitative tests were 0.7 and 0.8. The use of CRP rapid tests could offer an inexpensive and effective approach to improve the targeting of antibiotics in remote settings where health facilities are basic and laboratories are absent. This study demonstrates that accurate CRP rapid tests are commercially available; evaluations of their clinical impact and cost-effectiveness at point of care are warranted.
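The accuracy measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, the reference test is assumed to count as positive at or above the 10 mg/L threshold, and the kappa shown uses linear weights, which the abstract does not specify.

```python
def sensitivity_specificity(rapid_positive, reference_mg_per_l, threshold=10.0):
    """Compare binary rapid-test calls against a quantitative reference test.

    Assumption: a reference value >= threshold counts as positive.
    """
    tp = fp = tn = fn = 0
    for rapid, ref in zip(rapid_positive, reference_mg_per_l):
        ref_positive = ref >= threshold
        if rapid and ref_positive:
            tp += 1
        elif rapid and not ref_positive:
            fp += 1
        elif not rapid and ref_positive:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linear-weighted Cohen's kappa for ordinal categories coded 0..n_categories-1."""
    k = n_categories
    n = len(ratings_a)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximum distance
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row[i] * col[j] / n
    return 1.0 - disagreement_observed / disagreement_expected
```

Calling `sensitivity_specificity` with a list of rapid-test calls and the paired reference concentrations returns the two proportions; `linear_weighted_kappa` expects the semi-quantitative bands of both tests to be coded as consecutive integers.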
Ensembl comparative genomics resources
Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847
Yang, Ming; Ge, Yan; Wu, Jiayan; Xiao, Jingfa; Yu, Jun
2011-05-20
Coevolution can be seen as the interdependency between evolutionary histories. In the context of protein evolution, functionally correlated proteins tend to show coordinated evolutionary characters that preserve organismal integrity. In complex systems, there are two forms of protein-protein interaction in vivo: inter-complex interaction and intra-complex interaction. In this paper, we studied the difference in coevolution characters between inter-complex and intra-complex interactions using the "mirror tree" method on the respiratory chain (RC) proteins. We divided the correlation coefficients of all pairs of RC proteins into two groups, corresponding to binary protein-protein interactions within complexes and binary protein-protein interactions between complexes, respectively. A dramatic discrepancy was detected between the coevolution characters of the two sets of protein interactions (Wilcoxon test, p-value = 4.4 × 10^-6). Our finding reveals critical information for coevolutionary studies and assists the mechanistic investigation of protein-protein interactions. Furthermore, the results provide some unique clues to the supramolecular organization of protein complexes in the mitochondrial inner membrane. More detailed binding-site maps and genome information for nuclear-encoded RC proteins will be extraordinarily valuable for further study of mitochondrial dynamics. Copyright © 2011. Published by Elsevier Ltd.
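The core of the mirror tree idea is a correlation between the pairwise evolutionary distance matrices of two protein families across the same set of species. The sketch below shows only that step, under stated assumptions: the function name is hypothetical, both matrices are assumed symmetric with species in the same order, and Pearson correlation over the upper triangle is used. It does not reproduce the authors' alignment, tree-building, or grouping procedure.

```python
from math import sqrt


def mirror_tree_correlation(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two distance matrices.

    dist_a and dist_b are square, symmetric matrices (lists of lists) of
    pairwise distances for two protein families over the same ordered species.
    """
    n = len(dist_a)
    xs = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

The resulting coefficients for intra-complex and inter-complex protein pairs could then be compared with a rank-based test such as the Wilcoxon test mentioned in the abstract.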
Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Kim, Kwang-Youl; Kwon, Kyung-Hoon; Yoo, Jong Shin; Omenn, Gilbert S; Baker, Mark S; Hancock, William S; Paik, Young-Ki
2015-12-04
Approximately 2.9 billion long base-pair human reference genome sequences are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high limits and error rates for the detection of complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we attempted to reanalyze the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For the quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of <1% at the peptide and protein levels. The quality-controlled results were then cross-checked with the neXtProt DB (2014-09-19 release). 
The two spectral libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary to spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least 2 peptides of 7 or more amino acids in length or one of 9 or more amino acids in length with one or more unique sequences. Thus, the iRefSPL and simSPL combination can be used to help identify peptides that have not been detected by conventional sequence database searches with improved sensitivity and a low error rate.
MitoMiner: a data warehouse for mitochondrial proteomics data
Smith, Anthony C.; Blackshaw, James A.; Robinson, Alan J.
2012-01-01
MitoMiner (http://mitominer.mrc-mbu.cam.ac.uk/) is a data warehouse for the storage and analysis of mitochondrial proteomics data gathered from publications of mass spectrometry and green fluorescent protein tagging studies. In MitoMiner, these data are integrated with data from UniProt, Gene Ontology, Online Mendelian Inheritance in Man, HomoloGene, Kyoto Encyclopaedia of Genes and Genomes and PubMed. The latest release of MitoMiner stores proteomics data sets from 46 studies covering 11 different species from eumetazoa, viridiplantae, fungi and protista. MitoMiner is implemented by using the open source InterMine data warehouse system, which provides a user interface allowing users to upload data for analysis, personal accounts to store queries and results and enables queries of any data in the data model. MitoMiner also provides lists of proteins for use in analyses, including the new MitoMiner mitochondrial proteome reference sets that specify proteins with substantial experimental evidence for mitochondrial localization. As further mitochondrial proteomics data sets from normal and diseased tissue are published, MitoMiner can be used to characterize the variability of the mitochondrial proteome between tissues and investigate how changes in the proteome may contribute to mitochondrial dysfunction and mitochondrial-associated diseases such as cancer, neurodegenerative diseases, obesity, diabetes, heart failure and the ageing process. PMID:22121219
Structural Isosteres of Phosphate Groups in the Protein Data Bank.
Zhang, Yuezhou; Borrel, Alexandre; Ghemtio, Leo; Regad, Leslie; Boije Af Gennäs, Gustav; Camproux, Anne-Claude; Yli-Kauhaluoma, Jari; Xhaard, Henri
2017-03-27
We developed a computational workflow to mine the Protein Data Bank for isosteric replacements that exist in different binding site environments but have not necessarily been identified and exploited in compound design. Taking phosphate groups as examples, the workflow was used to construct 157 data sets, each composed of a reference protein complexed with AMP, ADP, ATP, or pyrophosphate as well other ligands. Phosphate binding sites appear to have a high hydration content and large size, resulting in U-shaped bioactive conformations recurrently found across unrelated protein families. A total of 16 413 replacements were extracted, filtered for a significant structural overlap on phosphate groups, and sorted according to their SMILES codes. In addition to the classical isosteres of phosphate, such as carboxylate, sulfone, or sulfonamide, unexpected replacements that do not conserve charge or polarity, such as aryl, aliphatic, or positively charged groups, were found.
Automated selected reaction monitoring software for accurate label-free protein quantification.
Teleman, Johan; Karlsson, Christofer; Waldemarson, Sofia; Hansson, Karin; James, Peter; Malmström, Johan; Levander, Fredrik
2012-07-06
Selected reaction monitoring (SRM) is a mass spectrometry method with documented ability to quantify proteins accurately and reproducibly using labeled reference peptides. However, the use of labeled reference peptides becomes impractical if large numbers of peptides are targeted and when high flexibility is desired when selecting peptides. We have developed a label-free quantitative SRM workflow that relies on a new automated algorithm, Anubis, for accurate peak detection. Anubis efficiently removes interfering signals from contaminating peptides to estimate the true signal of the targeted peptides. We evaluated the algorithm on a published multisite data set and achieved results in line with manual data analysis. In complex peptide mixtures from whole proteome digests of Streptococcus pyogenes we achieved a technical variability across the entire proteome abundance range of 6.5-19.2%, which was considerably below the total variation across biological samples. Our results show that the label-free SRM workflow with automated data analysis is feasible for large-scale biological studies, opening up new possibilities for quantitative proteomics and systems biology.
NASA Astrophysics Data System (ADS)
Sicard, François; Senet, Patrick
2013-06-01
Well-Tempered Metadynamics (WTmetaD) is an efficient method to enhance the reconstruction of the free-energy surface of proteins. WTmetaD guarantees a faster convergence in the long time limit in comparison with the standard metadynamics. It still suffers, however, from the same limitation, i.e., the non-trivial choice of pertinent collective variables (CVs). To circumvent this problem, we couple WTmetaD with a set of CVs generated from a dihedral Principal Component Analysis (dPCA) on the Ramachandran dihedral angles describing the backbone structure of the protein. The dPCA provides a generic method to extract relevant CVs built from internal coordinates, and does not depend on the alignment to an arbitrarily chosen reference structure as usual in Cartesian PCA. We illustrate the robustness of this method in the case of a reference model protein, the small and very diffusive Met-enkephalin pentapeptide. We propose a justification a posteriori of the considered number of CVs necessary to bias the metadynamics simulation in terms of the one-dimensional free-energy profiles associated with Ramachandran dihedral angles along the amino-acid sequence.
Sicard, François; Senet, Patrick
2013-06-21
Well-Tempered Metadynamics (WTmetaD) is an efficient method to enhance the reconstruction of the free-energy surface of proteins. WTmetaD guarantees a faster convergence in the long time limit in comparison with the standard metadynamics. It still suffers, however, from the same limitation, i.e., the non-trivial choice of pertinent collective variables (CVs). To circumvent this problem, we couple WTmetaD with a set of CVs generated from a dihedral Principal Component Analysis (dPCA) on the Ramachandran dihedral angles describing the backbone structure of the protein. The dPCA provides a generic method to extract relevant CVs built from internal coordinates, and does not depend on the alignment to an arbitrarily chosen reference structure as usual in Cartesian PCA. We illustrate the robustness of this method in the case of a reference model protein, the small and very diffusive Met-enkephalin pentapeptide. We propose a justification a posteriori of the considered number of CVs necessary to bias the metadynamics simulation in terms of the one-dimensional free-energy profiles associated with Ramachandran dihedral angles along the amino-acid sequence.
A Circular Dichroism Reference Database for Membrane Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace, B.; Wien, F.; Stone, T.
2006-01-01
Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).
Coram, Tristan E; Pang, Edwin C K
2006-11-01
Using microarray technology and a set of chickpea (Cicer arietinum L.) unigenes, grasspea (Lathyrus sativus L.) expressed sequence tags (ESTs) and lentil (Lens culinaris Med.) resistance gene analogues, the ascochyta blight (Ascochyta rabiei (Pass.) L.) resistance response was studied in four chickpea genotypes, including resistant, moderately resistant, susceptible and wild relative (Cicer echinospermum L.) genotypes. The experimental system minimized environmental effects and was conducted in reference design, in which samples from mock-inoculated controls acted as reference against post-inoculation samples. Robust data quality was achieved through the use of three biological replicates (including a dye swap), the inclusion of negative controls and strict selection criteria for differentially expressed genes, including a fold change cut-off determined by self-self hybridizations, Student's t-test and multiple testing correction (P < 0.05). Microarray observations were also validated by quantitative reverse transcriptase-polymerase chain reaction (RT-PCR). The time course expression patterns of 756 microarray features resulted in the differential expression of 97 genes in at least one genotype at one time point. k-means clustering grouped the genes into clusters of similar observations for each genotype, and comparisons between A. rabiei-resistant and A. rabiei-susceptible genotypes revealed potential gene 'signatures' predictive of effective A. rabiei resistance. These genes included several pathogenesis-related proteins, SNAKIN2 antimicrobial peptide, proline-rich protein, disease resistance response protein DRRG49-C, environmental stress-inducible protein, leucine-zipper protein, polymorphic antigen membrane protein, Ca-binding protein and several unknown proteins. The potential involvement of these genes and their pathways of induction are discussed. 
This study represents the first large-scale gene expression profiling in chickpea, and future work will focus on the functional validation of the genes of interest.
Novel nonlinear knowledge-based mean force potentials based on machine learning.
Dong, Qiwen; Zhou, Shuigeng
2011-01-01
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. 
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
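One of the evaluation metrics mentioned above, the energy Z score, is commonly computed as the native structure's energy expressed in standard deviations from the mean decoy energy. The sketch below assumes that common definition and the population standard deviation; the abstract does not state which convention the authors used, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def energy_z_score(native_energy, decoy_energies):
    """Z score of the native energy relative to the decoy energy distribution.

    Assumption: Z = (E_native - mean(E_decoys)) / std(E_decoys), with the
    population standard deviation. More negative values indicate that the
    potential separates the native structure more clearly from the decoys.
    """
    return (native_energy - mean(decoy_energies)) / pstdev(decoy_energies)


def native_is_lowest(native_energy, decoy_energies):
    """True if the potential ranks the native structure below every decoy."""
    return all(native_energy < e for e in decoy_energies)
```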
Song, Shangxin; Hooiveld, Guido J; Zhang, Wei; Li, Mengjie; Zhao, Fan; Zhu, Jing; Xu, Xinglian; Muller, Michael; Li, Chunbao; Zhou, Guanghong
2016-04-01
It has been reported that isolated dietary soy and meat proteins have distinct effects on physiology and liver gene expression, but the impact on protein expression responses is unknown. Because these may differ from gene expression responses, we investigated dietary protein-induced changes in the liver proteome. Rats were fed for 1 week semisynthetic diets that differed only regarding protein source; casein (reference) was fully replaced by isolated soy, chicken, fish, or pork protein. Changes in the liver proteome were measured by iTRAQ labeling and LC-ESI-MS/MS. A robust set totaling 1437 unique proteins was identified and subjected to differential protein analysis and biological interpretation. Compared with casein, all other protein sources reduced the abundance of proteins involved in fatty acid metabolism and the Pparα signaling pathway. All dietary proteins, except chicken, increased oxidoreductive transformation reactions but reduced energy and essential amino acid metabolic pathways. Only soy protein increased the metabolism of sulfur-containing and nonessential amino acids. Soy and fish proteins increased translation and mRNA processing, whereas only chicken protein increased the TCA cycle but reduced immune responses. These findings were partially in line with previously reported transcriptome results. This study further shows the distinct effects of soy and meat proteins on liver metabolism in rats.
Quantitative phylogenetic assessment of microbial communities in diverse environments.
von Mering, C; Hugenholtz, P; Raes, J; Tringe, S G; Doerks, T; Jensen, L J; Ward, N; Bork, P
2007-02-23
The taxonomic composition of environmental communities is an important indicator of their ecology and function. We used a set of protein-coding marker genes, extracted from large-scale environmental shotgun sequencing data, to provide a more direct, quantitative, and accurate picture of community composition than that provided by traditional ribosomal RNA-based approaches depending on the polymerase chain reaction. Mapping marker genes from four diverse environmental data sets onto a reference species phylogeny shows that certain communities evolve faster than others. The method also enables determination of preferred habitats for entire microbial clades and provides evidence that such habitat preferences are often remarkably stable over time.
Evaluation of peak-picking algorithms for protein mass spectrometry.
Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes
2011-01-01
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
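Of the three approaches compared, the signal-to-noise criterion is the simplest to illustrate. The following is a minimal sketch, not the implementation evaluated in the paper: the function name and default threshold are hypothetical, the baseline is assumed to be the median intensity, and noise is assumed to be the median absolute deviation of the whole spectrum.

```python
from statistics import median


def pick_peaks_snr(intensities, snr_threshold=3.0):
    """Return indices of local maxima whose signal-to-noise ratio meets the threshold.

    Assumptions: baseline = median intensity; noise = median absolute
    deviation (MAD) from that baseline, estimated over the whole spectrum.
    """
    baseline = median(intensities)
    noise = median([abs(x - baseline) for x in intensities])
    if noise == 0:
        raise ValueError("noise estimate is zero; cannot compute a ratio")
    peaks = []
    for i in range(1, len(intensities) - 1):
        is_local_max = (
            intensities[i] > intensities[i - 1]
            and intensities[i] >= intensities[i + 1]
        )
        if is_local_max and (intensities[i] - baseline) / noise >= snr_threshold:
            peaks.append(i)
    return peaks
```

Sensitivity and specificity against a manually defined reference set of peaks, as described in the abstract, would then be computed by matching the returned indices to the reference positions within a chosen tolerance.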
Barua, Pragya; Subba, Pratigya; Lande, Nilesh Vikram; Mangalaparthi, Kiran K; Prasad, T S Keshava; Chakraborty, Subhra; Chakraborty, Niranjan
2016-06-30
Plasma membrane (PM) encompasses total cellular contents, serving as semi-porous barrier to cell exterior. This living barrier regulates all cellular exchanges in a spatio-temporal fashion. Most of the essential tasks of PMs including molecular transport, cell-cell interaction and signal transduction are carried out by their proteinaceous components, which make the PM protein repertoire to be diverse and dynamic. Here, we report the systematic analysis of PM proteome of a food legume, chickpea and develop a PM proteome reference map. Proteins were extracted from highly enriched PM fraction of four-week-old seedlings using aqueous two-phase partitioning. To address a population of PM proteins that is as comprehensive as possible, both gel-based and gel-free approaches were employed, which led to the identification of a set of 2732 non-redundant proteins. These included both integral proteins having bilayer spanning domains as well as peripheral proteins associated with PMs through posttranslational modifications or protein-protein interactions. Further, the proteins were subjected to various in-silico analyses and functionally classified based on their gene ontology. Finally an inventory of the complete set of PM proteins, identified in several monocot and dicot species, was created for comparative study with the generated PM protein dataset of chickpea. Chickpea, a rich source of dietary proteins, is the second most cultivated legume, which is grown over 10 million hectares of land worldwide. The annual global production of chickpea hovers around 8.5 million metric tons. Recent chickpea genome sequencing effort has provided a broad genetic basis for highlighting the important traits that may fortify other crop legumes. Improvement in chickpea varieties can further strengthen the world food security, which includes food availability, access and utilization. 
It is known that the phenotypic traits of a cultivar are the manifestation of the orchestrated functions of its proteins. Study of the PM proteome offers insights into the mechanism of communication between the cell and its environment through the identification of receptors, signalling proteins and membrane transporters. Knowledge of the PM protein repertoire of a relatively dehydration-tolerant chickpea variety, JG-62, can contribute to the development of strategies for metabolic reprogramming of crop species and to breeding applications. Copyright © 2016 Elsevier B.V. All rights reserved.
Validated age-specific reference values for CSF total protein levels in children.
Kahlmann, V; Roodbol, J; van Leeuwen, N; Ramakers, C R B; van Pelt, D; Neuteboom, R F; Catsman-Berrevoets, C E; de Wit, M C Y; Jacobs, B C
2017-07-01
To define age-specific reference values for cerebrospinal fluid (CSF) total protein levels for children and validate these values in children with Guillain-Barré syndrome (GBS), acute disseminated encephalomyelitis (ADEM) and multiple sclerosis (MS). Reference values for CSF total protein levels were determined in an extensive cohort of diagnostic samples from children (<18 year) evaluated at Erasmus Medical Center/Sophia Children's Hospital. These reference values were confirmed in children diagnosed with disorders unrelated to raised CSF total protein level and validated in children with GBS, ADEM and MS. The test results of 6145 diagnostic CSF samples from 3623 children were used to define reference values. The reference values based on the upper limit of the 95% CI (i.e. upper limit of normal) were for 6 months-2 years 0.25 g/L, 2-6 years 0.25 g/L, 6-12 years 0.28 g/L, 12-18 years 0.34 g/L. These reference values were confirmed in a subgroup of 378 children diagnosed with disorders that are not typically associated with increased CSF total protein. In addition, the CSF total protein levels in these children in the first 6 months after birth were highly variable (median 0.47 g/L, IQR 0.26-0.65). According to these new reference values, CSF total protein level was elevated in 85% of children with GBS, 66% with ADEM and 23% with MS. More accurate age-specific reference values for CSF total protein levels in children were determined. These new reference values are more sensitive than currently used values for diagnosing GBS and ADEM in children. Copyright © 2017 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
Gaupels, Frank; Knauer, Torsten; van Bel, Aart J E
2008-01-01
This study investigated advantages and drawbacks of two sieve-tube sap sampling methods for comparison of phloem proteins in powdery mildew-infested vs. non-infested Hordeum vulgare plants. In one approach, sieve tube sap was collected by stylectomy. Aphid stylets were cut and immediately covered with silicon oil to prevent any contamination or modification of exudates. In this way, a maximum of 1 µL pure phloem sap could be obtained per hour. Interestingly, after pathogen infection exudation from microcauterized stylets was reduced to less than 40% of control plants, suggesting that powdery mildew induced sieve tube-occlusion mechanisms. In contrast to the laborious stylectomy, facilitated exudation using EDTA to prevent calcium-mediated callose formation is quick and easy with a large volume yield. After two-dimensional (2D) electrophoresis, a digital overlay of the protein sets extracted from EDTA solutions and stylet exudates showed that some major spots were the same with both sampling techniques. However, EDTA exudates also contained large amounts of contaminative proteins of unknown origin. A combinatory approach may be most favourable for studies in which the protein composition of phloem sap is compared between control and pathogen-infected plants. Facilitated exudation may be applied for subtractive identification of differentially expressed proteins by 2D/mass spectrometry, which requires large amounts of protein. A reference gel loaded with pure phloem sap from stylectomy may be useful for confirmation of phloem origin of candidate spots by digital overlay. The method provides a novel opportunity to study differential expression of phloem proteins in monocotyledonous plant species.
2011-01-01
Background Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
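The three-residue clique described in this abstract corresponds to a triangle in the contact network. The following is a minimal sketch of how such cliques could be enumerated, assuming the network is already available as a list of residue-index pairs; the function name `three_residue_cliques` and the input layout are hypothetical, and the surface-complementarity and overlap criteria that the authors use to define a contact are not reproduced here.

```python
def three_residue_cliques(contacts):
    """Return all triangles (3-cliques) in an undirected contact network.

    `contacts` is an iterable of (i, j) residue-index pairs. How a contact
    is decided is outside this sketch (hypothetical input).
    """
    neighbors = {}
    edges = []
    for a, b in contacts:
        if a == b:
            continue  # ignore self-contacts
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
        edges.append((a, b))

    cliques = set()
    for a, b in edges:
        # Any residue in contact with both ends of an edge closes a triangle.
        for c in neighbors[a] & neighbors[b]:
            cliques.add(tuple(sorted((a, b, c))))
    return sorted(cliques)


# Toy network: residues 1, 2, 3 are mutually in contact; 4 touches only 3.
toy_contacts = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(three_residue_cliques(toy_contacts))  # [(1, 2, 3)]
```

Sorting each triple before storing it means a triangle found from any of its three edges is counted once.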
Uptake of recommended common reference intervals for chemical pathology in Australia.
Jones, Graham Rd; Koetsier, Sabrina
2017-05-01
Background Reference intervals are a vital part of reporting numerical pathology results. It is known, however, that variation in reference intervals between laboratories is common, even when analytical methods support common reference intervals. In response to this, in Australia, the Australasian Association of Clinical Biochemists together with the Royal College of Pathologists of Australasia published in 2014 a set of recommended common reference intervals for 11 common serum analytes (sodium, potassium, chloride, bicarbonate, creatinine male, creatinine female, calcium, calcium adjusted for albumin, phosphate, magnesium, lactate dehydrogenase, alkaline phosphatase and total protein). Methods Uptake of recommended common reference intervals in Australian laboratories was assessed using data from four annual cycles of the RCPAQAP reference intervals external quality assurance programme. Results Over three years, from 2013 to 2016, the use of the recommended upper and lower reference limits has increased from 40% to 83%. Nearly half of the intervals in use by enrolled laboratories in 2016 have been changed in this time period, indicating an active response to the guidelines. Conclusions These data support the activities of the Australasian Association of Clinical Biochemists and Royal College of Pathologists of Australasia in demonstrating a change in laboratory behaviour to reduce unnecessary variation in reference intervals and thus provide a consistent message to doctors and patients irrespective of the laboratory used.
Prediction of allosteric sites and mediating interactions through bond-to-bond propensities
NASA Astrophysics Data System (ADS)
Amor, B. R. C.; Schaub, M. T.; Yaliraki, S. N.; Barahona, M.
2016-08-01
Allostery is a fundamental mechanism of biological regulation, in which binding of a molecule at a distant location affects the active site of a protein. Allosteric sites provide targets to fine-tune protein activity, yet we lack computational methodologies to predict them. Here we present an efficient graph-theoretical framework to reveal allosteric interactions (atoms and communication pathways strongly coupled to the active site) without a priori information of their location. Using an atomistic graph with energy-weighted covalent and weak bonds, we define a bond-to-bond propensity quantifying the non-local effect of instantaneous bond fluctuations propagating through the protein. Significant interactions are then identified using quantile regression. We exemplify our method with three biologically important proteins: caspase-1, CheY, and h-Ras, correctly predicting key allosteric interactions, whose significance is additionally confirmed against a reference set of 100 proteins. The almost-linear scaling of our method renders it suitable for high-throughput searches for candidate allosteric sites.
Prediction of allosteric sites and mediating interactions through bond-to-bond propensities
Amor, B. R. C.; Schaub, M. T.; Yaliraki, S. N.; Barahona, M.
2016-01-01
Allostery is a fundamental mechanism of biological regulation, in which binding of a molecule at a distant location affects the active site of a protein. Allosteric sites provide targets to fine-tune protein activity, yet we lack computational methodologies to predict them. Here we present an efficient graph-theoretical framework to reveal allosteric interactions (atoms and communication pathways strongly coupled to the active site) without a priori information of their location. Using an atomistic graph with energy-weighted covalent and weak bonds, we define a bond-to-bond propensity quantifying the non-local effect of instantaneous bond fluctuations propagating through the protein. Significant interactions are then identified using quantile regression. We exemplify our method with three biologically important proteins: caspase-1, CheY, and h-Ras, correctly predicting key allosteric interactions, whose significance is additionally confirmed against a reference set of 100 proteins. The almost-linear scaling of our method renders it suitable for high-throughput searches for candidate allosteric sites. PMID:27561351
Experimental Protein Structure Verification by Scoring with a Single, Unassigned NMR Spectrum
Courtney, Joseph M.; Ye, Qing; Nesbitt, Anna E.; Tang, Ming; Tuttle, Marcus D.; Watt, Eric D.; Nuzzio, Kristin M.; Sperling, Lindsay J.; Comellas, Gemma; Peterson, Joseph R.; Morrissey, James H.; Rienstra, Chad M.
2016-01-01
Standard methods for de novo protein structure determination by nuclear magnetic resonance (NMR) require time-consuming data collection and interpretation efforts. Here we present a qualitatively distinct and novel approach, called Comparative, Objective Measurement of Protein Architectures by Scoring Shifts (COMPASS), which identifies the best structures from a set of structural models by numerical comparison with a single, unassigned 2D 13C-13C NMR spectrum containing backbone and side-chain aliphatic signals. COMPASS does not require resonance assignments. It is particularly well suited for interpretation of magic-angle spinning solid-state NMR spectra, but also applicable to solution NMR spectra. We demonstrate COMPASS with experimental data from four proteins—GB1, ubiquitin, DsbA, and the extracellular domain of human tissue factor—and with reconstructed spectra from 11 additional proteins. For all these proteins, with molecular mass up to 25 kDa, COMPASS distinguished the correct fold, most often within 1.5 Å root-mean-square deviation of the reference structure. PMID:26365800
NASA Astrophysics Data System (ADS)
Sarti, E.; Zamuner, S.; Cossio, P.; Laio, A.; Seno, F.; Trovato, A.
2013-12-01
In protein structure prediction it is of crucial importance, especially at the refinement stage, to score efficiently large sets of models by selecting the ones that are closest to the native state. We here present a new computational tool, BACHSCORE, that allows its users to rank different structural models of the same protein according to their quality, evaluated by using the BACH++ (Bayesian Analysis Conformation Hunt) scoring function. The original BACH statistical potential was already shown to discriminate with very good reliability the protein native state in large sets of misfolded models of the same protein. BACH++ features a novel upgrade in the solvation potential of the scoring function, now computed by adapting the LCPO (Linear Combination of Pairwise Orbitals) algorithm. This change further enhances the already good performance of the scoring function. BACHSCORE can be accessed directly through the web server: bachserver.pd.infn.it. Catalogue identifier: AEQD_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEQD_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 3 No. of lines in distributed program, including test data, etc.: 130159 No. of bytes in distributed program, including test data, etc.: 24 687 455 Distribution format: tar.gz Programming language: C++. Computer: Any computer capable of running an executable produced by a g++ compiler (4.6.3 version). Operating system: Linux, Unix OS-es. RAM: 1 073 741 824 bytes Classification: 3. Nature of problem: Evaluate the quality of a protein structural model, taking into account the possible “a priori” knowledge of a reference primary sequence that may be different from the amino-acid sequence of the model; the native protein structure should be recognized as the best model. 
Solution method: The contact potential scores the occurrence of any given type of residue pair in 5 possible contact classes (α-helical contact, parallel β-sheet contact, anti-parallel β-sheet contact, side-chain contact, no contact). The solvation potential scores the occurrence of any residue type in 2 possible environments: buried and solvent exposed. Residue environment is assigned by adapting the LCPO algorithm. Residues present in the reference primary sequence and not present in the model structure contribute to the model score as solvent exposed and as non contacting all other residues. Restrictions: Input format file according to the Protein Data Bank standard Additional comments: Parameter values used in the scoring function can be found in the file /folder-to-bachscore/BACH/examples/bach_std.par. Running time: Roughly one minute to score one hundred structures on a desktop PC, depending on their size.
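The solution method above combines a pairwise contact term (5 classes) with a single-residue solvation term (2 environments). A minimal sketch of that structure follows. It is not BACHSCORE code: the function name `bach_like_score`, the input layout, and the toy probability tables are hypothetical, and treating each term as a negative log-probability is an assumption made for illustration. The actual parameters are those in the `bach_std.par` file mentioned above.

```python
import math

# Class labels taken from the program summary; the string names are illustrative.
CONTACT_CLASSES = ("alpha", "parallel_beta", "antiparallel_beta", "side_chain", "none")
ENVIRONMENTS = ("buried", "exposed")


def bach_like_score(pair_observations, residue_environments, pair_probs, solv_probs):
    """Sum negative log-probabilities over residue pairs and single residues.

    pair_observations:    list of (type_a, type_b, contact_class)
    residue_environments: list of (residue_type, environment)
    pair_probs:           {((type_a, type_b) sorted, contact_class): probability}
    solv_probs:           {(residue_type, environment): probability}
    All tables are hypothetical stand-ins for the real parameter file.
    """
    score = 0.0
    for type_a, type_b, contact_class in pair_observations:
        pair = tuple(sorted((type_a, type_b)))  # pair order does not matter
        score -= math.log(pair_probs[(pair, contact_class)])
    for residue_type, environment in residue_environments:
        score -= math.log(solv_probs[(residue_type, environment)])
    return score


# Toy example with made-up probabilities: one residue pair, one residue.
toy_pair_probs = {(("ALA", "LEU"), "side_chain"): 0.5}
toy_solv_probs = {("ALA", "buried"): 0.25}
toy_score = bach_like_score(
    [("LEU", "ALA", "side_chain")], [("ALA", "buried")], toy_pair_probs, toy_solv_probs
)
print(round(toy_score, 4))  # 2.0794, i.e. ln(8)
```

Under this layout, residues that are present in the reference sequence but missing from the model would be passed in as `"exposed"` with `"none"` contacts to every other residue, as the summary describes. Lower scores indicate more native-like models.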
Sun, Meng; Lu, Ming-Xing; Tang, Xiao-Tian; Du, Yu-Zhou
2015-01-01
The pink stem borer, Sesamia inferens, which is endemic in China and other parts of Asia, is a major pest of rice and causes significant yield loss in this host plant. Very few studies have addressed gene expression in S. inferens. Quantitative real-time PCR (qRT-PCR) is currently the most accurate and sensitive method for gene expression analysis. In qRT-PCR, data are normalized using reference genes, which help control for internal differences and reduce error between samples. In this study, seven candidate reference genes, 18S ribosomal RNA (18S rRNA), elongation factor 1 (EF1), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ribosomal protein S13 (RPS13), ribosomal protein S20 (RPS20), tubulin (TUB), and β-actin (ACTB) were evaluated for their suitability in normalizing gene expression under different experimental conditions. The results indicated that three genes (RPS13, RPS20, and EF1) were optimal for normalizing gene expression in different insect tissues (head, epidermis, fat body, foregut, midgut, hindgut, Malpighian tubules, haemocytes, and salivary glands). 18S rRNA, EF1, and GAPDH were best for normalizing expression with respect to developmental stages and sex (egg masses; first, second, third, fourth, fifth, and sixth instar larvae; male and female pupae; and one-day-old male and female adults). 18S rRNA, RPS20, and TUB were optimal for fifth instars exposed to different temperatures (-8, -6, -4, -2, 0, and 27°C). To validate this recommendation, the expression profile of a target gene heat shock protein 83 gene (hsp83) was investigated, and results showed the selection was necessary and effective. In conclusion, this study describes reference gene sets that can be used to accurately measure gene expression in S. inferens.
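Normalization against several reference genes, as recommended in this abstract, is often carried out with the 2^(-ΔΔCt) calculation. The sketch below assumes that approach, which the abstract does not state. It also assumes 100% amplification efficiency and averages the reference Ct values, which is equivalent to a geometric mean of the reference quantities. The function name and all Ct values are invented for illustration.

```python
from statistics import mean


def relative_expression(target_ct, reference_cts, control_target_ct, control_reference_cts):
    """Fold change of a target gene by the 2^(-ddCt) method (assumed approach).

    Each *_reference_cts argument holds Ct values for the chosen reference
    genes (for tissue comparisons the abstract recommends RPS13, RPS20 and EF1).
    """
    delta_sample = target_ct - mean(reference_cts)
    delta_control = control_target_ct - mean(control_reference_cts)
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values: a target such as hsp83 in a treated vs. a control sample.
fold_change = relative_expression(
    target_ct=22, reference_cts=[18, 20, 19],
    control_target_ct=24, control_reference_cts=[18, 20, 19],
)
print(fold_change)  # 4.0
```

Using more than one reference gene dampens the effect of any single gene whose expression drifts under the experimental condition, which is why the recommended set differs between tissues, developmental stages and temperatures.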
Schacherer, Lindsey J; Xie, Weiping; Owens, Michaela A; Alarcon, Clara; Hu, Tiger X
2016-09-01
Liquid chromatography coupled with tandem mass spectrometry is increasingly used for protein detection in transgenic crop research. Currently this is achieved with protein reference standards, which may take significant time or effort to obtain, and there is a need for rapid protein detection without protein reference standards. A sensitive and specific method was developed to detect target proteins in transgenic maize leaf crude extract at concentrations as low as ∼30 ng mg⁻¹ dry leaf without the need for reference standards or any sample enrichment. A hybrid Q-TRAP mass spectrometer was used to monitor all potential tryptic peptides of the target proteins in both transgenic and non-transgenic samples. The multiple reaction monitoring-initiated detection and sequencing (MIDAS) approach was used for initial peptide/protein identification via Mascot database search. Further confirmation was achieved by direct comparison between transgenic and non-transgenic samples. Definitive confirmation was provided by running the same experiments on synthetic peptides or protein standards, if available. A targeted proteomic mass spectrometry method using the MIDAS approach is an ideal methodology for detection of new proteins in early stages of transgenic crop research and development when neither protein reference standards nor antibodies are available. © 2016 Society of Chemical Industry.
Vinklárková, Bára; Chromý, Vratislav; Šprongl, Luděk; Bittová, Miroslava; Rikanová, Milena; Ohnútková, Ivana; Žaludová, Lenka
2015-01-01
To select a Kjeldahl procedure suitable for the determination of total protein in reference materials used in laboratory medicine, we reviewed in our previous article Kjeldahl methods adopted by clinical chemistry and found an indirect two-step analysis by total Kjeldahl nitrogen corrected for its nonprotein nitrogen and a direct analysis made on isolated protein precipitates. In this article, we compare both procedures on various reference materials. An indirect Kjeldahl method gave falsely lower results than a direct analysis. Preliminary performance parameters qualify the direct Kjeldahl analysis as a suitable primary reference procedure for the certification of total protein in reference laboratories.
Dutagaci, Bercem; Wittayanarakul, Kitiyaporn; Mori, Takaharu; Feig, Michael
2017-06-13
A scoring protocol based on implicit membrane-based scoring functions and a new protocol for optimizing the positioning of proteins inside the membrane was evaluated for its capacity to discriminate native-like states from misfolded decoys. A decoy set previously established by the Baker lab (Proteins: Struct., Funct., Genet. 2006, 62, 1010-1025) was used along with a second set that was generated to cover higher resolution models. The Implicit Membrane Model 1 (IMM1), IMM1 model with CHARMM 36 parameters (IMM1-p36), generalized Born with simple switching (GBSW), and heterogeneous dielectric generalized Born versions 2 (HDGBv2) and 3 (HDGBv3) were tested along with the new HDGB van der Waals (HDGBvdW) model that adds implicit van der Waals contributions to the solvation free energy. For comparison, scores were also calculated with the distance-scaled finite ideal-gas reference (DFIRE) scoring function. Z-scores for native state discrimination, energy vs root-mean-square deviation (RMSD) correlations, and the ability to select the most native-like structures as top-scoring decoys were evaluated to assess the performance of the scoring functions. Ranking of the decoys in the Baker set that were relatively far from the native state was challenging and dominated largely by packing interactions that were captured best by DFIRE with less benefit of the implicit membrane-based models. Accounting for the membrane environment was much more important in the second decoy set where especially the HDGB-based scoring functions performed very well in ranking decoys and providing significant correlations between scores and RMSD, which shows promise for improving membrane protein structure prediction and refinement applications. The new membrane structure scoring protocol was implemented in the MEMScore web server ( http://feiglab.org/memscore ).
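The Z-score for native state discrimination mentioned in this abstract is not defined there. A minimal sketch under the conventional definition follows: the native score minus the mean decoy score, divided by the standard deviation of the decoy scores. The function name, the choice of population rather than sample standard deviation, and the toy energies are assumptions.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to a decoy set.

    With energy-like scores (lower is better), a more negative Z-score
    means the scoring function separates the native state more clearly.
    """
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


# Toy energies in arbitrary units (invented for illustration).
decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]
print(round(native_z_score(-10.0, decoys), 3))  # -3.536
```

The other two criteria named in the abstract, the score versus RMSD correlation and the rank of the most native-like decoy, would be computed over the same list of decoy scores.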
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
2009-01-01
Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph
2016-02-26
Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified.
MIPS: curated databases and comprehensive secondary data resources in 2010.
Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey
2011-01-01
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
MIPS: curated databases and comprehensive secondary data resources in 2010
Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey
2011-01-01
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531
Oxya hyla hyla (Orthoptera: Acrididae) as an Alternative Protein Source for Japanese Quail
Das, Mousumi; Mandal, Suman Kalyan
2014-01-01
Nutrient composition of the grasshoppers Oxya hyla hyla showed that they are a rich nutrient source containing 687.7 g protein/kg of dry body weight. Their antinutrient values fell within nutritionally acceptable values of the poultry bird Coturnix japonica japonica (Japanese quail). The most required essential amino acids and fatty acids were also present in sufficient amount. For feeding trial nine diets were formulated on an equal crude protein (230 g/kg) basis with grasshopper meal, fish meal, and soybean meal. Three sets of diets with grasshopper meal were prepared with 50 g/kg, 100 g/kg, and 150 g/kg grasshopper of total feed. Similarly, other diet sets were prepared with fish meal and also with soybean meal. Results were compared with another group of Japanese quails fed on a reference diet that was considered as control. Two experiments were conducted with a total number of 600, seven-day-old, Japanese quails. In experiment 1 for determination of growth performance, quails were randomly distributed into ten groups of males and ten groups of females containing 30 birds each. In experiment 2 for determination of laying performance, identical ten groups were prepared in ten repetitions (2 females and 1 male in each group) from the six-week-old birds of experiment 1. Birds of diet set GM2 have gained the highest body weight (male 4.04 g/bird/day; female 5.01 g/bird/day) followed by birds of FM3 diet set (male 3.72 g/bird/day; female 4.40 g/bird/day), whereas birds of reference diet have gained 3.05 g/bird/day for male and 3.23 g/bird/day for female. Feed conversion ratio (FCR) of birds fed with GM2 was the lowest (male 3.33; female 2.97) whereas FCR of R group was higher (male 4.37; female 4.65) than grasshopper meal and fish meal based diets. Hen day production percentage was higher (72.2) in GM2 group, followed by FM3 (63.5) group. 
R group had lower 1st egg weight (9.0 g), weight gain (8.2 g), percentage of hen day production (41.8%), higher feed intake (33.6 g/day/bird), and age at 1st laid egg than the grasshopper meal and fish meal based diets. So growth and laying performance of the birds were significantly better in grasshopper meal and fish meal added diet fed sets than the reference diet fed group; among all the dietary groups 100 g/kg grasshopper meal added diet mostly gave significantly better results followed by 150 g/kg fish meal added diets. It was ascertained that the O. hyla hyla meal had pronounced positive response on the birds. So, the quails could be easily fed 100 g/kg grasshopper meal added diet as it was the most suitable alternative feedstuff compared to the conventional protein source based diets. PMID:27355015
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provides an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
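One common way to assess a screen against reference sets of essential and nonessential genes is to compute precision and recall at a score threshold. The sketch below assumes that evaluation scheme, which the abstract does not spell out. The function name, gene labels, scores and threshold are hypothetical, and genes outside both reference sets are ignored.

```python
def precision_recall(scores, essential, nonessential, threshold):
    """Precision and recall of essential-gene calls at a score threshold.

    scores:       {gene: essentiality score}, higher = more likely essential
    essential:    reference set of essential genes (true positives if called)
    nonessential: reference set of nonessential genes (false positives if called)
    """
    called = {gene for gene, score in scores.items() if score >= threshold}
    true_positives = len(called & essential)
    false_positives = len(called & nonessential)
    total_called = true_positives + false_positives
    # Convention chosen here: precision is 1.0 when no reference gene is called.
    precision = true_positives / total_called if total_called else 1.0
    recall = true_positives / len(essential) if essential else 0.0
    return precision, recall


# Hypothetical genes and scores, for illustration only.
toy_scores = {"geneA": 10, "geneB": 5, "geneC": -3, "geneD": 8, "geneE": 1}
toy_essential = {"geneA", "geneB", "geneE"}
toy_nonessential = {"geneC", "geneD"}
precision, recall = precision_recall(toy_scores, toy_essential, toy_nonessential, threshold=4)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

Sweeping the threshold over all observed scores gives a precision-recall curve, which allows screens run with different libraries to be compared on the same reference sets.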
Kostálová, D; Bezáková, L; Oblozinský, M; Kardosová, A
2004-09-01
Aloe vera is widely used in food supplements, beverages, pharmaceuticals, and cosmetics. It has been long recognized as an effective natural remedy for its wound-healing properties and its positive influence on other inflammatory skin disorders. Major proteins and mono- and polysaccharides were identified and analysed from Aloe vera commercial extract. The molecular weights of the proteins, calculated from sets of molecular weight reference standards, ranged from 70 kDa for the largest to 14 kDa for the smallest. IR spectral analysis of the carbohydrate fraction shows that the main carbohydrate compound is acetylated (1→4)-β-D-mannan substituted with D-galactose and D-glucose. The results have shown that proteins and polysaccharides are a necessary component in the study of biological activity of Aloe vera leaf extract.
Dodhia, Kejal; Stoll, Thomas; Hastie, Marcus; Furuki, Eiko; Ellwood, Simon R.; Williams, Angela H.; Tan, Yew-Foon; Testa, Alison C.; Gorman, Jeffrey J.; Oliver, Richard P.
2016-01-01
Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly with corrections of SNP and indel errors in the underlying genome assembly from deep resequencing data as well as extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation includes 8,366 genes with modified protein sequence and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models. PMID:26840125
Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins
Abrusán, György; Zhang, Yang; Szilágyi, András
2013-01-01
Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
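Contact order, used in point 3 of this abstract as a proxy for folding speed, is not defined there. A minimal sketch follows, assuming the widely used relative contact order: the mean sequence separation of contacting residue pairs divided by the chain length. The function name and the toy contact list are hypothetical, and the distance cutoff that decides which residues are in contact is left to the caller.

```python
def relative_contact_order(contacts, sequence_length):
    """Relative contact order: sum(|i - j|) / (L * N).

    contacts:        list of (i, j) residue-index pairs in contact (N pairs)
    sequence_length: number of residues in the chain (L)
    """
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (sequence_length * len(contacts))


# Toy 10-residue chain with three contacts (invented for illustration).
toy_contacts = [(1, 5), (2, 10), (3, 4)]
print(round(relative_contact_order(toy_contacts, 10), 4))  # 0.4333
```

A low value means that most contacts are between residues close together in sequence, which is generally associated with faster folding and is consistent with the interpretation given in the abstract.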
Rubino, L; Di Franco, A; Russo, M
2000-01-01
Carnation Italian ringspot tombusvirus encodes a protein, referred to as 36K, that possesses a mitochondrial targeting signal and two transmembrane segments which are thought to anchor this protein to the outer membrane of the mitochondrial envelope of infected plant cells. To determine the topology of the virus protein inserted in the cell membrane, as well as the sequence requirements for targeting and insertion, an in vivo system was set up in which this could be analysed in the absence of productive virus infection. The 36K protein was expressed in the yeast Saccharomyces cerevisiae in native form or fused to the green fluorescent protein. Using a fluorescence microscope, large green-fluorescing cytoplasmic aggregates were visible which stained red when cells were treated with the vital stain MitoTracker, which is specific for mitochondria. These aggregates were shown by electron microscopy to be composed of either mitochondria or membranes. The latter type was particularly abundant for the construct in which the green fluorescent protein was fused at the N terminus of the 36K protein. Immunoelectron microscopy demonstrated that the viral protein is present in the anomalous aggregates and Western blot analysis of protein extracts showed 36K to be resistant to alkaline, urea or salt extraction, a property of integral membrane proteins.
Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter
2017-01-01
Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer. PMID:28852586
Genome Variations Associated with Viral Susceptibility and Calcification in Emiliania huxleyi
Kegel, Jessica U.; John, Uwe; Valentin, Klaus; Frickenhaus, Stephan
2013-01-01
Emiliania huxleyi, a key player in the global carbon cycle, is one of the best-studied coccolithophores with respect to biogeochemical cycles, climatology, and host-virus interactions. Strains of E. huxleyi show phenotypic plasticity regarding growth behaviour, light-response, calcification, acidification, and virus susceptibility. This phenomenon is likely a consequence of genomic differences, or transcriptomic responses, to environmental conditions or threats such as viral infections. We used an E. huxleyi genome microarray based on the sequenced strain CCMP1516 (reference strain) to perform comparative genomic hybridizations (CGH) of 16 E. huxleyi strains of different geographic origin. We investigated the genomic diversity and plasticity and focused on the identification of genes related to virus susceptibility and coccolith production (calcification). Among the 31,940 gene models tested, a core genome of 14,628 genes was identified by hybridization among the 16 E. huxleyi strains. 224 probes were characterized as specific for the reference strain CCMP1516. Compared to the sequenced E. huxleyi strain CCMP1516, variation in gene content of up to 30 percent among strains was observed. Comparison of core and non-core transcript sets in terms of annotated functions reveals a broad, almost equal functional coverage over all KOG categories of both transcript sets within the whole annotated genome. Within the variable (non-core) genome we identified genes associated with virus susceptibility and calcification. Genes associated with virus susceptibility include a Bax inhibitor-1 protein, three LRR receptor-like protein kinases, and a mitogen-activated protein kinase. Our list of transcripts associated with coccolith production will stimulate further research, e.g. by genetic manipulation. In particular, the V-type proton ATPase 16 kDa proteolipid subunit is proposed to be a plausible target gene for further calcification studies. PMID:24260453
A multicenter study benchmarks software tools for label-free proteome quantification.
Navarro, Pedro; Kuharev, Jörg; Gillet, Ludovic C; Bernhardt, Oliver M; MacLean, Brendan; Röst, Hannes L; Tate, Stephen A; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I; Aebersold, Ruedi; Tenzer, Stefan
2016-11-01
Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi
2016-02-01
The three-dimensional tertiary structure of a protein at near-atomic resolution provides insight into its function and evolution. Because a protein's structure determines its function, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classification of protein function. Given the rapidly growing number of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. The benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean-squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS was implemented in C++. The source code, binary executable, and a web server version are freely available at http://sparks-lab.org.
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequently used bottom-up strategy for identifying proteins and their associated modifications now typically generates thousands of MS/MS spectra, which are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most have excellent user documentation. The aim here is therefore to outline the algorithmic strategy behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.
Chromý, Vratislav; Vinklárková, Bára; Šprongl, Luděk; Bittová, Miroslava
2015-01-01
We found previously that albumin-calibrated total protein in certified reference materials causes unacceptable positive bias in analysis of human sera. The simplest way to cure this defect is the use of human-based serum/plasma standards calibrated by the Kjeldahl method. Such standards, commutative with serum samples, will compensate for bias caused by lipids and bilirubin in most human sera. To find a suitable primary reference procedure for total protein in reference materials, we reviewed Kjeldahl methods adopted by laboratory medicine. We found two methods recommended for total protein in human samples: an indirect analysis based on total Kjeldahl nitrogen corrected for its nonprotein nitrogen and a direct analysis made on isolated protein precipitates. The methods found will be assessed in a subsequent article.
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics
Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.
2016-01-01
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. 
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934
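The two-step use of the tiers described in this abstract (search against a simpler database, then check peptide uniqueness against a more complex one) can be illustrated with a minimal sketch. The function names, the dictionary layout and the toy sequences below are assumptions made for illustration; they are not part of the THISP distribution, and a real workflow would read FASTA files and handle details such as isobaric residues.

```python
# Minimal sketch: check whether peptides found with a small database
# remain unique when mapped against a larger, more complete database.
# All names and sequences here are hypothetical.

def proteins_containing(peptide, database):
    """Return sorted accessions of all proteins whose sequence contains the peptide."""
    return sorted(acc for acc, seq in database.items() if peptide in seq)


def uniqueness_report(peptides, small_db, large_db):
    """Map each peptide to the proteins it matches in both databases."""
    return {
        pep: {
            "small": proteins_containing(pep, small_db),
            "large": proteins_containing(pep, large_db),
        }
        for pep in peptides
    }


def still_unique(report):
    """Peptides that map to exactly one protein in the larger database."""
    return [pep for pep, hits in report.items() if len(hits["large"]) == 1]


# Toy data: the large database adds an isoform and a variant entry.
tier_small = {"P1": "MKTAYIAKQR", "P2": "MSLLTEVETY"}
tier_large = dict(tier_small, **{"P1-2": "MKTAYIAKGG", "V9": "QQSLLTEVAA"})

report = uniqueness_report(["TAYIAK", "EVETY"], tier_small, tier_large)
unique_peptides = still_unique(report)
```

In this toy case "TAYIAK" looks unique in the small database but is shared with an isoform in the large one, which is the situation the uniqueness check is meant to expose.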
Structural re-alignment in an immunologic surface region of ricin A chain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zemla, A T; Zhou, C E
2007-07-24
We compared structure alignments generated by several protein structure comparison programs to determine whether existing methods would satisfactorily align residues at a highly conserved position within an immunogenic loop in ribosome inactivating proteins (RIPs). Using default settings, structure alignments generated by several programs (CE, DaliLite, FATCAT, LGA, MAMMOTH, MATRAS, SHEBA, SSM) failed to align the respective conserved residues, although LGA reported correct residue-residue (R-R) correspondences when the beta-carbon (Cb) position was used as the point of reference in the alignment calculations. Further tests using variable points of reference indicated that points distal from the beta carbon along a vector connecting the alpha and beta carbons yielded rigid structural alignments in which residues known to be highly conserved in RIPs were reported as corresponding residues in structural comparisons between ricin A chain, abrin-A, and other RIPs. Results suggest that approaches to structure alignment employing alternate point representations corresponding to side chain position may yield structure alignments that are more consistent with observed conservation of functional surface residues than do standard alignment programs, which apply uniform criteria for alignment (i.e., alpha carbon (Ca) as point of reference) along the entirety of the peptide chain. We present the results of tests that suggest the utility of allowing user-specified points of reference in generating alternate structural alignments, and we present a web server for automatically generating such alignments.
Slocum, Joshua D; First, Jeremy T; Webb, Lauren J
2017-07-20
Measurement of the magnitude, direction, and functional importance of electric fields in biomolecules has been a long-standing experimental challenge. pKa shifts of titratable residues have been the most widely implemented measurements of the local electrostatic environment around the labile proton, and experimental data sets of pKa shifts in a variety of systems have been used to test and refine computational prediction capabilities of protein electrostatic fields. A more direct and increasingly popular technique to measure electric fields in proteins is Stark effect spectroscopy, where the change in absorption energy of a chromophore relative to a reference state is related to the change in electric field felt by the chromophore. While there are merits to both of these methods and they are both reporters of local electrostatic environment, they are fundamentally different measurements, and to our knowledge there has been no direct comparison of these two approaches in a single protein. We have recently demonstrated that green fluorescent protein (GFP) is an ideal model system for measuring changes in electric fields in a protein interior caused by amino acid mutations using both electronic and vibrational Stark effect chromophores. Here we report the changes in pKa of the GFP fluorophore in response to the same mutations and show that they are in excellent agreement with Stark effect measurements. This agreement in the results of orthogonal experiments reinforces our confidence in the experimental results of both Stark effect and pKa measurements and provides an excellent target data set to benchmark diverse protein electrostatics calculations. We used this experimental data set to test the pKa prediction ability of the adaptive Poisson-Boltzmann solver (APBS) and found that a simple continuum dielectric model of the GFP interior is insufficient to accurately capture the measured pKa and Stark effect shifts. We discuss some of the limitations of this continuum-based model in this system and offer this experimentally self-consistent data set as a target benchmark for electrostatics models, which could allow for a more rigorous test of pKa prediction techniques due to the unique environment of the water-filled GFP barrel compared to traditional globular proteins.
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Cube - an online tool for comparison and contrasting of protein sequences.
Zhang, Zong Hong; Khoo, Aik Aun; Mihalek, Ivana
2013-01-01
When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. http://eopsf.org/cube.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pucci, Fabrizio; Bourgeas, Raphaël; Rooman, Marianne
We have set up and manually curated a dataset containing experimental information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of experimentally measured melting temperatures (T_m) and their changes upon point mutations (ΔT_m) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed for the training or benchmarking of in silico thermal stability prediction methods. It also reports other experimentally measured thermodynamic quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔC_P) of the wild type proteins and their changes upon mutations (ΔΔH and ΔΔC_P), as well as the change in folding free energy (ΔΔG) at a reference temperature. These data are analyzed in view of improving our insights into the correlation between thermal and thermodynamic stabilities, the asymmetry between the number of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable versus mesostable proteins.
GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.
Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun
2013-01-01
As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can speed up such computations by using the computational power of massively parallel hardware such as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm on GPUs with a frequency-based filtration method that avoids unnecessary comparisons, rather than merely accelerating all comparisons and expending computational resources on the unnecessary ones. A user-friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and a human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
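The idea of filtering before aligning can be sketched on the CPU in a few lines. The example below is an illustration only: the scoring values, the threshold `max_fd` and the function names are assumptions, and it is not the CUDA-SWf implementation. It uses one common definition of frequency distance (the larger of the summed positive and summed negative differences between residue-count vectors), which is a lower bound on edit distance, and runs a plain Smith-Waterman scoring pass only on sequences that pass the filter.

```python
from collections import Counter


def frequency_distance(a, b):
    """Larger of the summed surplus and summed deficit between residue counts."""
    ca, cb = Counter(a), Counter(b)
    pos = sum(max(ca[x] - cb[x], 0) for x in ca.keys() | cb.keys())
    neg = sum(max(cb[x] - ca[x], 0) for x in ca.keys() | cb.keys())
    return max(pos, neg)


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def filtered_search(query, database, max_fd):
    """Align only database sequences whose frequency distance is within max_fd."""
    return [
        (seq, sw_score(query, seq))
        for seq in database
        if frequency_distance(query, seq) <= max_fd
    ]


hits = filtered_search("ACGT", ["ACGT", "TTTT", "ACGA"], max_fd=1)
```

The filter costs time linear in the sequence lengths, whereas the alignment is quadratic, so skipping dissimilar sequences saves most of the work for those entries. How the threshold is chosen in the published method is not stated in the abstract.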
nGASP - the nematode genome annotation assessment project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coghlan, A; Fiedler, T J; McKay, S J
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
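Gene-level sensitivity and specificity of the kind quoted above can be illustrated with a short sketch. It assumes the convention common in gene-prediction assessments, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP), and it assumes that a prediction counts as correct only when its exon coordinates match a reference model exactly. The abstract does not state the matching criteria nGASP used, so the representation of a gene model as a tuple of (start, end) exons is hypothetical.

```python
def gene_level_accuracy(reference, predicted):
    """Return (sensitivity, specificity) for exact-match gene models.

    sensitivity = TP / number of reference genes
    specificity = TP / number of predicted genes (gene-finding convention)
    """
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    sensitivity = true_positives / len(reference) if reference else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy gene models: each is a tuple of (start, end) exon coordinates.
reference_genes = [
    ((100, 200), (300, 400)),
    ((1000, 1500),),
    ((2000, 2100), (2200, 2300)),
    ((5000, 5200),),
]
predicted_genes = [
    ((100, 200), (300, 400)),   # exact match
    ((1000, 1500),),            # exact match
    ((2000, 2100), (2250, 2300)),  # wrong exon boundary, counted as incorrect
]

sn, sp = gene_level_accuracy(reference_genes, predicted_genes)
```

Here two of four reference genes are recovered (sensitivity 0.5) and two of three predictions are correct (specificity about 0.67).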
Revisiting and re-engineering the classical zinc finger peptide: consensus peptide-1 (CP-1).
Besold, Angelique N; Widger, Leland R; Namuswe, Frances; Michalek, Jamie L; Michel, Sarah L J; Goldberg, David P
2016-04-01
Zinc plays key structural and catalytic roles in biology. Structural zinc sites are often referred to as zinc finger (ZF) sites, and the classical ZF contains a Cys2His2 motif that is involved in coordinating Zn(II). An optimized Cys2His2 ZF, named consensus peptide 1 (CP-1), was identified more than 20 years ago using a limited set of sequenced proteins. We have reexamined the CP-1 sequence, using our current, much larger database of sequenced proteins that have been identified from high-throughput sequencing methods, and found the sequence to be largely unchanged. The CCHH ligand set of CP-1 was then altered to a CAHH motif to impart hydrolytic activity. This ligand set mimics the His2Cys ligand set of peptide deformylase (PDF), a hydrolytically active M(II)-centered (M = Zn or Fe) protein. The resultant peptide [CP-1(CAHH)] was evaluated for its ability to coordinate Zn(II) and Co(II) ions, adopt secondary structure, and promote hydrolysis. CP-1(CAHH) was found to coordinate Co(II) and Zn(II) and a pentacoordinate geometry for Co(II)-CP-1(CAHH) was implicated from UV-vis data. This suggests a His2Cys(H2O)2 environment at the metal center. The Zn(II)-bound CP-1(CAHH) was shown to adopt partial secondary structure by 1-D (1)H NMR spectroscopy. Both Zn(II)-CP-1(CAHH) and Co(II)-CP-1(CAHH) show good hydrolytic activity toward the test substrate 4-nitrophenyl acetate, exhibiting faster rates than most active synthetic Zn(II) complexes.
A differential equation for the Generalized Born radii.
Fogolari, Federico; Corazza, Alessandra; Esposito, Gennaro
2013-06-28
The Generalized Born (GB) model offers a convenient way of representing electrostatics in complex macromolecules like proteins or nucleic acids. The computation of atomic GB radii is currently performed by different non-local approaches involving volume or surface integrals. Here we obtain a non-linear second-order partial differential equation for the Generalized Born radius, which may be solved using local iterative algorithms. The equation is derived under the assumption that the usual GB approximation to the reaction field obeys Laplace's equation. The equation admits as particular solutions the correct GB radii for the sphere and the plane. The tests performed on a set of 55 different proteins show an overall agreement with other reference GB models and "perfect" Poisson-Boltzmann based values.
Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases
Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés
2009-01-01
Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The ensuing approach to investigate this enzyme group is by studying its phylogeny. However, clan AA is a difficult case to study due to the low similarity and different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe an in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns in correspondence with Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is a work in progress that we have organized in a database within the GyDB project, referred to as the Clan AA Reference Database. Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation but also as a reference to evaluate the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow a similar protein domain architecture related to the peptidase fold. 
In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network where an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
Shen, Yang; Bax, Ad
2013-01-01
A new program, TALOS-N, is introduced for predicting protein backbone torsion angles from NMR chemical shifts. The program relies far more extensively on the use of trained artificial neural networks than its predecessor, TALOS+. Validation on an independent set of proteins indicates that backbone torsion angles can be predicted for a larger, ≥ 90% fraction of the residues, with an error rate smaller than ca 3.5%, using an acceptance criterion that is nearly two-fold tighter than that used previously, and a root mean square difference between predicted and crystallographically observed (φ,ψ) torsion angles of ca 12°. TALOS-N also reports sidechain χ1 rotameric states for about 50% of the residues, and a consistency with reference structures of 89%. The program includes a neural network trained to identify secondary structure from residue sequence and chemical shifts. PMID:23728592
A leap into the chemical space of protein-protein interaction inhibitors.
Villoutreix, B O; Labbé, C M; Lagorce, D; Laconde, G; Sperandio, O
2012-01-01
Protein-protein interactions (PPI) are involved in vital cellular processes and are therefore associated with a growing number of diseases. But working with them as therapeutic targets comes with some major hurdles that require substantial changes to the way we design drugs for historical targets such as enzymes and G-protein-coupled receptors (GPCRs). Among the numerous ways we could improve our methodologies to maximize the potential of developing new chemical entities on PPI targets is the fundamental question of what type of compounds we should use to identify the first hits and which chemical space we should navigate to optimize them to the drug candidate stage. In this review article, we cover different aspects of PPI with the aim of gaining some insights into the specific nature of the chemical space of PPI inhibitors. We describe the work of different groups to highlight such properties and discuss their respective approaches. We finally discuss a case study in which we describe the properties of a set of 115 PPI inhibitors that we compare to a reference set of 1730 enzyme inhibitors. This case study highlights interesting properties such as the unfortunate price that still needs to be paid by PPI inhibitors in terms of molecular weight, hydrophobicity, and aromaticity in order to reach a critical level of activity. But it also shows that not all PPI targets are equivalent, and that some PPI targets can demonstrate a better druggability, as illustrated by the better drug likeness of their associated inhibitors.
Omelchenko, Marina V; Galperin, Michael Y; Wolf, Yuri I; Koonin, Eugene V
2010-04-30
Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.
Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D
2014-09-01
The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Cross-Link Guided Molecular Modeling with ROSETTA
Leitner, Alexander; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars
2013-01-01
Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods. PMID:24069194
CORUM: the comprehensive resource of mammalian protein complexes
Ruepp, Andreas; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Stransky, Michael; Waegele, Brigitte; Schmidt, Thorsten; Doudieu, Octave Noubibou; Stümpflen, Volker; Mewes, H. Werner
2008-01-01
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes. PMID:17965090
Creskey, Marybeth C; Li, Changgui; Wang, Junzhi; Girard, Michel; Lorbetskie, Barry; Gravel, Caroline; Farnsworth, Aaron; Li, Xuguang; Smith, Daryl G S; Cyr, Terry D
2012-07-06
Current methods for quality control of inactivated influenza vaccines prior to regulatory approval include determining the hemagglutinin (HA) content by single radial immunodiffusion (SRID), verifying neuraminidase (NA) enzymatic activity, and demonstrating that the levels of the contaminant protein ovalbumin are below a set threshold of 1 μg/dose. The SRID assays require the availability of strain-specific reference HA antigens and antibodies, the production of which is a potential rate-limiting step in vaccine development and release, particularly during a pandemic. Immune responses induced by neuraminidase also contribute to protection from infection; however, the amounts of NA antigen in influenza vaccines are currently not quantified or standardized. Here, we report a method for vaccine analysis that yields simultaneous quantification of HA and NA levels much more rapidly than conventional HA quantification techniques, while providing additional valuable information on the total protein content. Enzymatically digested vaccine proteins were analyzed by LC-MS(E), a mass spectrometric technology that allows absolute quantification of analytes, including the HA and NA antigens, other structural influenza proteins and chicken egg proteins associated with the manufacturing process. This method has potential application for increasing the accuracy of reference antigen standards and for validating label claims for HA content in formulated vaccines. It can also be used to monitor NA and chicken egg protein content in order to monitor manufacturing consistency. While this is a useful methodology with potential for broad application, we also discuss herein some of the inherent limitations of this approach and the care and caution that must be taken in its use as a tool for absolute protein quantification. 
The variations in HA, NA and chicken egg protein concentrations in the vaccines analyzed in this study are indicative of the challenges associated with the current manufacturing and quality control testing procedures. Crown Copyright © 2012. Published by Elsevier Ltd. All rights reserved.
Complement Set Reference after Implicitly Small Quantities: An Event-Related Potentials Study
ERIC Educational Resources Information Center
Ingram, Joanne; Ferguson, Heather J.
2018-01-01
An anaphoric reference to the complement-set is a reference to the set that does not fulfil the predicate of the preceding sentence. Preferred reference to the complement-set has been found in eye movements when a character's implicit desire for a high amount has been denied using a negative emotion. We recorded event-related potentials to examine…
Vödisch, Martin; Albrecht, Daniela; Lessing, Franziska; Schmidt, André D; Winkler, Robert; Guthke, Reinhard; Brakhage, Axel A; Kniemeyer, Olaf
2009-03-01
The filamentous fungus Aspergillus fumigatus has become the most important airborne fungal pathogen causing life-threatening infections in immunosuppressed patients. We established a 2-D reference map for A. fumigatus. Using MALDI-TOF-MS/MS, we identified 381 spots representing 334 proteins. Proteins involved in cellular metabolism, protein synthesis, transport processes and cell cycle were most abundant. Furthermore, we established a protocol for the isolation of mitochondria of A. fumigatus and developed a mitochondrial proteome reference map. 147 proteins represented by 234 spots were identified.
Mitran, Catherine J; Mbonye, Anthony K; Hawkes, Michael; Yanow, Stephanie K
2018-06-04
Malaria rapid diagnostic tests (RDTs) are widely used in clinical and surveillance settings. However, the performance of most RDTs has not been characterized at parasite densities below detection by microscopy. We present findings from Uganda, where RDT results from 491 participants with suspected malaria were correlated with quantitative polymerase chain reaction (qPCR)-defined parasitemia. Compared with qPCR, the sensitivity and specificity of the RDT for Plasmodium falciparum mono-infections were 76% (95% confidence interval [CI]: 68-83%) and 95% (95% CI: 92-97%), respectively. The sensitivity of the RDT at parasite densities between 0.2 and 200 parasites/μL was surprisingly high (87%, 95% CI: 74-94%). The high sensitivity of the RDT is likely because of histidine-rich protein 2 from submicroscopic infections, gametocytes, or sequestered parasites. These findings underscore the importance of evaluating different RDTs in field studies against qPCR reference testing to better define the sensitivity and specificity, particularly at low parasite densities.
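As a minimal sketch of how sensitivity, specificity and their 95% confidence intervals can be derived when a test is scored against a reference method, the example below uses a 2x2 table of counts. The counts are hypothetical and are not the study's data, and the Wilson score interval is an assumption, since the abstract does not state which interval method was used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total
                               + z * z / (4.0 * total * total)) / denom
    return centre - half_width, centre + half_width


def diagnostic_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity of an index test against a reference test."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts: reference-positive = tp + fn, reference-negative = tn + fp.
result = diagnostic_accuracy(tp=80, fn=20, tn=190, fp=10)
for name, value in result.items():
    print(name, value)
```

The interval narrows as the number of reference-positive or reference-negative samples grows, which is why estimates restricted to a low-density subgroup tend to carry wider intervals than the overall figures.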
Development of a dedicated peptide tandem mass spectral library for conservation science.
Fremout, Wim; Dhaenens, Maarten; Saverwyns, Steven; Sanyova, Jana; Vandenabeele, Peter; Deforce, Dieter; Moens, Luc
2012-05-30
In recent years, the use of liquid chromatography tandem mass spectrometry (LC-MS/MS) on tryptic digests of cultural heritage objects has attracted much attention. It allows for unambiguous identification of peptides and proteins, and even in complex mixtures species-specific identification becomes feasible with minimal sample consumption. Determination of the peptides is commonly based on theoretical cleavage of known protein sequences and on comparison of the expected peptide fragments with those found in the MS/MS spectra. In this approach, complex computer programs, such as Mascot, perform well identifying known proteins, but fail when protein sequences are unknown or incomplete. Often, when trying to distinguish evolutionarily well preserved collagens of different species, Mascot lacks the required specificity. Complementary and often more accurate information on the proteins can be obtained using a reference library of MS/MS spectra of species-specific peptides. Therefore, a library dedicated to various sources of proteins in works of art was set up, with an initial focus on collagen rich materials. This paper discusses the construction and the advantages of this spectral library for conservation science, and its application on a number of samples from historical works of art. Copyright © 2012 Elsevier B.V. All rights reserved.
Longitudinal Urinary Protein Variability in Participants of the Space Flight Simulation Program.
Khristenko, Nina A; Larina, Irina M; Domon, Bruno
2016-01-04
Urine is a valuable material for the diagnosis of renal pathologies and to investigate the effects of their treatment. However, the variability in protein abundance in the context of normal homeostasis remains a major challenge in urinary proteomics. In this study, the analysis of urine samples collected from healthy individuals, rigorously selected to take part in the MARS-500 spaceflight simulation program, provided a unique opportunity to estimate normal concentration ranges for an extended set of urinary proteins. In order to systematically identify and reliably quantify peptides/proteins across a large sample cohort, a targeted mass spectrometry method was developed. The performance of parallel reaction monitoring (PRM) analyses was improved by implementing tight control of the monitoring windows during LC-MS/MS runs, using an on-the-fly correction routine. Matching the experimentally obtained MS/MS spectra with reference fragmentation patterns allowed dependable peptide identifications to be made. Following optimization and evaluation, the targeted method was applied to investigate protein abundance variability in 56 urine samples, collected from six volunteers participating in the MARS-500 program. The intrapersonal protein concentration ranges were determined for each individual and showed unexpectedly high abundance variation, with an average difference of 1 order of magnitude.
Sun, Meng; Lu, Ming-Xing; Tang, Xiao-Tian; Du, Yu-Zhou
2015-01-01
The pink stem borer, Sesamia inferens, which is endemic in China and other parts of Asia, is a major pest of rice and causes significant yield loss in this host plant. Very few studies have addressed gene expression in S. inferens. Quantitative real-time PCR (qRT-PCR) is currently the most accurate and sensitive method for gene expression analysis. In qRT-PCR, data are normalized using reference genes, which help control for internal differences and reduce error between samples. In this study, seven candidate reference genes, 18S ribosomal RNA (18S rRNA), elongation factor 1 (EF1), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ribosomal protein S13 (RPS13), ribosomal protein S20 (RPS20), tubulin (TUB), and β-actin (ACTB) were evaluated for their suitability in normalizing gene expression under different experimental conditions. The results indicated that three genes (RPS13, RPS20, and EF1) were optimal for normalizing gene expression in different insect tissues (head, epidermis, fat body, foregut, midgut, hindgut, Malpighian tubules, haemocytes, and salivary glands). 18S rRNA, EF1, and GAPDH were best for normalizing expression with respect to developmental stages and sex (egg masses; first, second, third, fourth, fifth, and sixth instar larvae; male and female pupae; and one-day-old male and female adults). 18S rRNA, RPS20, and TUB were optimal for fifth instars exposed to different temperatures (−8, −6, −4, −2, 0, and 27°C). To validate this recommendation, the expression profile of a target gene heat shock protein 83 gene (hsp83) was investigated, and results showed the selection was necessary and effective. In conclusion, this study describes reference gene sets that can be used to accurately measure gene expression in S. inferens. PMID:25585250
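To illustrate how reference genes enter the normalization step, the sketch below applies the widely used 2^-ΔΔCt calculation with several reference genes. It is an illustration under stated assumptions, not the authors' analysis: the Ct values are hypothetical, amplification efficiency is assumed to be about 100% for every gene, and the function name `relative_expression` is made up for this example. Averaging the reference Ct values is equivalent to taking the geometric mean of the reference quantities.

```python
from statistics import mean


def relative_expression(target_ct, reference_cts,
                        calibrator_target_ct, calibrator_reference_cts):
    """Fold change of a target gene by the 2^-ddCt method.

    Each delta-Ct is the target Ct minus the mean Ct of the reference genes
    measured in the same sample; ~100% PCR efficiency is assumed.
    """
    delta_sample = target_ct - mean(reference_cts)
    delta_calibrator = calibrator_target_ct - mean(calibrator_reference_cts)
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values for a target gene (e.g. hsp83) and three reference
# genes (e.g. 18S rRNA, RPS20, TUB) in a treated and a control sample.
treated_target_ct = 22.0
treated_reference_cts = [18.0, 19.0, 20.0]
control_target_ct = 25.0
control_reference_cts = [18.0, 19.0, 20.0]

fold_change = relative_expression(treated_target_ct, treated_reference_cts,
                                  control_target_ct, control_reference_cts)
print(f"fold change = {fold_change:.1f}")
```

If a reference gene itself shifts between conditions, the shift propagates directly into the fold change, which is why the stability of candidate reference genes has to be checked for each experimental condition.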
Paasch, Uwe; Heidenreich, Falk; Pursche, Theresia; Kuhlisch, Eberhard; Kettner, Karina; Grunewald, Sonja; Kratzsch, Jürgen; Dittmar, Gunnar; Glander, Hans-Jürgen; Hoflack, Bernard; Kriegel, Thomas M
2011-08-01
Metabolic disorders like diabetes mellitus and obesity may compromise the fertility of men and women. To unveil disease-associated proteomic changes potentially affecting male fertility, the proteomes of sperm cells from type-1 diabetic, type-2 diabetic, non-diabetic obese and clinically healthy individuals were comparatively analyzed by difference gel electrophoresis. The adaptation of a general protein extraction procedure to the solubilization of proteins from sperm cells allowed for the resolution of 3187 fluorescent spots in the difference gel electrophoresis image of the master gel, which contained the entirety of solubilized sperm proteins. Comparison of the pathological and reference proteomes by applying an average abundance ratio setting of 1.6 and a p ≤ 0.05 criterion resulted in the identification of 79 fluorescent spots containing proteins that were present at significantly changed levels in the sperm cells. Biometric evaluation of the fluorescence data followed by mass spectrometric protein identification revealed altered levels of 12, 71, and 13 protein species in the proteomes of the type-1 diabetic, type-2 diabetic, and non-diabetic obese patients, respectively, with considerably enhanced amounts of the same set of one molecular form of semenogelin-1, one form of clusterin, and two forms of lactotransferrin in each group of pathologic samples. Remarkably, β-galactosidase-1-like protein was the only protein that was detected at decreased levels in all three pathologic situations. The former three proteins are part of the eppin (epididymal proteinase inhibitor) protein complex, which is thought to fulfill fertilization-related functions, such as ejaculate sperm protection, motility regulation and gain of competence for acrosome reaction, whereas the putative role of the latter protein to function as a glycosyl hydrolase during sperm maturation remains to be explored at the protein/enzyme level. 
The strikingly similar differences detected in the three groups of pathological sperm proteomes reflect a disease-associated enhanced formation of predominantly proteolytically modified forms of three eppin protein complex components, possibly as a response to enduring hyperglycemia and enhanced oxidative stress.
Emission spectra profiling of fluorescent proteins in living plant cells
2013-01-01
Background Fluorescence imaging at high spectral resolution allows the simultaneous recording of multiple fluorophores without switching optical filters, which is especially useful for time-lapse analysis of living cells. The collected emission spectra can be used to distinguish fluorophores by a computational analysis called linear unmixing. The availability of accurate reference spectra for different fluorophores is crucial for this type of analysis. The reference spectra used by plant cell biologists are in most cases derived from the analysis of fluorescent proteins in solution or produced in animal cells, although these spectra are influenced by both the cellular environment and the components of the optical system. For instance, plant cells contain various autofluorescent compounds, such as cell wall polymers and chlorophyll, that affect the spectral detection of some fluorophores. Therefore, it is important to acquire both reference and experimental spectra under the same biological conditions and through the same imaging systems. Results Entry clones (pENTR) of fluorescent proteins (FPs) were constructed in order to create C- or N-terminal protein fusions with the MultiSite Gateway recombination technology. The emission spectra for eight FPs, fused C-terminally to the A- or B-type cyclin dependent kinases (CDKA;1 and CDKB1;1) and transiently expressed in epidermal cells of tobacco (Nicotiana benthamiana), were determined by using the Olympus FluoView™ FV1000 Confocal Laser Scanning Microscope. These experimental spectra were then used in unmixing experiments in order to separate the emission of fluorophores with overlapping spectral properties in living plant cells. Conclusions Spectral imaging and linear unmixing have great potential for efficient multicolor detection in living plant cells. The emission spectra for eight of the most commonly used FPs were obtained in epidermal cells of tobacco leaves and used in unmixing experiments. 
The generated set of FP Gateway entry vectors represents a valuable resource for plant cell biologists. PMID:23552272
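Linear unmixing, as described in this abstract, treats the spectrum measured at a pixel as a weighted sum of the reference spectra and solves for the weights. The sketch below does this for two fluorophores by ordinary least squares, using the normal equations. The reference spectra, channel count and names (`fp_a_reference`, `fp_b_reference`, `unmix_two`) are hypothetical; real data would have more channels and noise, and commercial microscope software may use constrained or weighted variants.

```python
def unmix_two(measured, ref_a, ref_b):
    """Least-squares weights (a, b) such that measured ~ a*ref_a + b*ref_b."""
    aa = sum(x * x for x in ref_a)
    bb = sum(y * y for y in ref_b)
    ab = sum(x * y for x, y in zip(ref_a, ref_b))
    am = sum(x * m for x, m in zip(ref_a, measured))
    bm = sum(y * m for y, m in zip(ref_b, measured))
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("reference spectra are not linearly independent")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (am * bb - bm * ab) / det
    b = (bm * aa - am * ab) / det
    return a, b


# Hypothetical, overlapping reference emission spectra over five channels.
fp_a_reference = [0.1, 0.6, 1.0, 0.5, 0.1]
fp_b_reference = [0.0, 0.2, 0.7, 1.0, 0.4]

# Synthetic pixel spectrum: 2 parts of fluorophore A plus 3 parts of B.
pixel = [2.0 * a + 3.0 * b for a, b in zip(fp_a_reference, fp_b_reference)]

weight_a, weight_b = unmix_two(pixel, fp_a_reference, fp_b_reference)
print(f"A={weight_a:.3f} B={weight_b:.3f}")
```

The recovered weights are only as good as the reference spectra, which is the reason the abstract stresses acquiring reference and experimental spectra under the same biological and optical conditions.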
Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)
Rebholz-Schuhmann, Dietrich; Kim, Jee-Hyub; Yan, Ying; Dixit, Abhishek; Friteyre, Caroline; Hoehndorf, Robert; Backofen, Rolf; Lewin, Ian
2013-01-01
Motivation Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). Result This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. Conclusion LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources. PMID:24124474
Li, Jieyue; Xiong, Liang; Schneider, Jeff; Murphy, Robert F
2012-06-15
Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. 
http://murphylab.web.cmu.edu/software/.
Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio
2010-01-01
Background Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. 
However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209
Brassica napus seed endosperm - metabolism and signaling in a dead end tissue.
Lorenz, Christin; Rolletschek, Hardy; Sunderhaus, Stephanie; Braun, Hans-Peter
2014-08-28
Oilseeds are an important element of human nutrition and of increasing significance for the production of industrial materials. The development of the seeds is based on a coordinated interplay of the embryo and its surrounding tissue, the endosperm. This study aims to give insights into the physiological role of endosperm for seed development in the oilseed crop Brassica napus. Using protein separation by two-dimensional (2D) isoelectric focusing (IEF)/SDS polyacrylamide gel electrophoresis (PAGE) and protein identification by mass spectrometry three proteome projects were carried out: (i) establishment of an endosperm proteome reference map, (ii) proteomic characterization of endosperm development and (iii) comparison of endosperm and embryo proteomes. The endosperm proteome reference map comprises 930 distinct proteins, including enzymes involved in genetic information processing, carbohydrate metabolism, environmental information processing, energy metabolism, cellular processes and amino acid metabolism. To investigate dynamic changes in protein abundance during seed development, total soluble proteins were extracted from embryo and endosperm fractions at defined time points. Proteins involved in sugar converting and recycling processes, ascorbate metabolism, amino acid biosynthesis and redox balancing were found to be of special importance for seed development in B. napus. Implications for the seed filling process and the function of the endosperm for seed development are discussed. The endosperm is of key importance for embryo development during seed formation in plants. We present a broad study for characterizing endosperm proteins in the oilseed plant B. napus. Furthermore, a project on the biochemical interplay between the embryo and the endosperm during seed development is presented. We provide evidence that the endosperm includes a complete set of enzymes necessary for plant primary metabolism. 
Combination of our results with metabolome data will further improve systems-level understanding of the seed filling process and provide rational strategies for plant bioengineering. Copyright © 2014 Elsevier B.V. All rights reserved.
Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H
2017-12-01
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
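The in silico ORF step described above (a six-frame translation that considers alternative start codons) can be illustrated with a minimal Python sketch. This is not the iPtgxDB software: the start codon set (ATG, GTG, TTG), the choice of the most upstream start per stop codon, and all function names are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: a typical prokaryotic set of alternative start codons.
START_CODONS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_orf(dna):
    """Translate an ORF without its stop codon; the start codon is read as Met."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "M" + "".join(CODON_TABLE.get(c, "X") for c in codons[1:])


def orfs_in_frame(seq, frame):
    """Yield (start, end, protein) using the most upstream start before each stop."""
    start = None
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if CODON_TABLE.get(codon) == "*":
            if start is not None:
                yield start, pos + 3, translate_orf(seq[start:pos])
            start = None
        elif start is None and codon in START_CODONS:
            start = pos


def six_frame_orfs(seq):
    """Collect ORFs from all six frames.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end, protein in orfs_in_frame(s, frame):
                results.append((strand, frame, start, end, protein))
    return results
```

Calling `six_frame_orfs` on a genome string returns one tuple per ORF; a real pipeline would additionally merge these with reference annotations and ab initio predictions, which this sketch does not attempt.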
Bogomolov, Andrey; Belikova, Valeria; Galyanin, Vladislav; Melenteva, Anastasiia; Meyer, Hans
2017-05-15
A new technique of diffuse reflectance spectroscopic analysis of milk fat and total protein content in the visible (Vis) and adjacent near infrared (NIR) region (400-995 nm) has been developed and tested. Sample analysis was performed through a probe having eight 200-µm fiber channels forming a linear array. One of the end fibers was used for illumination and the other seven for the spectroscopic detection of diffusely reflected light. One of the detection channels was used as a reference to normalize the spectra and to convert them into absorbance-equivalent units. The method has been tested experimentally using a designed sample set prepared from industrial raw milk standards with widely varying fat and protein content. To increase the modelling robustness, all milk samples were measured at three different degrees of homogenization. Comprehensive data analysis has shown the advantage of combining both spectral and spatial resolution in the same measurement and revealed the most relevant channels and wavelength regions. The modelling accuracy was further improved using a joint variable selection and preprocessing optimization method based on a genetic algorithm. The root mean-square errors of different validation methods were below 0.10% for fat and below 0.08% for total protein content. Based on the present experimental data, it was computationally shown that the full-spectrum analysis in this method can be replaced by a sensor measurement at several specific wavelengths, for instance, using light-emitting diodes (LEDs) for illumination. Two optimal sensor configurations have been suggested: with nine LEDs for the analysis of fat and seven for protein content. Both simulated sensors exhibit nearly the same component determination accuracy as the corresponding full-spectrum analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
Hadrévi, Jenny; Hellström, Fredrik; Kieselbach, Thomas; Malm, Christer; Pedrosa-Domellöf, Fatima
2011-08-10
The trapezius muscle is a neck muscle that is susceptible to chronic pain conditions associated with repetitive tasks, commonly referred to as chronic work-related myalgia, hence making the trapezius a muscle of clinical interest. To provide a basis for further investigations of the proteomic traits of the trapezius muscle in disease, two-dimensional difference gel electrophoresis (2D-DIGE) was performed on the healthy trapezius using vastus lateralis as a reference. To obtain as much information as possible from the vast proteomic data set, both one-way ANOVA, with and without false discovery rate (FDR) correction, and partial least square projection to latent structures with discriminant analysis (PLS-DA) were combined to compare the outcome of the analysis. The trapezius and vastus lateralis showed significant differences in metabolic, contractile and regulatory proteins, with different results depending on choice of statistical approach and pre-processing technique. Using the standard method, FDR-corrected one-way ANOVA, 42 protein spots differed significantly in abundance between the two muscles. Complementary analysis using immunohistochemistry and western blot confirmed the results from the 2D-DIGE analysis. The proteomic approach used in the present study, combining 2D-DIGE and multivariate modelling, provided a more comprehensive comparison of the protein profiles of the human trapezius and vastus lateralis muscle than was previously possible to obtain with immunohistochemistry or SDS-PAGE alone. Although 2D-DIGE has inherent limitations, it is particularly useful to comprehensively screen for important structural and metabolic proteins, and appears to be a promising tool for future studies of patients suffering from chronic work-related myalgia or other muscle diseases.
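The false discovery rate (FDR) adjustment applied to the per-spot ANOVA p-values above can be sketched in Python. The abstract does not name the FDR procedure that was used; the Benjamini-Hochberg step-up method shown here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_spots(p_values, alpha=0.05):
    """Indices of protein spots whose adjusted p-value is at most alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= alpha]
```

With one p-value per protein spot, `significant_spots` returns the spots that would be reported as differing between the two muscles at the chosen FDR level; the threshold of 0.05 is likewise illustrative.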
Constraints on lateral gene transfer in promoting fimbrial usher protein diversity and function.
Stubenrauch, Christopher J; Dougan, Gordon; Lithgow, Trevor; Heinz, Eva
2017-11-01
Fimbriae are long, adhesive structures widespread throughout members of the family Enterobacteriaceae. They are multimeric extrusions, which are moved out of the bacterial cell through an integral outer membrane protein called usher. The complex folding mechanics of the usher protein were recently revealed to be catalysed by the membrane-embedded translocation and assembly module (TAM). Here, we examine the diversity of usher proteins across a wide range of extraintestinal (ExPEC) and enteropathogenic (EPEC) Escherichia coli, and further focus on a so far undescribed chaperone-usher system, with this usher referred to as UshC. The fimbrial system containing UshC is distributed across a discrete set of EPEC types, including model strains like E2348/67, as well as ExPEC ST131, currently the most prominent multi-drug-resistant uropathogenic E. coli strain worldwide. Deletion of the TAM from a naive strain of E. coli results in a drastic time delay in folding of UshC, which can be observed for a protein from EPEC as well as for two introduced proteins from related organisms, Yersinia and Enterobacter. We suggest that this models why the TAM machinery is essential for efficient folding of proteins acquired via lateral gene transfer. © 2017 The Authors.
Constraints on lateral gene transfer in promoting fimbrial usher protein diversity and function
Stubenrauch, Christopher J.; Dougan, Gordon; Lithgow, Trevor
2017-01-01
Fimbriae are long, adhesive structures widespread throughout members of the family Enterobacteriaceae. They are multimeric extrusions, which are moved out of the bacterial cell through an integral outer membrane protein called usher. The complex folding mechanics of the usher protein were recently revealed to be catalysed by the membrane-embedded translocation and assembly module (TAM). Here, we examine the diversity of usher proteins across a wide range of extraintestinal (ExPEC) and enteropathogenic (EPEC) Escherichia coli, and further focus on a so far undescribed chaperone–usher system, with this usher referred to as UshC. The fimbrial system containing UshC is distributed across a discrete set of EPEC types, including model strains like E2348/67, as well as ExPEC ST131, currently the most prominent multi-drug-resistant uropathogenic E. coli strain worldwide. Deletion of the TAM from a naive strain of E. coli results in a drastic time delay in folding of UshC, which can be observed for a protein from EPEC as well as for two introduced proteins from related organisms, Yersinia and Enterobacter. We suggest that this models why the TAM machinery is essential for efficient folding of proteins acquired via lateral gene transfer. PMID:29142104
Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens
2015-01-01
Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100
Zhao, Panpan; Zhong, Jiayong; Liu, Wanting; Zhao, Jing; Zhang, Gong
2017-12-01
Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, providing different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question due to the lack of an independent, reliable, and highly sensitive standard. We took the advantage of the translating mRNA sequencing (RNC-seq) result as a standard to evaluate the integration strategies of the protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, the union of seven engines resulted in a boosted false identification, whereas the intersection of seven engines remarkably decreased the identification power. We found that identifications of at least two out of seven engines resulted in maximizing the protein identification power while minimizing the ratio of suspicious/translation-supported identifications (STR), as monitored by our STR index, based on RNC-Seq. Furthermore, this strategy also significantly improves the peptides coverage of the protein amino acid sequence. In summary, we demonstrated a simple strategy to significantly improve the performance for shotgun mass spectrometry by protein-level integrating multiple search engines, maximizing the utilization of the current MS spectra without additional experimental work.
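The protein-level integration rule described above (keep identifications supported by at least two of the seven engines, rather than taking the union or the intersection) can be sketched as a simple vote count. The data layout, the function name and the toy protein identifiers below are hypothetical; scoring against RNC-seq with the STR index is not reproduced here.

```python
from collections import Counter


def integrate_identifications(engine_results, min_engines=2):
    """Return proteins reported by at least `min_engines` search engines.

    `engine_results` maps an engine name to an iterable of protein IDs.
    min_engines=1 gives the union; min_engines=len(engine_results) gives
    the intersection.
    """
    votes = Counter()
    for proteins in engine_results.values():
        votes.update(set(proteins))  # each engine votes once per protein
    return {protein for protein, n in votes.items() if n >= min_engines}


# Toy example with made-up protein IDs.
results = {
    "Mascot": ["P1", "P2", "P3"],
    "X!Tandem": ["P2", "P3", "P4"],
    "OMSSA": ["P3", "P5"],
}
consensus = integrate_identifications(results, min_engines=2)
```

In this toy input the consensus set sits between the union (five proteins) and the intersection (one protein), which mirrors the trade-off the abstract reports between identification power and false identifications.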
Priorities and trends in the study of proteins in eye research, 1924–2014
Semba, Richard D.; Lam, Maggie; Sun, Kai; Zhang, Pingbo; Schaumberg, Debra A.; Ferrucci, Luigi; Ping, Peipei; Van Eyk, Jennifer E.
2015-01-01
Purpose To identify the proteins that are relevant to eye research and develop assays for the study of a set of these proteins. Experimental Design We conducted a bibliometric analysis by merging gene lists for human and mouse from the National Center for Biotechnology Information FTP site and combining them with PubMed references that were retrieved with the search terms “eye”[MeSH Terms] OR “eye”[All Fields] OR “eyes”[All Fields]. Results For human and mouse eye studies, respectively, the total number of publications was 13,525 and 23,895, and the total number of proteins was 4,050 and 4,717. For proteins in human and mouse eye studies, respectively, 88.7% and 81.7% had five or fewer citations. The top fifty most intensively studied proteins for human and mouse eye studies were generally in the areas of photoreceptors and phototransduction, inflammation and angiogenesis, neurodevelopment, lens transparency, and cell cycle and cellular processes. We proposed selected reaction monitoring assays that were developed in silico for the top fifty most intensively studied proteins in human and mouse eye research. Conclusions and clinical relevance We conclude that scientists engaged in eye research tend to focus on the same proteins. Newer resources and tools in proteomics can expand the investigations to lesser-known proteins of the eye. PMID:26123431
Isaza, Ramiro; Wiedner, Ellen; Hiser, Sarah; Cray, Carolyn
2014-09-01
Acute phase protein (APP) immunoassays and serum protein electrophoresis (SPEP) are assays for evaluating the inflammatory response and have use as diagnostic tools in a variety of species. Acute phase proteins are markers of inflammation that are highly conserved across different species, while SPEP separates and quantifies serum protein fractions based on their physical properties. In the current study, serum samples from 35 clinically healthy Asian elephants (Elephas maximus) were analyzed using automated assays for C-reactive protein, serum amyloid A, and haptoglobin, and by SPEP. Robust methods were used to generate reference intervals for the APPs: C-reactive protein (1.3-12.8 mg/l), serum amyloid A (0-47.5 mg/l), and haptoglobin (0-1.10 mg/ml). In addition, SPEP was performed on these samples to establish reference intervals for each protein fraction. The combination of APP and SPEP measurements is a valuable adjunctive diagnostic tool in elephant health care. © 2014 The Author(s).
[Reference values of proteins for the Venezuelan population].
Guerra, Marisa; Hernández, María N; López, Michelle; Alfaro, María J
2013-12-01
This study presents the reference values for protein requirements. The consumption of the Venezuelan population was obtained according to the Food Consumption Monitoring Survey (ESCA) 2010-2012. The diet provided good quality proteins, combining animal and vegetable foods in an approximate ratio of 1:1. The reference values were calculated based on the safe levels of protein intake recommended by WHO/FAO/UN 2007, with an adjustment for protein supply depending on age, weight, and contribution to the caloric formula of proteins for light physical activity. The reference values for protein requirements recommended as safe levels of intake in g/kg/day are 1.14 to 1.80 for males and females less than one-year-old, from 1 to 3 years, 0.90 to 1.14; from 4 to 6 years old, 0.86 to 0.89; and from 7 to 10 years old, 0.91 to 0.92. For adolescents, the average is 0.88 and 1.07 for males and females, respectively. In adults from 20 to 59 years old, 0.83 for men and women is recommended, and for older adults, 1.00 for men and women. In pregnant women, additional consumptions are recommended according to gestation time. Adolescent pregnant women must consume additional 1.2 to 1.7 g/kg/day to normal requirement. In breastfeeding women, the values differ between the first six months postnatal period and after six months of breastfeeding. The reference values for protein in this update were lower than the values of the 2000 version.
Repair of Double-Strand Breaks by End Joining
Chiruvella, Kishore K.; Liang, Zhuobin; Wilson, Thomas E.
2013-01-01
Nonhomologous end joining (NHEJ) refers to a set of genome maintenance pathways in which two DNA double-strand break (DSB) ends are (re)joined by apposition, processing, and ligation without the use of extended homology to guide repair. Canonical NHEJ (c-NHEJ) is a well-defined pathway with clear roles in protecting the integrity of chromosomes when DSBs arise. Recent advances have revealed much about the identity, structure, and function of c-NHEJ proteins, but many questions exist regarding their concerted action in the context of chromatin. Alternative NHEJ (alt-NHEJ) refers to more recently described mechanism(s) that repair DSBs in less-efficient backup reactions. There is great interest in defining alt-NHEJ more precisely, including its regulation relative to c-NHEJ, in light of evidence that alt-NHEJ can execute chromosome rearrangements. Progress toward these goals is reviewed. PMID:23637284
Elastic network model of learned maintained contacts to predict protein motion
Putz, Ines
2017-01-01
We present a novel elastic network model, lmcENM, to determine protein motion even for localized functional motions that involve substantial changes in the protein’s contact topology. Existing elastic network models assume that the contact topology remains unchanged throughout the motion and are thus most appropriate to simulate highly collective function-related movements. lmcENM uses machine learning to differentiate breaking from maintained contacts. We show that lmcENM accurately captures functional transitions unexplained by the classical ENM and three reference ENM variants, while preserving the simplicity of classical ENM. We demonstrate the effectiveness of our approach on a large set of proteins covering different motion types. Our results suggest that accurately predicting a “deformation-invariant” contact topology offers a promising route to increase the general applicability of ENMs. We also find that to correctly predict this contact topology a combination of several features seems to be relevant which may vary slightly depending on the protein. Additionally, we present case studies of two biologically interesting systems, Ferric Citrate membrane transporter FecA and Arachidonate 15-Lipoxygenase. PMID:28854238
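The classical ENM assumption mentioned above, a contact topology fixed by a distance cutoff, can be sketched as follows, together with the idea of removing contacts that are predicted to break. This is not the lmcENM implementation: the 7 Å cutoff, the unit spring constants and the externally supplied set of breaking contacts are assumptions, and the machine-learning step that would predict that set is not shown.

```python
import math
from itertools import combinations


def contact_pairs(coords, cutoff=7.0):
    """Residue pairs (i, j), i < j, whose coordinates lie within `cutoff`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    }


def kirchhoff_matrix(n, contacts):
    """Connectivity (Kirchhoff) matrix of a network with unit springs."""
    gamma = [[0] * n for _ in range(n)]
    for i, j in contacts:
        gamma[i][j] = gamma[j][i] = -1  # off-diagonal: -1 per contact
        gamma[i][i] += 1                # diagonal: number of contacts
        gamma[j][j] += 1
    return gamma


# Toy chain of three C-alpha positions, 5 Å apart (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
classical = contact_pairs(coords)
# Hypothetical prediction that contact (1, 2) breaks during the motion.
maintained = classical - {(1, 2)}
gamma_classical = kirchhoff_matrix(len(coords), classical)
gamma_maintained = kirchhoff_matrix(len(coords), maintained)
```

The normal modes of such a network follow from the eigenvectors of the matrix; dropping a contact changes the matrix and therefore the predicted motion, which is the effect the abstract attributes to learning which contacts are maintained.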
Bagchi, Torit Baran; Sharma, Srigopal; Chattopadhyay, Krishnendu
2016-01-15
Given the growing recognition of the economic and nutritional importance of rice grain protein and of the nutritional components of rice bran (RB), near-infrared spectroscopy (NIRS) can be an effective tool for high-throughput screening in rice breeding programmes. Optimization of NIRS is a prerequisite for accurate prediction of grain quality parameters. In the present study, 173 brown rice (BR) and 86 RB samples with a wide range of values were used to compare the calibration models generated by different chemometrics for grain protein (GPC) and amylose content (AC) of BR and proximate compositions (protein, crude oil, moisture, ash and fiber content) of RB. Various modified partial least square (mPLS) models corresponding to the best mathematical treatments were identified for all components. Another set of 29 genotypes derived from the breeding programme was employed for the external validation of these calibration models. High accuracy of all these calibration and prediction models was ensured through paired t-test and correlation regression analysis between reference and predicted values. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ghafouri, Bijar; Carlsson, Anders; Holmberg, Sara; Thelin, Anders; Tagesson, Christer
2016-05-10
Farmers have an increased risk for musculoskeletal disorders (MSD) such as osteoarthritis of the hip, low back pain, and neck and upper limb complaints. The underlying mechanisms are not fully understood. Work-related exposures and inflammatory responses might be involved. Our objective was to identify plasma proteins that differentiated farmers with MSD from rural referents. Plasma samples from 13 farmers with MSD and rural referents were included in the investigation. Gel based proteomics was used for protein analysis and proteins that differed significantly between the groups were identified by mass spectrometry. In total, 15 proteins differed significantly between the groups. The levels of leucine-rich alpha-2-glycoprotein, haptoglobin, complement factor B, serotransferrin, one isoform of kininogen, one isoform of alpha-1-antitrypsin, and two isoforms of hemopexin were higher in farmers with MSD than in referents. On the other hand, the levels of alpha-2-HS-glycoprotein, alpha-1B-glycoprotein, vitamin D-binding protein, apolipoprotein A1, antithrombin, one isoform of kininogen, and one isoform of alpha-1-antitrypsin were lower in farmers than in referents. Many of the identified proteins are known to be involved in inflammation. Farmers with MSD had altered plasma levels of protein biomarkers compared to the referents, indicating that farmers with MSD may be subject to a more systemic inflammation. It is possible that the identified differences of proteins may give clues to the biochemical changes occurring during the development and progression of MSD in farmers, and that one or several of these protein biomarkers might eventually be used to identify and prevent work-related MSD.
Solernou, Albert; Hanson, Benjamin S; Richardson, Robin A; Welch, Robert; Read, Daniel J; Harlen, Oliver G; Harris, Sarah A
2018-03-01
Fluctuating Finite Element Analysis (FFEA) is a software package designed to perform continuum mechanics simulations of proteins and other globular macromolecules. It combines conventional finite element methods with stochastic thermal noise, and is appropriate for simulations of large proteins and protein complexes at the mesoscale (length-scales in the range of 5 nm to 1 μm), where there is currently a paucity of modelling tools. It requires 3D volumetric information as input, which can be low resolution structural information such as cryo-electron tomography (cryo-ET) maps or much higher resolution atomistic co-ordinates from which volumetric information can be extracted. In this article we introduce our open source software package for performing FFEA simulations which we have released under a GPLv3 license. The software package includes a C++ implementation of FFEA, together with tools to assist the user to set up the system from Electron Microscopy Data Bank (EMDB) or Protein Data Bank (PDB) data files. We also provide a PyMOL plugin to perform basic visualisation and additional Python tools for the analysis of FFEA simulation trajectories. This manuscript provides a basic background to the FFEA method, describing the implementation of the core mechanical model and how intermolecular interactions and the solvent environment are included within this framework. We provide prospective FFEA users with a practical overview of how to set up an FFEA simulation with reference to our publicly available online tutorials and manuals that accompany this first release of the package.
Pérez-Pérez, Rafael; López, Juan A.; García-Santos, Eva; Camafeita, Emilio; Gómez-Serrano, María; Ortega-Delgado, Francisco J.; Ricart, Wifredo; Fernández-Real, José M.; Peral, Belén
2012-01-01
Background Protein expression studies based on the two major intra-abdominal human fat depots, the subcutaneous and the omental fat, can shed light into the mechanisms involved in obesity and its co-morbidities. Here we address, for the first time, the identification and validation of reference proteins for data standardization, which are essential for accurate comparison of protein levels in expression studies based on fat from obese and non-obese individuals. Methodology and Findings To uncover adipose tissue proteins equally expressed either in omental and subcutaneous fat depots (study 1) or in omental fat from non-obese and obese individuals (study 2), we have reanalyzed our previously published data based on two-dimensional fluorescence difference gel electrophoresis. Twenty-four proteins (12 in study 1 and 12 in study 2) with similar expression levels in all conditions tested were selected and identified by mass spectrometry. Immunoblotting analysis was used to confirm in adipose tissue the expression pattern of the potential reference proteins and three proteins were validated: PARK7, ENOA and FAA. Western Blot analysis was also used to test customary loading control proteins. ENOA, PARK7 and the customary loading control protein Beta-actin showed steady expression profiles in fat from non-obese and obese individuals, whilst FAA maintained steady expression levels across paired omental and subcutaneous fat samples. Conclusions ENOA, PARK7 and Beta-actin are proper reference standards in obesity studies based on omental fat, whilst FAA is the best loading control for the comparative analysis of omental and subcutaneous adipose tissues either in obese and non-obese subjects. Neither customary loading control proteins GAPDH and TBB5 nor CALX are adequate standards in differential expression studies on adipose tissue. 
The use of the proposed reference proteins will facilitate the adequate analysis of proteins differentially expressed in the context of obesity, an aim difficult to achieve before this study. PMID:22272336
Galkin, O Yu; Besarab, A B; Lutsenko, T N
2017-01-01
The goal of this work was to study sensitivity and specificity of the developed ELISA set for the identification of IgG antibodies against Chlamydia trachomatis HSP-60 (using biotinylated tyramine-based signal amplification system). The study was conducted using a panel of characterized sera, as well as two reference ELISA sets of similar purpose. According to the results of ELISA informative value parameters, the ELISA we have developed showed the highest specificity and sensitivity parameters (no false negative or false positive results were registered). In 4 out of 15 intralaboratory panel serum samples initially identified as negative, anti-HSP-60 IgG-antibodies test result in reference ELISA sets upon dilution changed from negative to positive. The nature of titration curves of false negative sera and commercial monoclonal antibodies А57-В9 against C. trachomatis HSP-60 after incubation for 24 h was indicative of the presence of anti-idiotypic antibodies in these samples. Upon sera dilution, idiotypic-anti-idiotypic complexes dissociated, which caused the change of test result. High informative value of the developed ELISA set for identification of IgG antibodies against C. trachomatis HSP-60 has been proven. Anti-idiotypic antibodies possessing C. trachomatis anti-HSP-60 activity and being one of the causes of false negative results of the relevant ELISA-based tests have been identified in blood sera of individuals infected with chlamydial genitourinary infection agents.
Duffy, Fergal J; O'Donovan, Darragh; Devocelle, Marc; Moran, Niamh; O'Connell, David J; Shields, Denis C
2015-03-23
Protein-protein and protein-peptide interactions are responsible for the vast majority of biological functions in vivo, but targeting these interactions with small molecules has historically been difficult. What is required are efficient combined computational and experimental screening methods to choose among a number of potential protein interfaces worthy of targeting lead macrocyclic compounds for further investigation. To achieve this, we have generated combinatorial 3D virtual libraries of short disulfide-bonded peptides and compared them to pharmacophore models of important protein-protein and protein-peptide structures, including short linear motifs (SLiMs), protein-binding peptides, and turn structures at protein-protein interfaces, built from 3D models available in the Protein Data Bank. We prepared a total of 372 reference pharmacophores, which were matched against 108,659 multiconformer cyclic peptides. After normalization to exclude nonspecific cyclic peptides, the top hits notably are enriched for mimetics of turn structures, including a turn at the interaction surface of human α thrombin, and also feature several protein-binding peptides. The top cyclic peptide hits also cover the critical "hot spot" interaction sites predicted from the interaction crystal structure. We have validated our method by testing cyclic peptides predicted to inhibit thrombin, a key protein in the blood coagulation pathway of important therapeutic interest, identifying a cyclic peptide inhibitor with lead-like activity. We conclude that protein interfaces most readily targetable by cyclic peptides and related macrocyclic drugs may be identified computationally among a set of candidate interfaces, accelerating the choice of interfaces against which lead compounds may be screened.
Kniskern, Megan A; Johnston, Carol S
2011-06-01
The health benefits of vegetarian diets are well-recognized; however, long-term adherence to these diets may be associated with nutrient inadequacies, particularly vitamins B12 and D, calcium, iron, zinc, and protein. The dietary reference intakes (DRIs) expert panels recommended adjustments to the iron, zinc, and calcium DRIs for vegetarians to account for decreased bioavailability, but no adjustments were considered necessary for the protein DRI under the assumption that vegetarians consume about 50% of protein from animal (dairy/egg) sources. This study examined dietary protein sources in a convenience sample of 21 young adult vegetarian women who completed food logs on 4 consecutive days (3 weekdays and 1 weekend day). The daily contribution percentages of protein consumed from cereals, legumes, nuts/seeds, fruits/vegetables, and dairy/egg were computed, and the protein digestibility corrected amino acid score of the daily diets was calculated. The calculated total dietary protein digestibility score for participants was 82 ± 1%, which differed significantly (P < 0.001) from the DRI reference score, 88%, and the 4-d average protein digestibility corrected amino acid score for the sample was 80 ± 2%, which also differed significantly (P < 0.001) from the DRI reference value, 100%. The analyses indicated that animal protein accounted for only 21% of dietary protein. This research suggests that the protein DRI for vegetarians consuming less than the expected amounts of animal protein (45% to 50% of total protein) may need to be adjusted from 0.8 to about 1.0 g/kg to account for decreased protein bioavailability. Copyright © 2011 Elsevier Inc. All rights reserved.
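The suggested adjustment from 0.8 to about 1.0 g/kg can be reproduced as simple arithmetic. The sketch below assumes (the abstract does not state the formula) that the requirement is scaled by dividing the current protein DRI by the diet's measured quality score; the function name is illustrative only.

```python
def adjusted_protein_dri(base_dri_g_per_kg: float, quality_score: float) -> float:
    """Scale a protein requirement by a diet quality score given as a fraction.

    Assumption (not from the abstract): the requirement is inversely
    proportional to the quality score.
    """
    if not 0 < quality_score <= 1:
        raise ValueError("quality_score must be in (0, 1]")
    return base_dri_g_per_kg / quality_score


# 4-day average PDCAAS reported in the abstract: 80%
print(round(adjusted_protein_dri(0.8, 0.80), 2))
# Total dietary protein digestibility score reported in the abstract: 82%
print(round(adjusted_protein_dri(0.8, 0.82), 2))
```

Under this assumption, either reported score gives a value close to the 1.0 g/kg figure mentioned in the abstract.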
Rice proteome analysis: a step toward functional analysis of the rice genome.
Komatsu, Setsuko; Tanaka, Naoki
2005-03-01
The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.
Hoofnagle, Andrew N; Whiteaker, Jeffrey R; Carr, Steven A; Kuhn, Eric; Liu, Tao; Massoni, Sam A; Thomas, Stefani N; Townsend, R Reid; Zimmerman, Lisa J; Boja, Emily; Chen, Jing; Crimmins, Daniel L; Davies, Sherri R; Gao, Yuqian; Hiltke, Tara R; Ketchum, Karen A; Kinsinger, Christopher R; Mesri, Mehdi; Meyer, Matthew R; Qian, Wei-Jun; Schoenherr, Regine M; Scott, Mitchell G; Shi, Tujin; Whiteley, Gordon R; Wrobel, John A; Wu, Chaochao; Ackermann, Brad L; Aebersold, Ruedi; Barnidge, David R; Bunk, David M; Clarke, Nigel; Fishman, Jordan B; Grant, Russ P; Kusebauch, Ulrike; Kushnir, Mark M; Lowenthal, Mark S; Moritz, Robert L; Neubert, Hendrik; Patterson, Scott D; Rockwood, Alan L; Rogers, John; Singh, Ravinder J; Van Eyk, Jennifer E; Wong, Steven H; Zhang, Shucha; Chan, Daniel W; Chen, Xian; Ellis, Matthew J; Liebler, Daniel C; Rodland, Karin D; Rodriguez, Henry; Smith, Richard D; Zhang, Zhen; Zhang, Hui; Paulovich, Amanda G
2016-01-01
For many years, basic and clinical researchers have taken advantage of the analytical sensitivity and specificity afforded by mass spectrometry in the measurement of proteins. Clinical laboratories are now beginning to deploy these work flows as well. For assays that use proteolysis to generate peptides for protein quantification and characterization, synthetic stable isotope-labeled internal standard peptides are of central importance. No general recommendations are currently available surrounding the use of peptides in protein mass spectrometric assays. The Clinical Proteomic Tumor Analysis Consortium of the National Cancer Institute has collaborated with clinical laboratorians, peptide manufacturers, metrologists, representatives of the pharmaceutical industry, and other professionals to develop a consensus set of recommendations for peptide procurement, characterization, storage, and handling, as well as approaches to the interpretation of the data generated by mass spectrometric protein assays. Additionally, the importance of carefully characterized reference materials-in particular, peptide standards for the improved concordance of amino acid analysis methods across the industry-is highlighted. The alignment of practices around the use of peptides and the transparency of sample preparation protocols should allow for the harmonization of peptide and protein quantification in research and clinical care. © 2015 American Association for Clinical Chemistry.
Maximum likelihood density modification by pattern recognition of structural motifs
Terwilliger, Thomas C.
2004-04-13
An electron density for a crystallographic structure having protein regions and solvent regions is improved by maximizing the log likelihood of a set of structure factors $\{F_h\}$ using a local log-likelihood function: $LL(\rho(x)) = \ln\left[\,p(\rho(x)\mid \mathrm{PROT})\,p_{\mathrm{PROT}}(x) + p(\rho(x)\mid \mathrm{SOLV})\,p_{\mathrm{SOLV}}(x) + p(\rho(x)\mid H)\,p_{H}(x)\,\right]$, where $p_{\mathrm{PROT}}(x)$ is the probability that $x$ is in the protein region, $p(\rho(x)\mid \mathrm{PROT})$ is the conditional probability for $\rho(x)$ given that $x$ is in the protein region, and $p_{\mathrm{SOLV}}(x)$ and $p(\rho(x)\mid \mathrm{SOLV})$ are the corresponding quantities for the solvent region; $p_{H}(x)$ refers to the probability that there is a structural motif at a known location, with a known orientation, in the vicinity of the point $x$; and $p(\rho(x)\mid H)$ is the probability distribution for electron density at this point given that the structural motif actually is present. One appropriate structural motif is a helical structure within the crystallographic structure.
Jadhav, Aparna; Dash, RadhaCharan; Hirwani, Raj; Abdin, Malik
2018-03-01
Despite the wide medical importance of serine protease inhibitors, many Kazal-type proteins remain unexplored. These thrombin-inhibiting proteins are found in the digestive system of hematophagous organisms, mainly arthropods. We studied one such protein, a Kazal type-1 protein from the sand fly Phlebotomus papatasi, because its structure and its interaction with thrombin are unclear. Initially, dipetalin, a Kazal-follistatin domain protein, was run through PSI-BLAST to retrieve related sequences. Using this set of sequences, a phylogenetic tree was constructed, which identified a distantly related Kazal type-1 protein. A three-dimensional structure was predicted for this protein and aligned with rhodniin for further evaluation. To compare its binding at the thrombin active site, the aligned Kazal model-thrombin and rhodniin-thrombin complexes were subjected to molecular dynamics simulations. Dynamics analysis of main-chain RMSD, H-chain residue RMSF and total energy showed the rhodniin-thrombin complex to be the more stable system. Further, the MM/GBSA method gave binding free energies (ΔG binding) for rhodniin and the Kazal model of -220.32 kcal/mol and -90.70 kcal/mol, respectively. Thus, the Kazal model binds thrombin more weakly than rhodniin does. Copyright © 2017 Elsevier B.V. All rights reserved.
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. Such method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. 
Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
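The edge-elimination step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which edge is removed when a triple fails, so the sketch drops the lowest-scoring edge of the offending triangle, and `ternary_ok` is a hypothetical stand-in for the alignment-based ternary similarity test.

```python
from itertools import combinations


def reduce_graph(edges, ternary_ok):
    """Remove edges until no triangle violates the ternary constraint.

    edges: dict mapping frozenset({u, v}) -> pairwise similarity score.
    ternary_ok(a, b, c): hypothetical predicate, True when the three
        structures share a common substructure.
    Assumption: the weakest edge of a violating triangle is the one removed.
    """
    edges = dict(edges)  # work on a copy
    changed = True
    while changed:
        changed = False
        nodes = sorted({n for e in edges for n in e})
        for a, b, c in combinations(nodes, 3):
            tri = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
            if all(e in edges for e in tri) and not ternary_ok(a, b, c):
                weakest = min(tri, key=lambda e: edges[e])
                del edges[weakest]
                changed = True
    return edges


# Toy graph with made-up scores: A, B, C are pairwise similar but share no
# common substructure, so one edge of that triangle has to go.
toy = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.4,
    frozenset(("C", "D")): 0.7,
}
reduced = reduce_graph(toy, lambda a, b, c: {a, b, c} != {"A", "B", "C"})
print(sorted(tuple(sorted(e)) for e in reduced))
```

Any standard graph clustering algorithm could then be run on the reduced edge set, as the abstract describes.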
Atomic interaction networks in the core of protein domains and their native folds.
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram
2010-02-23
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains are fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed protein core atomic interaction network (or PCAIN) is significantly distinguishable across different folds, thus appearing to be "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 angstroms (mean 1.61A) C(alpha) RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains including those from recent CASP experiments and most notably in the 'twilight' and 'midnight' zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69A). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from plague-causative bacterium Yersinia Pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. 
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337
Prediction of beta-turns from amino acid sequences using the residue-coupled model.
Guruprasad, K; Shukla, S
2003-04-01
We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy of 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set falls to 40.74% when tested on the data set comprising 425 protein chains. In contrast, probability values derived from the 425-chain data set used in this analysis yielded consistent overall beta-turn prediction accuracy when tested on the 30-protein data set used earlier (64.62%), on a more recent representative data set comprising 619 protein chains (64.66%), and on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.
Blass, Sandra C; Goost, Hans; Burger, Christof; Tolba, René H; Stoffel-Wagner, Birgit; Stehle, Peter; Ellinger, Sabine
2013-12-05
Disorders in wound healing (DWH) are common in trauma patients, for reasons that are not completely understood. Inadequate nutritional status may favor DWH, partly by means of oxidative stress. Reliable data, however, are lacking. This study investigated the status of extracellular micronutrients in patients with DWH in a routine setting. In a cross-sectional study, the plasma/serum status of several micronutrients (retinol, ascorbic acid, 25-hydroxycholecalciferol, α-tocopherol, β-carotene, selenium, and zinc) was determined in 44 trauma patients with DWH, in addition to selected proteins (albumin, prealbumin, and C-reactive protein; CRP) and markers of pro-/antioxidant balance (antioxidant capacity, peroxides, and malondialdehyde). Values were compared to reference values to calculate the prevalence of biochemical deficiency. Correlations between CRP, albumin and prealbumin, and selected micronutrients were analyzed by Pearson's test. Statistical significance was set at P < 0.05. Mean concentrations of ascorbic acid (23.1 ± 15.9 μmol/L), 25-hydroxycholecalciferol (46.2 ± 30.6 nmol/L), β-carotene (0.6 ± 0.4 μmol/L), selenium (0.79 ± 0.19 μmol/L), and prealbumin (24.8 ± 8.2 mg/dL) were relatively low. Most patients showed levels of ascorbic acid (<28 μmol/L; 64%), 25-hydroxycholecalciferol (<50 nmol/L; 59%), selenium (≤ 94 μmol/L; 71%) and β-carotene (<0.9 μmol/L; 86%) below the reference range. Albumin and prealbumin were in the lower normal range and CRP was mostly above the reference range. Plasma antioxidant capacity was decreased, whereas peroxides and malondialdehyde were increased compared to normal values. Inverse correlations were found between CRP and albumin (P < 0.05) and between CRP and prealbumin (P < 0.01). Retinol (P < 0.001), ascorbic acid (P < 0.01), zinc (P < 0.001), and selenium (P < 0.001) were negatively correlated with CRP. 
Trauma patients with DWH frequently suffer from protein malnutrition and reduced plasma concentrations of several micronutrients, probably due to inflammation, increased requirement, and oxidative burden. Thus, adequate nutritional measures are strongly recommended for trauma patients.
Development of proteome-wide binding reagents for research and diagnostics.
Taussig, Michael J; Schmidt, Ronny; Cook, Elizabeth A; Stoevesandt, Oda
2013-12-01
Alongside MS, antibodies and other specific protein-binding molecules have a special place in proteomics as affinity reagents in a toolbox of applications for determining protein location, quantitative distribution and function (affinity proteomics). The realisation that the range of research antibodies available, while apparently vast is nevertheless still very incomplete and frequently of uncertain quality, has stimulated projects with an objective of raising comprehensive, proteome-wide sets of protein binders. With progress in automation and throughput, a remarkable number of recent publications refer to the practical possibility of selecting binders to every protein encoded in the genome. Here we review the requirements of a pipeline of production of protein binders for the human proteome, including target prioritisation, antigen design, 'next generation' methods, databases and the approaches taken by ongoing projects in Europe and the USA. While the task of generating affinity reagents for all human proteins is complex and demanding, the benefits of well-characterised and quality-controlled pan-proteome binder resources for biomedical research, industry and life sciences in general would be enormous and justify the effort. Given the technical, personnel and financial resources needed to fulfil this aim, expansion of current efforts may best be addressed through large-scale international collaboration. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Combinatorial Labeling Method for Improving Peptide Fragmentation in Mass Spectrometry
NASA Astrophysics Data System (ADS)
Kuchibhotla, Bhanuramanand; Kola, Sankara Rao; Medicherla, Jagannadham V.; Cherukuvada, Swamy V.; Dhople, Vishnu M.; Nalam, Madhusudhana Rao
2017-06-01
Annotation of peptide sequence from tandem mass spectra constitutes the central step of mass spectrometry-based proteomics. Peptide mass spectra are obtained upon gas-phase fragmentation. Identification of the protein from a set of experimental peptide spectral matches is usually referred to as protein inference. Occurrence and intensity of these fragment ions in the MS/MS spectra depend on many factors such as amino acid composition, peptide basicity, activation mode, protease, etc. In particular, chemical derivatizations of peptides are known to alter their fragmentation. In this study, the influence of acetylation, guanidinylation, and their combination on peptide fragmentation was assessed initially on a lipase (LipA) from Bacillus subtilis followed by a bovine six protein mix digest. The dual modification resulted in improved fragment ion occurrence and intensity changes, and this resulted in the equivalent representation of b- and y-type fragment ions in an ion trap MS/MS spectrum. The improved representation has allowed us to accurately annotate the peptide sequences de novo. Dual labeling has significantly reduced the false positive protein identifications in standard bovine six peptide digest. Our study suggests that the combinatorial labeling of peptides is a useful method to validate protein identifications for high confidence protein inference.
Hector, Amy J; Phillips, Stuart M
2018-03-01
There exists a large body of scientific evidence to support protein intakes in excess of the recommended dietary allowance (RDA) (0.8 g protein/kg/day) to promote the retention of skeletal muscle and loss of adipose tissue during dietary energy restriction. Diet-induced weight loss with as low as possible ratio of skeletal muscle to fat mass loss is a situation we refer to as high-quality weight loss. We propose that high-quality weight loss is often of importance to elite athletes in order to maintain their muscle (engine) and shed unwanted fat mass, potentially improving athletic performance. Current recommendations for protein intakes during weight loss in athletes are set at 1.6-2.4 g protein/kg/day. However, the severity of the caloric deficit and type and intensity of training performed by the athlete will influence at what end of this range athletes choose to be. Other considerations regarding protein intake that may help elite athletes achieve weight loss goals include the quality of protein consumed, and the timing and distribution of protein intake throughout the day. This review highlights the scientific evidence used to support protein recommendations for high-quality weight loss and preservation of performance in athletes. Additionally, the current knowledge surrounding the use of protein supplements, branched chain amino acids (BCAA), β-hydroxy β-methylbutyrate (HMB), and other dietary supplements with weight loss claims will be discussed.
Cai, Jing; Li, Pengfei; Luo, Xiao; Chang, Tianliang; Li, Jiaxing; Zhao, Yuwei; Xu, Yao
2018-01-01
Hulless barley (Hordeum vulgare L. var. nudum Hook. f.) has been cultivated as a major crop in the Qinghai-Tibet plateau of China for thousands of years. Compared to other cereal crops, the Tibetan hulless barley has developed stronger endogenous resistances to survive in the severe environment of its habitat. To understand the unique resistance mechanisms of this plant, detailed genetic studies need to be performed. The quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) is the most commonly used method for detecting gene expression. However, the selection of stable reference genes under the given experimental conditions is considered an essential step for obtaining accurate results in qRT-PCR. In this study, 10 candidate reference genes, namely ACT (Actin), E2 (Ubiquitin conjugating enzyme 2), TUBα (Alpha-tubulin), TUBβ6 (Beta-tubulin 6), GAPDH (Glyceraldehyde 3-phosphate dehydrogenase), EF-1α (Elongation factor 1-alpha), SAMDC (S-adenosylmethionine decarboxylase), PKABA1 (Gene for protein kinase HvPKABA1), PGK (Phosphoglycerate kinase), and HSP90 (Heat shock protein 90), were selected from the NCBI gene database of barley. Following qRT-PCR amplifications of all candidate reference genes in Tibetan hulless barley seedlings under various stress conditions, the stabilities of these candidates were analyzed by three individual software packages: geNorm, NormFinder, and BestKeeper. The results demonstrated that TUBβ6, E2, TUBα, and HSP90 were generally the most suitable sets under all tested conditions; similarly, TUBα and HSP90 showed peak stability under salt stress, TUBα and EF-1α were the most suitable reference genes under cold stress, and ACT and E2 were the most stable under drought stress. Finally, a known circadian gene, CCA1, was used to verify the suitability of the chosen reference genes. 
The results confirmed that all reference genes recommended by the three software packages were suitable for gene expression analysis under the tested stress conditions by the qRT-PCR method.
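To make the idea of a stability ranking concrete, the sketch below computes a geNorm-style M value as that measure is commonly described: for each gene, the average over all other candidates of the standard deviation of the log2 expression ratios across samples. The abstract does not give implementation details, so this is an illustration and not the software's actual code; the gene names and expression values are made up.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Return a geNorm-style stability value M per gene (lower = more stable).

    expr: dict mapping gene name -> list of relative expression quantities,
        with samples in the same order for every gene (values are linear
        quantities, not raw Ct values).
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Made-up quantities for three hypothetical candidates across three samples.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],  # constant ratio to geneA across samples
    "geneC": [1.0, 4.0, 1.0],  # varies independently of the other two
}
m = genorm_m(example)
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))
```

In this toy input, geneC receives the highest M and would be the first candidate to discard.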
O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D
2016-01-04
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
NASA Astrophysics Data System (ADS)
Yan, Lulu; Su, Jiaqi; Wang, Zhaoping; Yan, Xiwu; Yu, Ruihai
2017-12-01
Quantitative real-time polymerase chain reaction (qRT-PCR) is a rapid and reliable technique which has been widely used to quantify gene transcripts (expression analysis). It is also employed for studying heterosis, hybridization breeding and hybrid tolerability of oysters, an ecologically and economically important taxonomic group. For these studies, selection of a suitable set of housekeeping genes as references is crucial for correct interpretation of qRT-PCR data. To identify suitable reference genes for oysters during low temperature and low salinity stresses, we analyzed twelve genes from the gill tissue of Crassostrea sikamea (SS), Crassostrea angulata (AA) and their hybrid (SA), which included three ribosomal genes, 28S ribosomal protein S5 (RPS5), ribosomal protein L35 (RPL35), and 60S ribosomal protein L29 (RPL29); three structural genes, tubulin gamma (TUBγ), annexin A6 and A7 (AA6 and AA7); three metabolic pathway genes, ornithine decarboxylase (OD), glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and glutathione S-transferase P1 (GSP); two transcription factors, elongation factor 1 alpha and beta (EF1α and EF1β); and one protein synthesis gene, ubiquitin (UBQ). Primers specific for these genes were successfully developed for the three groups of oysters. Three different algorithms, geNorm, NormFinder and BestKeeper, were used to evaluate the expression stability of these candidate genes. The BestKeeper program was found to be the most reliable. 
Based on our analysis, we found that the expression of RPL35 and EF1α was stable under low salinity stress, and the expression of OD, GAPDH and EF1α was stable under low temperature stress in hybrid (SA) oyster; the expression of RPS5 and GAPDH was stable under low salinity stress, and the expression of RPS5, UBQ, GAPDH was stable under low temperature stress in SS oyster; the expression of RPS5, GAPDH, EF1β and AA7 was stable under low salinity stress, and the expression of RPL35, EF1α, GAPDH and EF1β was stable under low temperature stress in AA oyster. Furthermore, to evaluate their suitability, the reference genes were used to quantify six target genes. In conclusion, we have successfully developed primers appropriate for the expression analysis in SS, SA and AA.
Yang, Chunxiao; Li, Hui; Pan, Huipeng; Ma, Yabin; Zhang, Deyong; Liu, Yong; Zhang, Zhanhong; Zheng, Changying; Chu, Dong
2015-01-01
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for measuring and evaluating gene expression during variable biological processes. To facilitate gene expression studies, normalization of genes of interest relative to stable reference genes is crucial. The western flower thrips Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), the main vector of tomato spotted wilt virus (TSWV), is a destructive invasive species. In this study, the expression profiles of 11 candidate reference genes from nonviruliferous and viruliferous F. occidentalis were investigated. Five distinct algorithms, geNorm, NormFinder, BestKeeper, the ΔCt method, and RefFinder, were used to determine the performance of these genes. geNorm, NormFinder, BestKeeper, and RefFinder identified heat shock protein 70 (HSP70), heat shock protein 60 (HSP60), elongation factor 1 α, and ribosomal protein l32 (RPL32) as the most stable reference genes, and the ΔCt method identified HSP60, HSP70, RPL32, and heat shock protein 90 as the most stable reference genes. Additionally, two reference genes were sufficient for reliable normalization in nonviruliferous and viruliferous F. occidentalis. This work provides a foundation for investigating the molecular mechanisms of TSWV and F. occidentalis interactions.
Protein structure based prediction of catalytic residues.
Fajardo, J Eduardo; Fiser, Andras
2013-02-22
Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
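The enrichment, sensitivity and precision quoted in the abstract above can be re-derived from the four counts it reports. The short Python sketch below only illustrates that arithmetic. The variable names are ours, and treating "enrichment" as precision divided by the background rate of catalytic residues is an assumption about how the authors defined it.

```python
# Counts taken from the abstract; variable names are illustrative only.
total_residues = 9262    # residues in the 29 test structures
predicted = 411          # residues returned by the method
catalytic_total = 111    # annotated catalytic residues in the test set
catalytic_found = 70     # annotated catalytic residues among the 411

precision = catalytic_found / predicted
sensitivity = catalytic_found / catalytic_total
background_rate = catalytic_total / total_residues

# Assumed definition: fold enrichment = precision / background rate.
fold_enrichment = precision / background_rate

print(f"precision   = {precision:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"enrichment  = {fold_enrichment:.1f}-fold")
```

Under that assumed definition the values come out at roughly 17%, 63% and 14-fold, in line with the figures in the abstract.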
Orellana, Luis H.; Rodriguez-R, Luis M.; Konstantinidis, Konstantinos T.
2016-10-07
Functional annotation of metagenomic and metatranscriptomic data sets relies on similarity searches based on e-value thresholds resulting in an unknown number of false positive and negative matches. To overcome these limitations, we introduce ROCker, aimed at identifying position-specific, most-discriminant thresholds in sliding windows along the sequence of a target protein, accounting for non-discriminative domains shared by unrelated proteins. ROCker employs the receiver operating characteristic (ROC) curve to minimize false discovery rate (FDR) and calculate the best thresholds based on how simulated shotgun metagenomic reads of known composition map onto well-curated reference protein sequences and thus, differs from HMM profiles and related methods. We showcase ROCker using ammonia monooxygenase (amoA) and nitrous oxide reductase (nosZ) genes, mediating oxidation of ammonia and the reduction of the potent greenhouse gas, N2O, to inert N2, respectively. ROCker typically showed 60-fold lower FDR when compared to the common practice of using fixed e-values. Previously uncounted ‘atypical’ nosZ genes were found to be two times more abundant, on average, than their typical counterparts in most soil metagenomes and the abundance of bacterial amoA was quantified against the highly-related particulate methane monooxygenase (pmoA). Therefore, ROCker can reliably detect and quantify target genes in short-read metagenomes.
The Application of FT-IR Spectroscopy for Quality Control of Flours Obtained from Polish Producers
Ceglińska, Alicja; Reder, Magdalena; Ciemniewska-Żytkiewicz, Hanna
2017-01-01
Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data with regard to the content of various components, for example, proteins, fats, ash, and fatty acids, as well as properties such as moisture, falling number, and energetic value. This correlation resulted in calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models using CSR and PLS with the leave-one-out cross-validation technique. The calibrated models were validated with a validation data set. The results obtained confirmed that application of statistical models based on MIR spectral data is a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determination of flour characteristics, as well as for detection of content of selected flour ingredients. The obtained models' characteristics were as follows: R2 = 0.97, PRESS = 2.14; R2 = 0.96, PRESS = 0.69; R2 = 0.95, PRESS = 1.27; R2 = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. Best results of CSR models were obtained for protein, ash, and crude fat (R2 = 0.86; 0.82; and 0.78, respectively). PMID:28243483
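The PRESS statistic reported with the models above is the predicted residual error sum of squares obtained under cross-validation. As a minimal sketch of the idea, the Python code below computes a leave-one-out PRESS for an ordinary one-variable least-squares line. This is a deliberately simpler stand-in for the CSR and PLS models named in the abstract, and the function names and the toy numbers are hypothetical, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def press_loo(xs, ys):
    """Leave-one-out PRESS: refit without sample i, then predict sample i."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        residual = ys[i] - (slope * xs[i] + intercept)
        total += residual ** 2
    return total


# Hypothetical toy data: one spectral feature versus protein content (%).
spectral_feature = [0.10, 0.20, 0.30, 0.40, 0.50]
protein_pct = [9.8, 11.1, 12.0, 13.2, 13.9]

press = press_loo(spectral_feature, protein_pct)
print(f"leave-one-out PRESS = {press:.3f}")
```

A lower PRESS means the model predicts held-out samples more closely. Data lying exactly on a line give a PRESS of zero.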
Conserved water molecules in bacterial serine hydroxymethyltransferases.
Milano, Teresa; Di Salvo, Martino Luigi; Angelaccio, Sebastiana; Pascarella, Stefano
2015-10-01
Water molecules occurring in the interior of protein structures often are endowed with key structural and functional roles. We report the results of a systematic analysis of conserved water molecules in bacterial serine hydroxymethyltransferases (SHMTs). SHMTs are an important group of pyridoxal-5'-phosphate-dependent enzymes that catalyze the reversible conversion of l-serine and tetrahydropteroylglutamate to glycine and 5,10-methylenetetrahydropteroylglutamate. The approach utilized in this study relies on two programs, ProACT2 and WatCH. The first software is able to categorize water molecules in a protein crystallographic structure as buried, positioned in clefts or at the surface. The other program finds, in a set of superposed homologous proteins, water molecules that occur approximately in equivalent position in each of the considered structures. These groups of molecules are referred to as 'clusters' and represent structurally conserved water molecules. Several conserved clusters of buried or cleft water molecules were found in the set of 11 bacterial SHMTs we took into account for this work. The majority of these clusters were not described previously. Possible structural and functional roles for the conserved water molecules are envisaged. This work provides a map of the conserved water molecules helpful for deciphering SHMT mechanism and for rational design of molecular engineering experiments. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Yamada, Takashi; Onimatsu, Hideki; Van Etten, James L.
2007-01-01
Chlorella viruses or chloroviruses are large, icosahedral, plaque‐forming, double‐stranded DNA‐containing viruses that replicate in certain strains of the unicellular green alga Chlorella. DNA sequence analysis of the 330‐kbp genome of Paramecium bursaria chlorella virus 1 (PBCV‐1), the prototype of this virus family (Phycodnaviridae), predicts ∼366 protein‐encoding genes and 11 tRNA genes. The predicted gene products of ∼50% of these genes resemble proteins of known function, including many that are completely unexpected for a virus. In addition, the chlorella viruses have several features and encode many gene products that distinguish them from most viruses. These include: (1) multiple DNA methyltransferases and DNA site‐specific endonucleases, (2) the enzymes required to glycosylate their proteins and synthesize polysaccharides such as hyaluronan and chitin, (3) a virus‐encoded K+ channel (called Kcv) located in the internal membrane of the virions, (4) a SET domain containing protein (referred to as vSET) that dimethylates Lys27 in histone 3, and (5) three types of introns in PBCV‐1: a self‐splicing intron, a spliceosomal processed intron, and a small tRNA intron. Accumulating evidence indicates that the chlorella viruses have a very long evolutionary history. This review mainly deals with research on the virion structure, genome rearrangements, gene expression, cell wall degradation, polysaccharide synthesis, and evolution of PBCV‐1 as well as other related viruses. PMID:16877063
2017-01-01
Abstract Functional annotation of metagenomic and metatranscriptomic data sets relies on similarity searches based on e-value thresholds resulting in an unknown number of false positive and negative matches. To overcome these limitations, we introduce ROCker, aimed at identifying position-specific, most-discriminant thresholds in sliding windows along the sequence of a target protein, accounting for non-discriminative domains shared by unrelated proteins. ROCker employs the receiver operating characteristic (ROC) curve to minimize false discovery rate (FDR) and calculate the best thresholds based on how simulated shotgun metagenomic reads of known composition map onto well-curated reference protein sequences and thus, differs from HMM profiles and related methods. We showcase ROCker using ammonia monooxygenase (amoA) and nitrous oxide reductase (nosZ) genes, mediating oxidation of ammonia and the reduction of the potent greenhouse gas, N2O, to inert N2, respectively. ROCker typically showed 60-fold lower FDR when compared to the common practice of using fixed e-values. Previously uncounted ‘atypical’ nosZ genes were found to be two times more abundant, on average, than their typical counterparts in most soil metagenomes and the abundance of bacterial amoA was quantified against the highly-related particulate methane monooxygenase (pmoA). Therefore, ROCker can reliably detect and quantify target genes in short-read metagenomes. PMID:28180325
Design and Initial Characterization of the SC-200 Proteomics Standard Mixture
Bauman, Andrew; Higdon, Roger; Rapson, Sean; Loiue, Brenton; Hogan, Jason; Stacy, Robin; Napuli, Alberto; Guo, Wenjin; van Voorhis, Wesley; Roach, Jared; Lu, Vincent; Landorf, Elizabeth; Stewart, Elizabeth; Kolker, Natali; Collart, Frank; Myler, Peter; van Belle, Gerald
2011-01-01
Abstract High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn require complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins are unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels. PMID:21250827
Design and initial characterization of the SC-200 proteomics standard mixture.
Bauman, Andrew; Higdon, Roger; Rapson, Sean; Loiue, Brenton; Hogan, Jason; Stacy, Robin; Napuli, Alberto; Guo, Wenjin; van Voorhis, Wesley; Roach, Jared; Lu, Vincent; Landorf, Elizabeth; Stewart, Elizabeth; Kolker, Natali; Collart, Frank; Myler, Peter; van Belle, Gerald; Kolker, Eugene
2011-01-01
High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn require complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins are unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels.
Aghdassi, Elaheh; McArthur, Margaret; Liu, Barbara; McGeer, Alison; Simor, Andrew; Allard, Johane P
2007-09-01
To compare the dietary intake of elderly living in 11 long-term care facilities (LTCFs) to the Estimated Average Requirement set as part of the Dietary Reference Intake for older adults. A cross-sectional assessment of dietary intake using a 3-day food record among 407 elderly with mean age of 85.2 ± 7.7 years and BMI of 23.8 ± 5.7 kg/m². This population sample was similar to the one living in LTCFs in the province of Ontario. The daily energy intake was 1513 ± 363 kcal (6330.4 ± 1518.8 kJ). Percentages of energy from fat, saturated fat, polyunsaturated fat, protein, and carbohydrate were 30%, 11%, 5.2%, 15%, and 56%, respectively. Although these values were close to the recommendations, 29.5% had protein intake below the recommended 0.8 g/kg, and 38.3% of subjects had cholesterol intake above the recommended 300 mg/d. More than 50% of the subjects had suboptimal intake of calcium, magnesium, zinc and vitamins E, B6, and folate. In addition, more than 15% had suboptimal intakes of other micronutrients such as vitamins A, C, niacin, and copper. Elderly subjects living in LTCFs in Toronto, despite having a normal body mass index (BMI), do not meet the recommended levels of intake for protein and many of the micronutrients. LTCF staff should monitor dietary intake. Menu modification and micronutrient supplementation may be required in order to meet the daily requirements of these elderly.
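The energy figures in the abstract above are given in both kcal and kJ. The Python sketch below reproduces that conversion and shows how a per-kilogram protein reference translates into grams per day. It assumes the thermochemical conversion factor of 4.184 kJ per kcal, which the abstract does not state, and the function names and the 60 kg example body weight are hypothetical.

```python
KJ_PER_KCAL = 4.184  # assumed factor (thermochemical calorie)


def kcal_to_kj(kcal):
    """Convert an energy value from kilocalories to kilojoules."""
    return kcal * KJ_PER_KCAL


def protein_reference_g(weight_kg, g_per_kg=0.8):
    """Daily protein reference in grams for a given body weight."""
    return weight_kg * g_per_kg


mean_kj = kcal_to_kj(1513)  # mean daily energy intake from the abstract
sd_kj = kcal_to_kj(363)     # its standard deviation

print(f"{mean_kj:.1f} +/- {sd_kj:.1f} kJ")
# Hypothetical resident weighing 60 kg:
print(f"{protein_reference_g(60):.1f} g protein per day")
```

With this factor the converted mean and standard deviation agree with the kJ values printed in the abstract.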
Biosafety research for non-target organism risk assessment of RNAi-based GE plants
Roberts, Andrew F.; Devos, Yann; Lemgo, Godwin N. Y.; Zhou, Xuguo
2015-01-01
RNA interference, or RNAi, refers to a set of biological processes that make use of conserved cellular machinery to silence genes. Although there are several variations in the source and mechanism, they are all triggered by double stranded RNA (dsRNA) which is processed by a protein complex into small, single stranded RNA, referred to as small interfering RNAs (siRNA) with complementarity to sequences in genes targeted for silencing. The use of the RNAi mechanism to develop new traits in plants has fueled a discussion about the environmental safety of the technology for these applications, and this was the subject of a symposium session at the 13th ISBGMO in Cape Town, South Africa. This paper continues that discussion by proposing research areas that may be beneficial for future environmental risk assessments of RNAi-based genetically modified plants, with a particular focus on non-target organism assessment. PMID:26594220
Zhu, Xinyu; Ma, Hong; Chen, Zhiduan
2011-03-09
Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improving genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were integrated early into the primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.
USDA-ARS's Scientific Manuscript database
A nested PCR assay was developed to determine the presence of a gene encoding a bacteriophage Mu-like portal protein, gp29, in 15 reference strains and 31 field isolates of Haemophilus parasuis. Specific primers, based on the gene’s sequence, were utilized. A majority of the virulent reference strai...
Imin, Nijat; De Jong, Femke; Mathesius, Ulrike; van Noorden, Giel; Saeed, Nasir A; Wang, Xin-Ding; Rose, Ray J; Rolfe, Barry G
2004-07-01
Using a combination of two-dimensional gel electrophoresis (2-DE) protein mapping and mass spectrometry (MS) analysis, we have established proteome reference maps of Medicago truncatula embryogenic tissue culture cells. The cultures were generated from single protoplasts, which provided a relatively homogeneous cell population. We used these to analyze protein expression at the globular stages of somatic embryogenesis, which is the earliest morphogenetic embryonic stage. Over 3000 proteins could reproducibly be resolved over a pI range of 4-11. Three hundred and twelve protein spots were extracted from colloidal Coomassie Blue-stained 2-DE gels and analyzed by matrix-assisted laser desorption/ionization-time of flight MS analysis and tandem MS sequencing. This enabled the identification of 169 protein spots representing 128 unique gene products using a publicly available expressed sequence tag database and the MASCOT search engine. These reference maps will be valuable for the investigation of the molecular events which occur during somatic embryogenesis in M. truncatula. The proteome reference maps and supplementary materials will be available and updated for public access at http://semele.anu.edu.au/.
A new test set for validating predictions of protein-ligand interaction.
Nissink, J Willem M; Murray, Chris; Hartshorn, Mike; Verdonk, Marcel L; Cole, Jason C; Taylor, Robin
2002-12-01
We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk). Copyright 2002 Wiley-Liss, Inc.
Hunsaker, Joshua J H; Wyness, Sara P; Snow, Taylor M; Genzen, Jonathan R
2016-12-01
Refractometric methods to measure total protein (TP) in serum and plasma specimens have been replaced by automated biuret methods in virtually all routine clinical testing. A subset of laboratories, however, still report using refractometry to measure TP in conjunction with serum protein electrophoresis. The objective of this study was therefore to conduct a modern performance evaluation of a digital refractometer for TP measurement. Performance evaluation of a MISCO Palm Abbe™ digital refractometer was conducted through device familiarization, carryover, precision, accuracy, linearity, analytical sensitivity, analytical specificity, and reference interval verification. Comparison assays included a manual refractometer and an automated biuret assay. Carryover risk was eliminated using a demineralized distilled water (ddH2O) wash step. Precision studies demonstrated overall imprecision of 2.2% CV (low TP pool) and 0.5% CV (high TP pool). Accuracy studies demonstrated correlation to both manual refractometry and the biuret method. An overall positive bias (+5.0%) was observed versus the biuret method. On average, outlier specimens had an increased triglyceride concentration. Linearity was verified using mixed dilutions of: a) low and high concentration patient pools, or b) albumin-spiked ddH2O and high concentration patient pool. Decreased recovery was observed using ddH2O dilutions at low TP concentrations. Significant interference was detected at high concentrations of glucose (>267 mg/dL) and triglycerides (>580 mg/dL). Current laboratory reference intervals for TP were verified. Performance characteristics of this digital refractometer were validated in a clinical laboratory setting. Biuret method remains the preferred assay for TP measurement in routine clinical analyses.
Flint, Mark; Matthews, Beren J; Limpus, Colin J; Mills, Paul C
2015-01-01
Biochemical and haematological parameters are increasingly used to diagnose disease in green sea turtles. Specific clinical pathology tools, such as plasma protein electrophoresis analysis, are now being used more frequently to improve our ability to diagnose disease in the live animal. Plasma protein reference intervals were calculated from 55 clinically healthy green sea turtles using pulsed field electrophoresis to determine pre-albumin, albumin, α-, β- and γ-globulin concentrations. The estimated reference intervals were then compared with data profiles from clinically unhealthy turtles admitted to a local wildlife hospital to assess the validity of the derived intervals and identify the clinically useful plasma protein fractions. Eighty-six per cent {19 of 22 [95% confidence interval (CI) 65-97]} of clinically unhealthy turtles had values outside the derived reference intervals, including the following: total protein [six of 22 turtles or 27% (95% CI 11-50%)], pre-albumin [two of five, 40% (95% CI 5-85%)], albumin [13 of 22, 59% (95% CI 36-79%)], total albumin [13 of 22, 59% (95% CI 36-79%)], α- [10 of 22, 45% (95% CI 24-68%)], β- [two of 10, 20% (95% CI 3-56%)], γ- [one of 10, 10% (95% CI 0.3-45%)] and β-γ-globulin [one of 12, 8% (95% CI 0.2-38%)] and total globulin [five of 22, 23% (95% CI 8-45%)]. Plasma protein electrophoresis shows promise as an accurate adjunct tool to identify a disease state in marine turtles. This study presents the first reference interval for plasma protein electrophoresis in the Indo-Pacific green sea turtle.
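The confidence intervals attached to the small-sample proportions above (for example 19 of 22, or one of 10) can be approximated with an exact binomial interval. The Python sketch below implements the Clopper-Pearson interval by bisection using only the standard library. The abstract does not name the method the authors used, so the choice of Clopper-Pearson is an assumption, and the function names are ours.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""

    def solve(increasing_fn, target):
        # Bisection on [0, 1] for a function that increases with p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


for k, n in [(19, 22), (1, 10)]:
    low, high = clopper_pearson(k, n)
    print(f"{k} of {n}: {k / n:.0%} (95% CI {low:.1%} to {high:.1%})")
```

Under this assumption the intervals for 19 of 22 and one of 10 come out close to the ranges quoted in the abstract, which suggests (but does not prove) that an exact method was used.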
A tree of life based on ninety-eight expressed genes conserved across diverse eukaryotic species
Jayaswal, Pawan Kumar; Dogra, Vivek; Shanker, Asheesh; Sharma, Tilak Raj
2017-01-01
Rapid advances in DNA sequencing technologies have resulted in the accumulation of large data sets in the public domain, facilitating comparative studies to provide novel insights into the evolution of life. Phylogenetic studies across the eukaryotic taxa have been reported but on the basis of a limited number of genes. Here we present a genome-wide analysis across different plant, fungal, protist, and animal species, with reference to the 36,002 expressed genes of the rice genome. Our analysis revealed 9831 genes unique to rice and 98 genes conserved across all 49 eukaryotic species analysed. The 98 genes conserved across diverse eukaryotes mostly exhibited binding and catalytic activities and shared common sequence motifs, and hence appeared to have a common origin. The 98 conserved genes belonged to 22 functional gene families including 26S protease, actin, ADP–ribosylation factor, ATP synthase, casein kinase, DEAD-box protein, DnaK, elongation factor 2, glyceraldehyde 3-phosphate, phosphatase 2A, ras-related protein, Ser/Thr protein phosphatase family protein, tubulin, ubiquitin and others. The consensus Bayesian eukaryotic tree of life developed in this study demonstrated widely separated clades of plants, fungi, and animals. Musa acuminata provided an evolutionary link between monocotyledons and dicotyledons, and Salpingoeca rosetta provided an evolutionary link between fungi and animals, indicating that protozoan species are close relatives of fungi and animals. The divergence times for 1176 species pairs were estimated accurately by integrating fossil information with synonymous substitution rates in the comprehensive set of 98 genes. The present study provides valuable insight into the evolution of eukaryotes. PMID:28922368
Solernou, Albert
2018-01-01
Fluctuating Finite Element Analysis (FFEA) is a software package designed to perform continuum mechanics simulations of proteins and other globular macromolecules. It combines conventional finite element methods with stochastic thermal noise, and is appropriate for simulations of large proteins and protein complexes at the mesoscale (length-scales in the range of 5 nm to 1 μm), where there is currently a paucity of modelling tools. It requires 3D volumetric information as input, which can be low resolution structural information such as cryo-electron tomography (cryo-ET) maps or much higher resolution atomistic co-ordinates from which volumetric information can be extracted. In this article we introduce our open source software package for performing FFEA simulations which we have released under a GPLv3 license. The software package includes a C++ implementation of FFEA, together with tools to assist the user to set up the system from Electron Microscopy Data Bank (EMDB) or Protein Data Bank (PDB) data files. We also provide a PyMOL plugin to perform basic visualisation and additional Python tools for the analysis of FFEA simulation trajectories. This manuscript provides a basic background to the FFEA method, describing the implementation of the core mechanical model and how intermolecular interactions and the solvent environment are included within this framework. We provide prospective FFEA users with a practical overview of how to set up an FFEA simulation with reference to our publicly available online tutorials and manuals that accompany this first release of the package. PMID:29570700
Solving the Problem: Genome Annotation Standards before the Data Deluge.
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D; Tatusova, Tatiana
2011-10-15
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
Solving the Problem: Genome Annotation Standards before the Data Deluge
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J. Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D.; Tatusova, Tatiana
2011-01-01
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries. PMID:22180819
Cray, Carolyn; Dickey, Meranda; Brewer, Leah Brinson; Arheart, Kristopher L
2013-12-01
The acute phase protein serum amyloid A (SAA) has been previously shown to have value as a biomarker of inflammation and infection in many species, including manatees (Trichechus manatus latirostris). In the current study, results from an automated assay for SAA were used in a rehabilitation setting. Reference intervals were established from clinically normal manatees using the robust method: 0-46 mg/L. More than 30-fold higher mean SAA levels were observed in manatees suffering from cold stress and boat-related trauma. Poor correlations were observed between SAA and total white blood count, percentage of neutrophils, albumin, and albumin/globulin ratio. A moderate correlation was observed between SAA and the presence of nucleated red blood cells. The sensitivity of SAA testing was 93% and the specificity was 98%, representing the highest combined values of all the analytes. The results indicate that the automated method for SAA quantitation can provide important clinical data for manatees in a rehabilitation setting.
TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes
Leroy, Philippe; Guilhot, Nicolas; Sakai, Hiroaki; Bernard, Aurélien; Choulet, Frédéric; Theil, Sébastien; Reboux, Sébastien; Amano, Naoki; Flutre, Timothée; Pelegrin, Céline; Ohyanagi, Hajime; Seidel, Michael; Giacomoni, Franck; Reichstadt, Mathieu; Alaux, Michael; Gicquello, Emmanuelle; Legeai, Fabrice; Cerutti, Lorenzo; Numa, Hisataka; Tanaka, Tsuyoshi; Mayer, Klaus; Itoh, Takeshi; Quesneville, Hadi; Feuillet, Catherine
2012-01-01
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with an evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712 CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small scale analyses or through a server for large scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future. PMID:22645565
Barsalobres-Cavallari, Carla F; Severino, Fábio E; Maluf, Mirian P; Maia, Ivan G
2009-01-01
Background Quantitative data from gene expression experiments are often normalized by transcription levels of reference or housekeeping genes. An inherent assumption for their use is that the expression of these genes is highly uniform in living organisms during various phases of development, in different cell types and under diverse environmental conditions. To date, the validation of reference genes in plants has received very little attention and suitable reference genes have not been defined for a great number of crop species including Coffea arabica. The aim of the research reported herein was to compare the relative expression of a set of potential reference genes across different types of tissue/organ samples of coffee. We also validated the expression profiles of the selected reference genes at various stages of development and under a specific biotic stress. Results The expression levels of five frequently used housekeeping genes (reference genes), namely alcohol dehydrogenase (adh), 14-3-3, polyubiquitin (poly), β-actin (actin) and glyceraldehyde-3-phosphate dehydrogenase (gapdh), were assessed by quantitative real-time RT-PCR over a set of five tissue/organ samples (root, stem, leaf, flower, and fruits) of Coffea arabica plants. In addition to these commonly used internal controls, three other genes encoding a cysteine proteinase (cys), a caffeine synthase (ccs) and the 60S ribosomal protein L7 (rpl7) were also tested. Their stability and suitability as reference genes were validated by geNorm, NormFinder and BestKeeper programs. The obtained results revealed significantly variable expression levels of all reference genes analyzed, with the exception of gapdh, which showed no significant changes in expression among the investigated experimental conditions. Conclusion Our data suggest that the expression of housekeeping genes is not completely stable in coffee. 
Based on our results, gapdh, followed by 14-3-3 and rpl7 were found to be homogeneously expressed and are therefore adequate for normalization purposes, showing equivalent transcript levels in different tissue/organ samples. Gapdh is therefore the recommended reference gene for measuring gene expression in Coffea arabica. Its use will enable more accurate and reliable normalization of tissue/organ-specific gene expression studies in this important cherry crop plant. PMID:19126214
Lexical Entrainment and Lexical Differentiation in Reference Phrase Choice
ERIC Educational Resources Information Center
Van Der Wege, Mija M.
2009-01-01
Speakers reuse prior references to objects when choosing reference phrases, a phenomenon known as lexical entrainment. One explanation is that speakers want to maintain a set of previously established referential precedents. Speakers may also contrast any new referents against this previously established set, thereby avoiding applying the same…
A Markov Random Field Framework for Protein Side-Chain Resonance Assignment
NASA Astrophysics Data System (ADS)
Zeng, Jianyang; Zhou, Pei; Donald, Bruce Randall
Nuclear magnetic resonance (NMR) spectroscopy plays a critical role in structural genomics, and serves as a primary tool for determining protein structures, dynamics and interactions in physiologically-relevant solution conditions. The current speed of protein structure determination via NMR is limited by the lengthy time required in resonance assignment, which maps spectral peaks to specific atoms and residues in the primary sequence. Although numerous algorithms have been developed to address the backbone resonance assignment problem [68,2,10,37,14,64,1,31,60], little work has been done to automate side-chain resonance assignment [43, 48, 5]. Most previous attempts in assigning side-chain resonances depend on a set of NMR experiments that record through-bond interactions with side-chain protons for each residue. Unfortunately, these NMR experiments have low sensitivity and limited performance on large proteins, which makes it difficult to obtain enough side-chain resonance assignments. On the other hand, it is essential to obtain almost all of the side-chain resonance assignments as a prerequisite for high-resolution structure determination. To overcome this deficiency, we present a novel side-chain resonance assignment algorithm based on alternative NMR experiments measuring through-space interactions between protons in the protein, which also provide crucial distance restraints and are normally required in high-resolution structure determination. We cast the side-chain resonance assignment problem into a Markov Random Field (MRF) framework, and extend and apply combinatorial protein design algorithms to compute the optimal solution that best interprets the NMR data. Our MRF framework captures the contact map information of the protein derived from NMR spectra, and exploits the structural information available from the backbone conformations determined by orientational restraints and a set of discretized side-chain conformations (i.e., rotamers). 
A Hausdorff-based computation is employed in the scoring function to evaluate the probability of side-chain resonance assignments to generate the observed NMR spectra. The complexity of the assignment problem is first reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is used to find a set of optimal side-chain resonance assignments that best fit the NMR data. We have tested our algorithm on NMR data for five proteins, including the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). Our algorithm assigns resonances for more than 90% of the protons in the proteins, and achieves about 80% correct side-chain resonance assignments. The final structures computed using distance restraints resulting from the set of assigned side-chain resonances have backbone RMSD 0.5 - 1.4 Å and all-heavy-atom RMSD 1.0 - 2.2 Å from the reference structures that were determined by X-ray crystallography or traditional NMR approaches. These results demonstrate that our algorithm can be successfully applied to automate side-chain resonance assignment and high-quality protein structure determination. Since our algorithm does not require any specific NMR experiments for measuring the through-bond interactions with side-chain protons, it can save a significant amount of both experimental cost and spectrometer time, and hence accelerate the NMR structure determination process.
Pan, Huipeng; Ma, Yabin; Zhang, Deyong; Liu, Yong; Zhang, Zhanhong; Zheng, Changying; Chu, Dong
2015-01-01
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for measuring and evaluating gene expression during variable biological processes. To facilitate gene expression studies, normalization of genes of interest relative to stable reference genes is crucial. The western flower thrips Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), the main vector of tomato spotted wilt virus (TSWV), is a destructive invasive species. In this study, the expression profiles of 11 candidate reference genes from nonviruliferous and viruliferous F. occidentalis were investigated. Five distinct algorithms, geNorm, NormFinder, BestKeeper, the ΔCt method, and RefFinder, were used to determine the performance of these genes. geNorm, NormFinder, BestKeeper, and RefFinder identified heat shock protein 70 (HSP70), heat shock protein 60 (HSP60), elongation factor 1 α, and ribosomal protein L32 (RPL32) as the most stable reference genes, and the ΔCt method identified HSP60, HSP70, RPL32, and heat shock protein 90 as the most stable reference genes. Additionally, two reference genes were sufficient for reliable normalization in nonviruliferous and viruliferous F. occidentalis. This work provides a foundation for investigating the molecular mechanisms of TSWV and F. occidentalis interactions. PMID:26244556
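The comparative ΔCt approach named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly described form of the method: for every pair of candidate genes, the per-sample Ct difference is computed, its standard deviation across samples is taken, and each gene is scored by the mean of those standard deviations (lower meaning more stable). The function name, gene labels and Ct values are hypothetical and are not data from the study.

```python
from itertools import combinations
from statistics import stdev


def delta_ct_stability(ct_values):
    """Rank genes by the mean SD of their pairwise Ct differences.

    ct_values maps a gene name to a list of Ct values, one per sample,
    with samples in the same order for every gene. Needs at least two
    genes and two samples. Lower scores indicate more stable expression.
    """
    pair_sds = {gene: [] for gene in ct_values}
    for gene_a, gene_b in combinations(ct_values, 2):
        # Per-sample difference in Ct between the two genes.
        deltas = [a - b for a, b in zip(ct_values[gene_a], ct_values[gene_b])]
        sd = stdev(deltas)
        pair_sds[gene_a].append(sd)
        pair_sds[gene_b].append(sd)
    scores = {gene: sum(sds) / len(sds) for gene, sds in pair_sds.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical Ct values for three candidate genes over three samples.
example = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [25.0, 26.0, 27.0],
    "geneC": [18.0, 22.0, 19.0],
}
for gene, score in delta_ct_stability(example):
    print(gene, round(score, 3))
```

Because the score depends only on differences between genes, a gene whose Ct tracks the others across samples ranks as stable even when its absolute Ct level is very different.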
Ostermeir, Katja; Zacharias, Martin
2014-12-01
Coarse-grained elastic network models (ENM) of proteins offer a low-resolution representation of protein dynamics and directions of global mobility. A Hamiltonian-replica exchange molecular dynamics (H-REMD) approach has been developed that combines information extracted from an ENM analysis with atomistic explicit solvent MD simulations. Based on a set of centers representing rigid segments (centroids) of a protein, a distance-dependent biasing potential is constructed by means of an ENM analysis to promote and guide centroid/domain rearrangements. The biasing potentials are added with different magnitude to the force field description of the MD simulation along the replicas with one reference replica under the control of the original force field. The magnitude and the form of the biasing potentials are adapted during the simulation based on the average sampled conformation to reach a near constant biasing in each replica after equilibration. This allows for canonical sampling of conformational states in each replica. The application of the methodology to a two-domain segment of the glycoprotein 130 and to the protein cyanovirin-N indicates significantly enhanced global domain motions and improved conformational sampling compared with conventional MD simulations. © 2014 Wiley Periodicals, Inc.
Marani, Mariela M; Costa, Joana; Mafra, Isabel; Oliveira, Maria Beatriz P P; Camperi, Silvia A; Leite, José Roberto de Souza Almeida
2015-03-01
For the prospective immunorecognition of 5-enolpyruvylshikimate-3-phosphate synthase (CP4-EPSPS) as a biomarker protein expressed by transgenic soybean, an extensive in silico evaluation of the referred protein was performed. The main objective of this study was the selection of a set of peptides that could function as potential immunogens for the production of novel antibodies against CP4-EPSPS protein. For this purpose, the protein was in silico cleaved with trypsin/chymotrypsin and the resultant peptides were extensively analyzed for further selection of the best candidates for antibody production. The analysis enabled the successful proposal of four peptides with potential immunogenicity for their future use as screening biomarkers of genetically modified organisms. To our knowledge, this is the first attempt to select and define potential linear epitopes for the immunization of animals and, subsequently, to generate adequate antibodies for CP4-EPSPS recognition. The present work will be followed by the synthesis of the candidate peptides to be incubated in animals for antibody generation and potential applicability for the development of an immunosensor for CP4-EPSPS detection. © 2015 Wiley Periodicals, Inc.
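The in silico cleavage step described above can be sketched as follows. This is an illustration only, not the tool the authors used: it assumes the textbook rules that trypsin cuts on the C-terminal side of K or R and chymotrypsin on the C-terminal side of F, W or Y, with no cut when the next residue is proline. The function name and the short test sequence are hypothetical; the sequence is not CP4-EPSPS.

```python
def digest(sequence, cleave_after, blocked_by="P"):
    """Split a protein sequence after each residue in cleave_after.

    A site is skipped when the following residue is in blocked_by
    (proline by default) or when the residue is the last one.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cleave_after and not is_last and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # trailing peptide
    return peptides


TRYPSIN = "KR"          # assumed rule: cut after K or R
CHYMOTRYPSIN = "FWY"    # assumed high-specificity rule: cut after F, W or Y

toy_sequence = "MKTAYRPLLKFG"  # hypothetical sequence for illustration
print(digest(toy_sequence, TRYPSIN))
print(digest(toy_sequence, CHYMOTRYPSIN))
```

The resulting peptide lists would then be the input to whatever downstream scoring (length, hydrophilicity, predicted antigenicity) is used to choose candidates.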
Predicting the accuracy of ligand overlay methods with Random Forest models.
Nandigam, Ravi K; Evans, David A; Erickson, Jon A; Kim, Sangtae; Sutherland, Jeffrey J
2008-12-01
The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using rmsd ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
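The success criterion quoted above (an rmsd of at most 2 Å between the predicted overlay and the X-ray pose) can be written out as a short sketch. It assumes both poses are already in the same coordinate frame and that atoms are listed in the same order, so no alignment or symmetry correction is performed; the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def overlay_is_correct(predicted, reference, threshold=2.0):
    """Apply the rmsd <= 2 Angstrom criterion to a predicted overlay."""
    return rmsd(predicted, reference) <= threshold


# Hypothetical two-atom poses, coordinates in Angstrom.
xray_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(overlay_is_correct(predicted_pose, xray_pose))
```

A binary label of this kind is what a classifier such as a Random Forest would be trained to predict from the similarity descriptors.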
A Non-parametric Cutout Index for Robust Evaluation of Identified Proteins*
Serang, Oliver; Paulo, Joao; Steen, Hanno; Steen, Judith A.
2013-01-01
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems. PMID:23292186
Srihari, Sriganesh; Yong, Chern Han; Patil, Ashwini; Wong, Limsoon
2015-09-14
Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organisation of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight their limitations and challenges, in particular at detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Exploring the dark foldable proteome by considering hydrophobic amino acids topology
Bitard-Feildel, Tristan; Callebaut, Isabelle
2017-01-01
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe. PMID:28134276
Transcriptional landscapes of Axolotl (Ambystoma mexicanum).
Caballero-Pérez, Juan; Espinal-Centeno, Annie; Falcon, Francisco; García-Ortega, Luis F; Curiel-Quesada, Everardo; Cruz-Hernández, Andrés; Bako, Laszlo; Chen, Xuemei; Martínez, Octavio; Alberto Arteaga-Vázquez, Mario; Herrera-Estrella, Luis; Cruz-Ramírez, Alfredo
2018-01-15
The axolotl (Ambystoma mexicanum) is the vertebrate model system with the highest regeneration capacity. Experimental tools established over the past 100 years have been fundamental to start unraveling the cellular and molecular basis of tissue and limb regeneration. In the absence of a reference genome for the axolotl, transcriptomic analysis becomes fundamental to understand the genetic basis of regeneration. Here we present one of the most diverse transcriptomic data sets for the axolotl by profiling coding and non-coding RNAs from diverse tissues. We reconstructed a population of 115,906 putative protein coding mRNAs as full ORFs (including isoforms). We also identified 352 conserved miRNAs and 297 novel putative mature miRNAs. Systematic enrichment analysis of gene expression allowed us to identify tissue-specific protein-coding transcripts. We also found putative novel and conserved microRNAs which potentially target mRNAs which are reported as important disease candidates in heart and liver. Copyright © 2017 Elsevier Inc. All rights reserved.
[Exploring pharmacological principle of Artemisia carvifolia with textmining technology].
Zhao, Yu-Ping; Wang, Hui; Yang, Guang; Qiu, Zhi-Dong; Qu, Xiao-Bo; Zhang, Xiao-Bo
2016-08-01
To explore the pharmacological principle of Artemisia carvifolia, text mining was used. All references to A. carvifolia were collected from the PubMed database, and the rules relating its main ingredients to diseases, organs, tissues, proteins and metabolites were analyzed. Finally, a network was set up. The main ingredients included sesquiterpenoids, flavonoids, and volatile oils. Diseases such as malaria, cerebral malaria, falciparum malaria, visceral leishmaniasis and systemic lupus erythematosus were often treated with A. carvifolia. The associated organs were the liver, skin, trachea, lungs, and spleen. The associated tissues were mainly macrophages, T lymphocytes, blood vessels and epithelial cells. The associated proteins included CYP450, PI3K, TNF-α, AASDPPT, DNA polymerase and so on. A comprehensive and systematic treatment principle of A. carvifolia was obtained by text mining, which is helpful for clinical application. Copyright© by the Chinese Pharmaceutical Association.
Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D
2018-01-01
Abstract The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
nGASP--the nematode genome annotation assessment project.
Coghlan, Avril; Fiedler, Tristan J; McKay, Sheldon J; Flicek, Paul; Harris, Todd W; Blasiar, Darin; Stein, Lincoln D
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
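The gene-level sensitivity and specificity figures quoted above can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction assessments, where sensitivity is the fraction of reference genes predicted correctly and "specificity" is the fraction of predicted genes that are correct (what other fields call precision). It also simplifies by counting a prediction as correct only when its exon coordinates match a reference model exactly; the actual nGASP scoring rules may differ. All names and coordinates are hypothetical.

```python
def gene_level_accuracy(reference_genes, predicted_genes):
    """Return (sensitivity, specificity) at the gene level.

    Each gene model is a tuple of (start, end) exon coordinates.
    A predicted gene is correct only if it equals a reference model.
    Specificity here means correct predictions / all predictions.
    """
    reference = set(reference_genes)
    predicted = set(predicted_genes)
    correct = len(reference & predicted)
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical gene models on one strand of a toy contig.
reference_set = [
    ((100, 200), (300, 400)),
    ((1000, 1200),),
    ((2000, 2100), (2200, 2300)),
    ((5000, 5500),),
]
prediction_set = [
    ((100, 200), (300, 400)),    # exact match
    ((1000, 1200),),             # exact match
    ((2000, 2100), (2250, 2300)),  # wrong exon boundary, counted as incorrect
]
print(gene_level_accuracy(reference_set, prediction_set))
```

Under this convention a gene-finder that over-predicts can reach high sensitivity while its specificity stays low, which is the pattern the combiner figures above show.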
Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.
2005-01-01
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Geometric registration of images by similarity transformation using two reference points
NASA Technical Reports Server (NTRS)
Kang, Yong Q. (Inventor); Jo, Young-Heon (Inventor); Yan, Xiao-Hai (Inventor)
2011-01-01
A method for registering a first image to a second image using a similarity transformation. Each image includes a plurality of pixels. The first image pixels are mapped to a set of first image coordinates and the second image pixels are mapped to a set of second image coordinates. The first image coordinates of two reference points in the first image are determined. The second image coordinates of these reference points in the second image are determined. A Cartesian translation of the set of second image coordinates is performed such that the second image coordinates of the first reference point match its first image coordinates. A similarity transformation of the translated set of second image coordinates is performed. This transformation scales and rotates the second image coordinates about the first reference point such that the second image coordinates of the second reference point match its first image coordinates.
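The two steps described above (translate so the first reference point coincides, then scale and rotate about that point so the second reference point coincides) can be sketched in a few lines. This is a minimal illustration of the geometry under the assumption of 2D Cartesian pixel coordinates, not the patented implementation; the function name and the example points are hypothetical.

```python
import math


def register_two_points(p1_first, p2_first, p1_second, p2_second):
    """Build a mapping from second-image to first-image coordinates.

    p1_*/p2_* are the (x, y) coordinates of reference points 1 and 2
    in the first and second image respectively.
    """
    # Step 1: translation that moves reference point 1 onto its
    # first-image coordinates.
    tx = p1_first[0] - p1_second[0]
    ty = p1_first[1] - p1_second[1]

    # Vector from reference point 1 to reference point 2 in each image.
    vfx, vfy = p2_first[0] - p1_first[0], p2_first[1] - p1_first[1]
    vsx, vsy = p2_second[0] - p1_second[0], p2_second[1] - p1_second[1]

    # Step 2: scale and rotation that carry the second-image vector
    # onto the first-image vector.
    scale = math.hypot(vfx, vfy) / math.hypot(vsx, vsy)
    angle = math.atan2(vfy, vfx) - math.atan2(vsy, vsx)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform(point):
        # Translate, then express relative to reference point 1.
        x = point[0] + tx - p1_first[0]
        y = point[1] + ty - p1_first[1]
        # Rotate and scale about reference point 1.
        xr = scale * (cos_a * x - sin_a * y)
        yr = scale * (sin_a * x + cos_a * y)
        return (xr + p1_first[0], yr + p1_first[1])

    return transform


# Hypothetical reference points: the second image is rotated by 90
# degrees and is half the scale of the first.
to_first = register_two_points((10, 10), (10, 30), (0, 0), (10, 0))
print(to_first((0, 10)))
```

Two point pairs fix exactly the four parameters of a similarity transformation (two translations, one rotation angle, one uniform scale), which is why no further reference points are required.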
Hrubec, Terry C.; Smith, Stephen A.; Robertson, John L.
2001-01-01
Hybrid striped bass (Morone chrysops × Morone saxatilis) are an important aquaculture species, yet there are few diagnostic tools available to assess their health. Hematology and clinical chemistry analyses are not used extensively in fish medicine due to the lack of reference intervals for various fish species, and because factors such as age can affect blood values. There is little published information regarding age-related changes in blood values of juvenile fish. It is important to evaluate juvenile fish, as this is the time they are raised in aquaculture settings. Determining age-related changes in the blood values of fishes would further develop clinical pathology as a diagnostic tool, enhancing both fish medicine and the aquaculture industry. The results of standard hematology and clinical chemistry analysis were evaluated in juvenile hybrid striped bass at 4, 6, 9, 15, and 19 months of age. Values for PCV and RBC indices were significantly lower, and plasma protein concentration was significantly higher in younger fish. Total WBC and lymphocyte counts were significantly higher in fish at 6 and 9 months of age, while neutrophil and monocyte counts were higher at 6, 9, and 15 months. Eosinophil counts were significantly higher in 9-month-old fish. The majority of hematologic values fell within previously established reference intervals, indicating that only slight modification to the intervals is necessary for evaluating hematologic results of hybrid striped bass at different ages. The following analytes deviated sufficiently from adult reference intervals to warrant separate reference values: plasma protein concentration at 4 months, WBC and lymphocyte counts at 15 and 19 months, and thrombocyte-like-cells at 9 months of age. Values for most biochemical analytes were significantly different among age groups except for creatinine and potassium concentrations. 
Comparisons with reference intervals were not made for biochemical analytes, because established reference intervals were not available. Age-related changes in hematologic and biochemical values of striped bass were similar to those reported for rainbow trout and mammals.
Köke, A J A; Smeets, R J E M; Schreurs, K M; van Baalen, B; de Haan, P; Remerie, S C; Schiphorst Preuper, H R; Reneman, M F
2017-03-01
No core set of measurement tools exists to collect data within clinical practice. Such data could be useful as reference data to guide treatment decisions and to compare patient characteristics or treatment results within specific treatment settings. The Dutch Dataset Pain Rehabilitation was developed, which included the six domains of the IMMPACT core set and three new domains relevant in the field of rehabilitation (medical consumption, patient-specific goals and activities/participation). Between 2010 and 2013 the core set was implemented in 32 rehabilitation facilities throughout the Netherlands. A total of 8200 adult patients with chronic pain completed the core set at first consultation with the rehabilitation physician. Adult patients (18-90 years) suffering from a long history of pain (38% >5 years) were referred. Patients had high medical consumption and less than half were working. Although patients were referred with a diagnosis of low back pain or neck or shoulder pain, a large group (85%) had multisite pain (39% 2-5 painful body regions; 46% >5 painful body regions). Scores on psychosocial questionnaires were high, indicating high case complexity of referred patients. Reference data for subgroups based on gender, pain severity, pain locations and pain duration are presented. The data from this clinical core set can be used to compare patient characteristics with those of patients in other treatment settings and/or scientific publications. As treatment success might depend on case complexity, which is high in the referred patients, the advantages of earlier referral to comprehensive multidisciplinary treatment were discussed. A detailed description of case complexity of patients with chronic pain referred for pain rehabilitation. Insight into the case complexity of patients within subgroups on the basis of gender, pain duration, pain severity and pain location. 
These descriptions can be used as reference data for daily practice in the field of pain rehabilitation and can be used to evaluate, monitor and improve rehabilitation care in care settings nationwide as well as internationally. © 2016 European Pain Federation - EFIC®.
Heidler, Juliana; Hardt, Stefanie; Wittig, Ilka; Tegeder, Irmgard
2016-12-01
Progranulin deficiency is associated with neurodegeneration in humans and in mice. The mechanisms likely involve progranulin-promoted removal of protein waste via autophagy. We performed a deep proteomic screen of the pre-frontal cortex in aged (13-15 months) female progranulin-deficient mice (GRN -/-) and mice with inducible neuron-specific overexpression of progranulin (SLICK-GRN-OE) versus the respective control mice. Proteins were extracted and analyzed by liquid chromatography/mass spectrometry (LC/MS) on a Thermo Scientific™ Q Exactive Plus equipped with an ultra-high performance liquid chromatography unit and a Nanospray Flex Ion-Source. Full Scan MS-data were acquired using Xcalibur and raw files were analyzed using the proteomics software MaxQuant. The mouse reference proteome set from UniProt (June 2015) was used to identify peptides and proteins. The DiB data file is a reduced MaxQuant output and includes peptide and protein identification, accession numbers, protein and gene names, sequence coverage and label free quantification (LFQ) values of each sample. Differences in protein expression in genotypes are presented in "Progranulin overexpression in sensory neurons attenuates neuropathic pain in mice: Role of autophagy" (C. Altmann, S. Hardt, C. Fischer, J. Heidler, H.Y. Lim, A. Haussler, B. Albuquerque, B. Zimmer, C. Moser, C. Behrends, F. Koentgen, I. Wittig, M.H. Schmidt, A.M. Clement, T. Deller, I. Tegeder, 2016) [1].
Yamagata, Tetsuo; Zanelli, Ugo; Gallemann, Dieter; Perrin, Dominique; Dolgos, Hugues; Petersson, Carl
2017-09-01
1. We compared direct scaling, regression model equation and the so-called "Poulin et al." methods to scale clearance (CL) from in vitro intrinsic clearance (CLint) measured in human hepatocytes using two sets of compounds: one reference set comprising 20 compounds with known elimination pathways and one external evaluation set based on 17 compounds developed at Merck (MS). 2. A 90% prospective confidence interval was calculated using the reference set. This interval was found relevant for the regression equation method. The three outliers identified were justified on the basis of their elimination mechanism. 3. The direct scaling method showed a systematic underestimation of clearance in both the reference and evaluation sets. The "Poulin et al." and the regression equation methods showed no obvious bias in either the reference or evaluation sets. 4. The regression model equation was slightly superior to the "Poulin et al." method in the reference set, with a better absolute average fold error (AAFE) of 1.3 compared to 1.6. A larger difference was observed in the evaluation set, where the regression method and "Poulin et al." resulted in an AAFE of 1.7 and 2.6, respectively (removing the three compounds with known issues mentioned above). A similar pattern was observed for the correlation coefficient. Based on these data we suggest the regression equation method combined with a prospective confidence interval as the first choice for the extrapolation of human in vivo hepatic metabolic clearance from in vitro systems.
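The abstract reports AAFE values without defining the metric. A minimal sketch follows, assuming the commonly used definition AAFE = 10^(mean |log10(predicted/observed)|); the function name and the example clearance values are hypothetical and are not taken from the study.

```python
import math

def aafe(predicted, observed):
    """Absolute average fold error under the commonly used definition (assumed here)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(math.log10(p / o)) for p, o in zip(predicted, observed))
    return 10 ** (total / len(predicted))


# Hypothetical clearances: one 2-fold over-prediction, one 2-fold under-prediction.
print(aafe([20.0, 5.0], [10.0, 10.0]))
```

Because the absolute value is taken on the log scale, over- and under-predictions of the same fold size contribute equally, and a perfect prediction gives an AAFE of 1.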
Two-dimensional proteome reference maps for the soybean cyst nematode Heterodera glycines
USDA-ARS's Scientific Manuscript database
Two-dimensional electrophoresis (2-DE) reference maps of Heterodera glycines were constructed. After in-gel digestion with trypsin, 803 spots representing 426 proteins were subsequently identified by LC-MS/MS. Proteins with annotated function were further categorized by Gene Ontology. Results showed...
Flint, Mark; Matthews, Beren J.; Limpus, Colin J.; Mills, Paul C.
2015-01-01
Biochemical and haematological parameters are increasingly used to diagnose disease in green sea turtles. Specific clinical pathology tools, such as plasma protein electrophoresis analysis, are now being used more frequently to improve our ability to diagnose disease in the live animal. Plasma protein reference intervals were calculated from 55 clinically healthy green sea turtles using pulsed field electrophoresis to determine pre-albumin, albumin, α-, β- and γ-globulin concentrations. The estimated reference intervals were then compared with data profiles from clinically unhealthy turtles admitted to a local wildlife hospital to assess the validity of the derived intervals and identify the clinically useful plasma protein fractions. Eighty-six per cent {19 of 22 [95% confidence interval (CI) 65–97]} of clinically unhealthy turtles had values outside the derived reference intervals, including the following: total protein [six of 22 turtles or 27% (95% CI 11–50%)], pre-albumin [two of five, 40% (95% CI 5–85%)], albumin [13 of 22, 59% (95% CI 36–79%)], total albumin [13 of 22, 59% (95% CI 36–79%)], α- [10 of 22, 45% (95% CI 24–68%)], β- [two of 10, 20% (95% CI 3–56%)], γ- [one of 10, 10% (95% CI 0.3–45%)] and β–γ-globulin [one of 12, 8% (95% CI 0.2–38%)] and total globulin [five of 22, 23% (8–45%)]. Plasma protein electrophoresis shows promise as an accurate adjunct tool to identify a disease state in marine turtles. This study presents the first reference interval for plasma protein electrophoresis in the Indo-Pacific green sea turtle. PMID:27293722
Dölz, R; Mossé, M O; Slonimski, P P; Bairoch, A; Linder, P
1994-01-01
We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence has been attributed a single genetic name. In the case of duplicated sequences a simple method has been applied to distinguish between sequences of one and the same gene from non-allelic sequences of duplicated genes. If necessary, synonyms are given in the case of allelic duplicated sequences. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, reference of the publication of the sequence, Chromosomal location as far as known, Swissprot and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections resulting in LISTA-HON and LISTA-HOP. The LISTA data base can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). PMID:7937046
Protein and oil composition predictions of single soybeans by transmission Raman spectroscopy.
Schulmerich, Matthew V; Walsh, Michael J; Gelber, Matthew K; Kong, Rong; Kole, Matthew R; Harrison, Sandra K; McKinney, John; Thompson, Dennis; Kull, Linda S; Bhargava, Rohit
2012-08-22
The soybean industry requires rapid, accurate, and precise technologies for the analyses of seed/grain constituents. While the current gold standard for nondestructive quantification of economically and nutritionally important soybean components is near-infrared spectroscopy (NIRS), emerging technology may provide viable alternatives and lead to next generation instrumentation for grain compositional analysis. In principle, Raman spectroscopy provides the necessary chemical information to generate models for predicting the concentration of soybean constituents. In this communication, we explore the use of transmission Raman spectroscopy (TRS) for nondestructive soybean measurements. We show that TRS uses the light scattering properties of soybeans to effectively homogenize the heterogeneous bulk of a soybean for representative sampling. Working with over 1000 individual intact soybean seeds, we developed a simple partial least-squares model for predicting oil and protein content nondestructively. We find TRS to have a root-mean-square error of prediction (RMSEP) of 0.89% for oil measurements and 0.92% for protein measurements. In both calibration and validation sets, the predictive capabilities of the model were similar to the error in the reference methods.
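RMSEP is conventionally the square root of the mean squared difference between model predictions and reference values on a validation set. The sketch below illustrates that calculation only; the function name and the sample numbers are hypothetical, and the partial least-squares model itself is not reproduced.

```python
import math

def rmsep(predicted, reference):
    """Root-mean-square error of prediction over paired validation samples."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical protein contents (%): model predictions vs. reference method.
print(rmsep([36.1, 34.8, 37.5], [35.4, 35.6, 36.9]))
```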
Kawabata, Takeshi; Nakamura, Haruki
2014-07-28
A protein-bound conformation of a target molecule can be predicted by aligning the target molecule on the reference molecule obtained from the 3D structure of the compound-protein complex. This strategy is called "similarity-based docking". For this purpose, we develop the flexible alignment program fkcombu, which aligns the target molecule based on atomic correspondences with the reference molecule. The correspondences are obtained by the maximum common substructure (MCS) of 2D chemical structures, using our program kcombu. The prediction performance was evaluated using many target-reference pairs of superimposed ligand 3D structures on the same protein in the PDB, with different ranges of chemical similarity. The details of atomic correspondence largely affected the prediction success. We found that topologically constrained disconnected MCS (TD-MCS) with the simple element-based atomic classification provides the best prediction. The crashing potential energy with the receptor protein improved the performance. We also found that the RMSD between the predicted and correct target conformations significantly correlates with the chemical similarities between target-reference molecules. Generally speaking, if the reference and target compounds have more than 70% chemical similarity, then the average RMSD of 3D conformations is <2.0 Å. We compared the performance with a rigid-body molecular alignment program based on volume-overlap scores (ShaEP). Our MCS-based flexible alignment program performed better than the rigid-body alignment program, especially when the target and reference molecules were sufficiently similar.
Perez, Romel B.; Tischer, Alexander; Auton, Matthew; Whitten, Steven T.
2014-01-01
Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins, mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline and alanine to glycine substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (Rh) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the glycine substitutions decreased polyproline II (PPII) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in Rh were not associated with folding. The experiments showed that changes in local PPII structure cause changes in Rh that are variable and that depend on the intrinsic chain propensities of proline and alanine residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed proline and alanine effects on the structures of intrinsically disordered proteins. PMID:25244701
Genetics Home Reference: leukoencephalopathy with vanishing white matter
... The eIF2B protein helps regulate overall protein production (synthesis) in the cell by interacting with another protein, ... because it is involved in starting (initiating) protein synthesis. Proper regulation of protein synthesis is vital for ...
7 CFR 801.7 - Reference methods and tolerances for near-infrared spectroscopy (NIRS) analyzers.
Code of Federal Regulations, 2013 CFR
2013-01-01
..._of_federal_regulations/ibr_locations.html. (b) Tolerances—(1) NIRS wheat protein analyzers. The... Method 992.23. (3) NIRS corn oil, protein, and starch analyzers. The maintenance tolerances for the NIRS... methods and tolerances for near-infrared spectroscopy (NIRS) analyzers. (a) Reference methods. (1) The...
7 CFR 801.7 - Reference methods and tolerances for near-infrared spectroscopy (NIRS) analyzers.
Code of Federal Regulations, 2011 CFR
2011-01-01
..._of_federal_regulations/ibr_locations.html. (b) Tolerances—(1) NIRS wheat protein analyzers. The... Method 992.23. (3) NIRS corn oil, protein, and starch analyzers. The maintenance tolerances for the NIRS... methods and tolerances for near-infrared spectroscopy (NIRS) analyzers. (a) Reference methods. (1) The...
7 CFR 801.7 - Reference methods and tolerances for near-infrared spectroscopy (NIRS) analyzers.
Code of Federal Regulations, 2014 CFR
2014-01-01
..._of_federal_regulations/ibr_locations.html. (b) Tolerances—(1) NIRS wheat protein analyzers. The... Method 992.23. (3) NIRS corn oil, protein, and starch analyzers. The maintenance tolerances for the NIRS... methods and tolerances for near-infrared spectroscopy (NIRS) analyzers. (a) Reference methods. (1) The...
7 CFR 801.7 - Reference methods and tolerances for near-infrared spectroscopy (NIRS) analyzers.
Code of Federal Regulations, 2012 CFR
2012-01-01
..._of_federal_regulations/ibr_locations.html. (b) Tolerances—(1) NIRS wheat protein analyzers. The... Method 992.23. (3) NIRS corn oil, protein, and starch analyzers. The maintenance tolerances for the NIRS... methods and tolerances for near-infrared spectroscopy (NIRS) analyzers. (a) Reference methods. (1) The...
Expression of SET Protein in the Ovaries of Patients with Polycystic Ovary Syndrome
Boqun, Xu; Xiaonan, Dai; YuGui, Cui; Lingling, Gao; Xue, Dai; Gao, Chao; Feiyang, Diao; Jiayin, Liu; Gao, Li; Li, Mei; Zhang, Yuan; Ma, Xiang
2013-01-01
Background. We previously found, using microarray analysis, that expression of the SET gene was up-regulated in polycystic ovaries. This suggested that SET may be an attractive candidate regulator involved in the pathophysiology of polycystic ovary syndrome (PCOS). In this study, expression and cellular localization of SET protein were investigated in human polycystic and normal ovaries. Method. Ovarian tissues, six normal ovaries and six polycystic ovaries, were collected during transsexual operation and surgical treatment with signed consent. The cellular localization of SET protein was observed by immunohistochemistry. The expression levels of SET protein were analyzed by Western Blot. Result. SET protein was expressed predominantly in the theca cells and oocytes of human ovarian follicles in both PCOS ovarian tissues and normal ovarian tissues. The level of SET protein expression in polycystic ovaries was three times higher than that in normal ovaries (P < 0.05). Conclusion. SET was overexpressed in polycystic ovaries relative to normal ovaries. Combined with its localization in theca cells, SET may participate in regulating ovarian androgen biosynthesis and the pathophysiology of hyperandrogenism in PCOS. PMID:23861679
Reference intervals for 24 laboratory parameters determined in 24-hour urine collections.
Curcio, Raffaele; Stettler, Helen; Suter, Paolo M; Aksözen, Jasmin Barman; Saleh, Lanja; Spanaus, Katharina; Bochud, Murielle; Minder, Elisabeth; von Eckardstein, Arnold
2016-01-01
Reference intervals for many laboratory parameters determined in 24-h urine collections are either not publicly available or based on small numbers, not sex specific or not from a representative sample. Osmolality and concentrations or enzymatic activities of sodium, potassium, chloride, glucose, creatinine, citrate, cortisol, pancreatic α-amylase, total protein, albumin, transferrin, immunoglobulin G, α1-microglobulin, α2-macroglobulin, as well as porphyrins and their precursors (δ-aminolevulinic acid and porphobilinogen) were determined in 241 24-h urine samples of a population-based cohort of asymptomatic adults (121 men and 120 women). For 16 of these 24 parameters creatinine-normalized ratios were calculated based on 24-h urine creatinine. The reference intervals for these parameters were calculated according to the CLSI C28-A3 statistical guidelines. By contrast to most published reference intervals, which do not stratify for sex, reference intervals of 12 of 24 laboratory parameters in 24-h urine collections and of eight of 16 parameters as creatinine-normalized ratios differed significantly between men and women. For six parameters calculated as 24-h urine excretion and four parameters calculated as creatinine-normalized ratios no reference intervals had been published before. For some parameters we found significant and relevant deviations from previously reported reference intervals, most notably for 24-h urine cortisol in women. Ten 24-h urine parameters showed weak or moderate sex-specific correlations with age. By applying up-to-date analytical methods and clinical chemistry analyzers to 24-h urine collections from a large population-based cohort we provide as yet the most comprehensive set of sex-specific reference intervals calculated according to CLSI guidelines for parameters determined in 24-h urine collections.
Extracting sets of chemical substructures and protein domains governing drug-target interactions.
Yamanishi, Yoshihiro; Pauwels, Edouard; Saigo, Hiroto; Stoven, Véronique
2011-05-23
The identification of rules governing molecular recognition between drug chemical substructures and protein functional sites is a challenging issue at many stages of the drug development process. In this paper we develop a novel method to extract sets of drug chemical substructures and protein domains that govern drug-target interactions on a genome-wide scale. This is made possible using sparse canonical correspondence analysis (SCCA) for analyzing drug substructure profiles and protein domain profiles simultaneously. The method does not depend on the availability of protein 3D structures. From a data set of known drug-target interactions including enzymes, ion channels, G protein-coupled receptors, and nuclear receptors, we extract a set of chemical substructures shared by drugs able to bind to a set of protein domains. These two sets of extracted chemical substructures and protein domains form components that can be further exploited in a drug discovery process. This approach successfully clusters protein domains that may be evolutionary unrelated but that bind a common set of chemical substructures. As shown in several examples, it can also be very helpful for predicting new protein-ligand interactions and addressing the problem of ligand specificity. The proposed method constitutes a contribution to the recent field of chemogenomics that aims to connect the chemical space with the biological space.
Li, Ling-Wei; Fan, Li-Qing; Zhu, Wen-Bing; Nien, Hong-Chuan; Sun, Bo-Lan; Luo, Ke-Li; Liao, Ting-Ting; Tang, Le; Lu, Guang-Xiu
2007-05-01
To extend the analysis of the proteome of human spermatozoa and establish a 2-D gel electrophoresis (2-DE) reference map of human spermatozoal proteins in a pH range of 3.5-9.0. In order to reveal more protein spots, immobilized pH gradient strips (24 cm) of broad range of pH 3-10 and the narrower range of pH 6-9, as well as different overlapping narrow range pH immobilized pH gradient (IPG) strips, including 3.5-4.5, 4.0-5.0, 4.5-5.5, 5.0-6.0 and 5.5-6.7, were used. After 2-DE, several visually identical spots between the different pH range 2-D gel pairs were cut from the gels and confirmed by mass spectrometry and used as landmarks for computer analysis. The 2-D reference map with pH value from 3.5 to 9.0 was synthesized by using the ImageMaster analysis software. The overlapping spots were excluded, so that every spot was counted only once. A total of 3872 different protein spots were identified from the reference map, an approximately 3-fold increase compared to the broad range pH 3-10 IPG strip (1306 spots). The present 2-D pattern is a high resolution 2-D reference map for human fertile spermatozoal protein spots. A comprehensive knowledge of the protein composition of human spermatozoa is very meaningful in studying dysregulation of male fertility.
Severi, Leda; Losi, Lorena; Fonda, Sergio; Taddia, Laura; Gozzi, Gaia; Marverti, Gaetano; Magni, Fulvio; Chinello, Clizia; Stella, Martina; Sheouli, Jalid; Braicu, Elena I; Genovese, Filippo; Lauriola, Angela; Marraccini, Chiara; Gualandi, Alessandra; D'Arca, Domenico; Ferrari, Stefania; Costi, Maria P
2018-01-01
Proteomics and bioinformatics are a useful combined technology for the characterization of protein expression level and modulation associated with the response to a drug and with its mechanism of action. The folate pathway represents an important target in anticancer drug therapy. In the present study, a discovery proteomics approach was applied to tissue samples collected from ovarian cancer patients who relapsed after the first-line carboplatin-based chemotherapy and were treated with pemetrexed (PMX), a known folate pathway targeting drug. The aim of the work is to identify the proteomic profile in pre-treatment tissue that can be associated with the response to PMX treatment. Statistical metrics of the experimental Mass Spectrometry (MS) data were combined with a knowledge-based approach that included bioinformatics and a literature review through the ProteinQuest™ tool, to design a protein set of reference (PSR). The PSR provides feedback for the consistency of MS proteomic data because it includes known validated proteins. A panel of 24 proteins with levels that were significantly different in pre-treatment samples of patients who responded to the therapy vs. the non-responders was identified. The differences of the identified proteins were explained for the patients with different outcomes and the known PMX targets were further validated. The protein panel herein identified is ready for further validation in retrospective clinical trials using a targeted proteomic approach. This study may have a general relevant impact on biomarker application for therapy selection in cancer patients.
Predicting protein-protein interactions from protein domains using a set cover approach.
Huang, Chengbang; Morcos, Faruck; Kanaan, Simon P; Wuchty, Stefan; Chen, Danny Z; Izaguirre, Jesús A
2007-01-01
One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the Maximum Likelihood Estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi.cse.nd.edu.
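The abstract maps protein interactions and domain architectures to a weighted set cover problem solved greedily. The sketch below shows the textbook greedy heuristic for weighted set cover, not the MSSC algorithm itself: the weighting scheme, the function name, and the toy domain-pair data are all assumptions made for illustration.

```python
def greedy_weighted_set_cover(universe, candidates):
    """Textbook greedy heuristic for weighted set cover.

    universe:   iterable of elements to cover (here, protein interactions).
    candidates: dict mapping a name (here, a domain pair) to a tuple
                (weight, set of elements that candidate explains).
    Returns the chosen candidate names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, None
        for name, (weight, elements) in candidates.items():
            gain = len(elements & uncovered)
            if gain == 0:
                continue
            ratio = weight / gain          # cost per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best_name)
        uncovered -= candidates[best_name][1]
    return chosen


# Hypothetical toy data: three interactions, three candidate domain pairs.
interactions = {"A-B", "A-C", "B-D"}
domain_pairs = {
    "d1-d2": (1.0, {"A-B", "A-C"}),
    "d3-d4": (1.0, {"B-D"}),
    "d1-d5": (3.0, {"A-B", "A-C", "B-D"}),
}
print(greedy_weighted_set_cover(interactions, domain_pairs))
```

In this toy case the heuristic prefers two cheap, specific domain pairs over one expensive pair that covers everything, which loosely mirrors the idea of favoring specific explanations.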
2012-01-01
Background Haemophilus parasuis is the causative agent of Glässer’s disease and is a pathogen of swine in high-health status herds. Reports on serotyping of field strains from outbreaks describe that approximately 30% of them are nontypeable and therefore cannot be traced. Molecular typing methods have been used as alternatives to serotyping. This study was done to compare random amplified polymorphic DNA (RAPD) profiles and whole cell protein (WCP) lysate profiles as methods for distinguishing H. parasuis reference strains and field isolates. Results The DNA and WCP lysate profiles of 15 reference strains and 31 field isolates of H. parasuis were analyzed using the Dice and neighbor joining algorithms. The results revealed unique and reproducible DNA and protein profiles among the reference strains and field isolates studied. Simpson’s index of diversity showed significant discrimination between isolates when three 10mer primers were combined for the RAPD method and also when both the RAPD and WCP lysate typing methods were combined. Conclusions The RAPD profiles seen among the reference strains and field isolates did not appear to change over time which may reflect a lack of DNA mutations in the genes of the samples. The recent field isolates had different WCP lysate profiles than the reference strains, possibly because the number of passages of the type strains may affect their protein expression. PMID:22703293
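Simpson's index of diversity, as typically applied to typing methods, estimates the probability that two isolates drawn at random without replacement fall into different types. The sketch below assumes that form, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)); whether the study used exactly this variant is not stated in the abstract, and the profile labels are hypothetical.

```python
from collections import Counter

def simpsons_diversity(type_labels):
    """Simpson's index of diversity for a list of per-isolate type labels."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical profile assignments for four isolates.
print(simpsons_diversity(["profile1", "profile1", "profile2", "profile3"]))
```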
Cysteine-rich domains related to Frizzled receptors and Hedgehog-interacting proteins
Pei, Jimin; Grishin, Nick V
2012-01-01
Frizzled and Smoothened are homologous seven-transmembrane proteins functioning in the Wnt and Hedgehog signaling pathways, respectively. They harbor an extracellular cysteine-rich domain (FZ-CRD), a mobile evolutionary unit that has been found in a number of other metazoan proteins and Frizzled-like proteins in Dictyostelium. Domains distantly related to FZ-CRDs, in Hedgehog-interacting proteins (HHIPs), folate receptors and riboflavin-binding proteins (FRBPs), and Niemann-Pick Type C1 proteins (NPC1s), referred to as HFN-CRDs, exhibit similar structures and disulfide connectivity patterns compared with FZ-CRDs. We used computational analyses to expand the homologous set of FZ-CRDs and HFN-CRDs, providing a better understanding of their evolution and classification. First, FZ-CRD-containing proteins with various domain compositions were identified in several major eukaryotic lineages including plants and Chromalveolata, revealing a wider phylogenetic distribution of FZ-CRDs than previously recognized. Second, two new and distinct groups of highly divergent FZ-CRDs were found by sensitive similarity searches. One of them is present in the calcium channel component Mid1 in fungi and the uncharacterized FAM155 proteins in metazoans. Members of the other new FZ-CRD group occur in the metazoan-specific RECK (reversion-inducing-cysteine-rich protein with Kazal motifs) proteins that are putative tumor suppressors acting as inhibitors of matrix metalloproteases. Finally, sequence and three-dimensional structural comparisons helped us uncover a divergent HFN-CRD in glypicans, which are important morphogen-binding heparan sulfate proteoglycans. Such a finding reinforces the evolutionary ties between the Wnt and Hedgehog signaling pathways and underscores the importance of gene duplications in creating essential signaling components in metazoan evolution. PMID:22693159
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Edgar, Robert C
2004-01-01
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using k-mer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
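Distance estimation by k-mer counting avoids aligning sequences: two sequences are compared by the k-mers they share. The sketch below uses one simple formulation, the fraction of shared k-mers normalized by the shorter sequence; it is an assumption for illustration and may differ from the exact measure used in MUSCLE (for example, no compressed amino acid alphabet is applied here).

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(a, b, k=3):
    """1 minus the fraction of k-mers shared by two sequences (illustrative)."""
    shortest = min(len(a), len(b))
    if shortest < k:
        raise ValueError("both sequences must be at least k residues long")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / (shortest - k + 1)


# Hypothetical short peptide fragments.
print(kmer_distance("ACDEF", "ACDXX"))
```

The `&` operator on `Counter` objects keeps the minimum count of each k-mer, so repeated k-mers are only credited as many times as they occur in both sequences.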
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Ovesná, Jaroslava; Kučera, Ladislav; Vaculová, Kateřina; Štrymplová, Kamila; Svobodová, Ilona; Milella, Luigi
2012-01-01
Reverse transcription coupled with real-time quantitative PCR (RT-qPCR) is a frequently used method for gene expression profiling. Reference genes (RGs) are commonly employed to normalize gene expression data. Limited information exists on gene expression and profiling in the developing barley caryopsis. Expression stability was assessed by measuring the cycle threshold (Ct) range and applying both the GeNorm (pair-wise comparison of geometric means) and Normfinder (model-based approach) principles for the calculation. Here, we have identified a set of four RGs suitable for studying gene expression in the developing barley caryopsis. These encode the proteins GAPDH, HSP90, HSP70 and ubiquitin. We found a correlation between the frequency of occurrence of a transcript in silico and its suitability as an RG. This set of RGs was tested by comparing the normalized level of β-amylase (β-amy1) transcript with directly measured quantities of the BMY1 gene product in the developing barley caryopsis. This panel of genes could be used for other gene expression studies, as well as to optimize β-amy1 analysis for study of the impact of β-amy1 expression upon barley end-use quality.
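The pair-wise GeNorm principle mentioned in this abstract can be sketched in a few lines. This is a minimal illustration only, assuming the commonly described stability measure M (for each candidate gene, the mean over all other candidates of the standard deviation of the log2 expression ratios across samples). The gene labels and expression values are hypothetical and are not taken from the study.

```python
import math
import statistics


def genorm_m(expression):
    """Return a stability value M per gene (lower M = more stable).

    expression: dict mapping gene name -> list of relative expression
    quantities, one per sample, in the same sample order for every gene.
    """
    m_values = {}
    for gene_j, values_j in expression.items():
        pairwise_variation = []
        for gene_k, values_k in expression.items():
            if gene_k == gene_j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            # Sample standard deviation of those ratios across samples.
            pairwise_variation.append(statistics.stdev(log_ratios))
        m_values[gene_j] = statistics.mean(pairwise_variation)
    return m_values


# Hypothetical relative quantities for three candidate genes in three samples.
toy_expression = {
    "ref1": [1.0, 2.0, 4.0],
    "ref2": [2.0, 4.0, 8.0],  # always twice ref1, so their ratio is constant
    "ref3": [1.0, 4.0, 1.0],  # varies independently of the other two
}
m = genorm_m(toy_expression)
least_stable = max(m, key=m.get)
print(m, least_stable)
```

In this toy input ref1 and ref2 are proportional in every sample, so they share the lowest M, while ref3 receives the highest M and would be the first candidate to discard.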
Evaluation of the Immunoquick+4 malaria rapid diagnostic test in a non-endemic setting.
van Dijk, D P J; Gillet, P; Vlieghe, E; Cnops, L; Van Esbroeck, M; Jacobs, J
2010-05-01
The aim of this retrospective study was to evaluate the Immunoquick+4 (BioSynex, Strasbourg, France), a three-band malaria rapid diagnostic test (MRDT) targeting histidine-rich protein-2 (HRP-2) and pan Plasmodium-specific parasite lactate dehydrogenase, in a non-endemic reference setting. Stored whole-blood samples (n = 613) from international travellers suspected of malaria were used, with microscopy corrected by polymerase chain reaction (PCR) as the reference method. Samples infected by P. falciparum (n = 323), P. vivax (n = 97), P. ovale (n = 73) and P. malariae (n = 25) were selected, as well as 95 malaria-negative samples. The overall sensitivities of the Immunoquick+4 for the diagnosis of P. falciparum, P. vivax, P. malariae and P. ovale were 88.9, 75.3, 56.0 and 19.2%, respectively. Sensitivity was significantly related to parasite density for P. falciparum (93.6% versus 71.4% at parasite densities >100/microl and
Superior Cross-Species Reference Genes: A Blueberry Case Study
Die, Jose V.; Rowland, Lisa J.
2013-01-01
The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and make necessary the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry were identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469
Genetics Home Reference: CLN4 disease
... with each other. Specifically, CSPα is involved in recycling certain proteins that are involved in nerve impulse ... protein cannot perform its function, which reduces protein recycling, causing a shortage (deficiency) of functional proteins needed ...
Genetics Home Reference: protein C deficiency
... Protein C deficiency is a disorder that increases the risk ...
A PIXEL COMPOSITION-BASED REFERENCE DATA SET FOR THEMATIC ACCURACY ASSESSMENT
Developing reference data sets for accuracy assessment of land-cover classifications derived from coarse spatial resolution sensors such as MODIS can be difficult due to the large resolution differences between the image data and available reference data sources. Ideally, the spa...
Ghadie, Mohamed Ali; Lambourne, Luke; Vidal, Marc; Xia, Yu
2017-08-01
Alternative splicing is known to remodel protein-protein interaction networks ("interactomes"), yet large-scale determination of isoform-specific interactions remains challenging. We present a domain-based method to predict the isoform interactome from the reference interactome. First, we construct the domain-resolved reference interactome by mapping known domain-domain interactions onto experimentally determined interactions between reference proteins. Then, we construct the isoform interactome by predicting that an isoform loses an interaction if it loses the domain mediating the interaction. Our prediction framework is of high quality when assessed by experimental data. The predicted human isoform interactome reveals extensive network remodeling by alternative splicing. Protein pairs interacting with different isoforms of the same gene tend to be more divergent in biological function, tissue expression, and disease phenotype than protein pairs interacting with the same isoforms. Our prediction method complements experimental efforts, and demonstrates that integrating structural domain information with interactomes provides insights into the functional impact of alternative splicing.
Lambourne, Luke; Vidal, Marc
2017-01-01
Alternative splicing is known to remodel protein-protein interaction networks (“interactomes”), yet large-scale determination of isoform-specific interactions remains challenging. We present a domain-based method to predict the isoform interactome from the reference interactome. First, we construct the domain-resolved reference interactome by mapping known domain-domain interactions onto experimentally determined interactions between reference proteins. Then, we construct the isoform interactome by predicting that an isoform loses an interaction if it loses the domain mediating the interaction. Our prediction framework is of high quality when assessed by experimental data. The predicted human isoform interactome reveals extensive network remodeling by alternative splicing. Protein pairs interacting with different isoforms of the same gene tend to be more divergent in biological function, tissue expression, and disease phenotype than protein pairs interacting with the same isoforms. Our prediction method complements experimental efforts, and demonstrates that integrating structural domain information with interactomes provides insights into the functional impact of alternative splicing. PMID:28846689
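The prediction rule stated in this abstract (an isoform loses an interaction if it loses the domain mediating that interaction) can be illustrated with a small sketch. The data layout, function name, protein labels and domain names below are hypothetical and chosen only for illustration; the sketch also assumes exactly one mediating domain per interaction, which the abstract does not specify.

```python
def predict_isoform_partners(mediating_domain, isoform_domains):
    """Split reference interaction partners into retained and lost sets.

    mediating_domain: dict mapping partner name -> the domain of the
        reference protein that mediates the interaction with that partner
        (a toy stand-in for a domain-resolved reference interactome).
    isoform_domains: set of domains still present in the isoform.
    """
    retained = {
        partner
        for partner, domain in mediating_domain.items()
        if domain in isoform_domains
    }
    lost = set(mediating_domain) - retained
    return retained, lost


# Hypothetical reference protein with three domain-resolved interactions.
reference_interactions = {
    "PARTNER_A": "SH3",
    "PARTNER_B": "kinase",
    "PARTNER_C": "SH3",
}
# Hypothetical isoform that keeps the kinase domain but lacks the SH3 domain.
retained, lost = predict_isoform_partners(reference_interactions, {"kinase"})
print(sorted(retained), sorted(lost))
```

Here the isoform is predicted to keep only the kinase-mediated interaction and to lose both SH3-mediated ones.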
Ma, Yue; Tuskan, Gerald A.
2018-01-01
The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level. PMID:29686995
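A fingerprint of the kind described here can be sketched as a normalized two-dimensional histogram over ln(L) and ID. The sketch below is illustrative only: it assumes ID is expressed as a fraction between 0 and 1, the bin sizes are arbitrary, and the L1 distance used to compare two fingerprints is an assumption rather than the measure used in the paper. All protein values are made up.

```python
import math
from collections import Counter


def fingerprint(proteins, ln_length_bin=0.5, n_id_bins=10):
    """Return a dict mapping (ln(L) bin, ID bin) -> fraction of proteins.

    proteins: iterable of (length, disorder_fraction) pairs, with
    length >= 1 and 0 <= disorder_fraction <= 1.
    """
    counts = Counter()
    for length, disorder in proteins:
        x = int(math.log(length) / ln_length_bin)
        # Clamp so that a disorder fraction of exactly 1.0 falls in the last bin.
        y = min(int(disorder * n_id_bins), n_id_bins - 1)
        counts[(x, y)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def l1_distance(fp_a, fp_b):
    """Sum of absolute density differences over all occupied cells."""
    cells = set(fp_a) | set(fp_b)
    return sum(abs(fp_a.get(c, 0.0) - fp_b.get(c, 0.0)) for c in cells)


# Two hypothetical toy "proteomes" as (length, disorder fraction) pairs.
proteome_a = [(100, 0.05), (150, 0.05), (1000, 0.80)]
proteome_b = [(100, 0.05), (900, 0.75), (1100, 0.85)]
fp_a = fingerprint(proteome_a)
fp_b = fingerprint(proteome_b)
print(l1_distance(fp_a, fp_b))
```

A matrix of such pairwise distances could then be passed to any standard tree-building method, which matches the alignment-free spirit of the test case in the abstract.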
Eronen, Lauri; Toivonen, Hannu
2012-06-06
Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
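To make the idea of a proximity measure on a weighted graph concrete, the sketch below scores a candidate link by the probability of the best path between two nodes, treating each edge weight as an independent probability and multiplying weights along a path. The abstract does not name the measures that were evaluated, so this particular choice, the graph layout and all node names are assumptions made for illustration.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Highest product of edge weights over any path from source to target.

    graph: dict mapping node -> list of (neighbor, weight), 0 < weight <= 1.
    Runs Dijkstra's algorithm on -log(weight), so the shortest path in
    log space is the most probable path.
    """
    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0  # target not reachable


def add_edge(graph, a, b, weight):
    """Insert an undirected weighted edge."""
    graph.setdefault(a, []).append((b, weight))
    graph.setdefault(b, []).append((a, weight))


# Hypothetical toy graph with two routes from a gene to a disease.
toy_graph = {}
add_edge(toy_graph, "geneA", "proteinP", 0.9)
add_edge(toy_graph, "proteinP", "diseaseX", 0.5)
add_edge(toy_graph, "geneA", "goTerm", 0.3)
add_edge(toy_graph, "goTerm", "diseaseX", 0.4)
score = best_path_probability(toy_graph, "geneA", "diseaseX")
print(score)
```

The route through proteinP (0.9 × 0.5) beats the route through goTerm (0.3 × 0.4), so the score is about 0.45. Ranking candidate node pairs by such a score is one simple way to frame link prediction.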
Lavallée-Adam, Mathieu; Rauniyar, Navin; McClatchy, Daniel B; Yates, John R
2014-12-05
The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights.
2015-01-01
The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights. PMID:25177766
USDA-ARS's Scientific Manuscript database
MOCASSIN-prot is software, implemented in Perl and Matlab, for constructing protein similarity networks to classify proteins. Both domain composition and quantitative sequence similarity information are utilized in constructing the directed protein similarity networks. For each reference protein i...
Genetics Home Reference: HSD10 disease
... in the production (synthesis) of proteins . While most protein synthesis occurs in the fluid surrounding the nucleus (cytoplasm), ... few proteins are synthesized in the mitochondria. During protein synthesis, in either the mitochondria or the cytoplasm, molecules ...
Genetics Home Reference: TRNT1 deficiency
... in the production (synthesis) of other proteins. During protein synthesis, a molecule called transfer RNA (tRNA) helps assemble ... thought to be less able to participate in protein synthesis. Researchers suspect that protein synthesis in cellular structures ...
Perez, Romel B; Tischer, Alexander; Auton, Matthew; Whitten, Steven T
2014-12-01
Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins (IDPs), mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline (PRO) and alanine (ALA) to glycine (GLY) substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (R(h)) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the GLY substitutions decreased polyproline II (PP(II)) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in R(h) were not associated with folding. The experiments showed that changes in local PP(II) structure cause changes in R(h) that are variable and that depend on the intrinsic chain propensities of PRO and ALA residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed PRO and alanine effects on the structures of IDPs. © 2014 Wiley Periodicals, Inc.
Relationship between Hot Spot Residues and Ligand Binding Hot Spots in Protein-Protein Interfaces
Zerbe, Brandon S.; Hall, David R.
2013-01-01
In the context of protein-protein interactions, the term “hot spot” refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. In spite of the strong correlation between the two hot spot concepts, they fundamentally differ, however. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening. PMID:22770357
Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces.
Zerbe, Brandon S; Hall, David R; Vajda, Sandor; Whitty, Adrian; Kozakov, Dima
2012-08-27
In the context of protein-protein interactions, the term "hot spot" refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research, a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. In spite of the strong correlation between the two hot spot concepts, they fundamentally differ, however. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening.
Evaluating minimalist mimics by exploring key orientations on secondary structures (EKOS)
Xin, Dongyue; Ko, Eunhwa; Perez, Lisa M.; Ioerger, Thomas R.; Burgess, Kevin
2013-01-01
Peptide mimics that display amino acid side-chains on semi-rigid scaffolds (not peptide polyamides) can be referred to as minimalist mimics. Accessible conformations of these scaffolds may overlay with secondary structures giving, for example, “minimalist helical mimics”. It is difficult for researchers who want to apply minimalist mimics to decide which one to use because there is no widely accepted protocol for calibrating how closely these compounds mimic secondary structures. Moreover, it is also difficult for potential practitioners to evaluate which ideal minimalist helical mimics are preferred for a particular set of side-chains. For instance, what mimic presents i, i+4, i+7 side-chains in orientations that best resemble an ideal α-helix, and is a different mimic required for an i, i+3, i+7 helical combination? This article describes a protocol for fitting each member of an array of accessible scaffold conformations on secondary structures. The protocol involves: (i) using quenched molecular dynamics (QMD) to generate an ensemble consisting of hundreds of accessible, low energy conformers of the mimics; (ii) representing each of these as a set of Cα and Cβ coordinates corresponding to three amino acid side-chains displayed by the scaffolds; (iii) similarly representing each combination of three side-chains in each ideal secondary structure as a set of Cα and Cβ coordinates corresponding to three amino acid side-chains displayed by the scaffolds; and (iv) overlaying the Cα and Cβ coordinates of all the conformers on all the sets of side-chain “triads” in the ideal secondary structures and expressing the goodness of fit in terms of root mean squared deviation (RMSD, Å) for each overlay. We refer to this process as Exploring Key Orientations on Secondary structures (EKOS). 
Application of this procedure reveals the relative bias of a scaffold to overlay on different secondary structures, the “side-chain correspondences” (e.g. i, i+4, i+7 or i, i+3, i+4) of those overlays, and the energy of this state relative to the minimum located. This protocol was tested on some of the most widely cited minimalist α-helical mimics (1 – 8 in the text). The data obtained indicate that several of these compounds preferentially exist in conformations that resemble other secondary structures as well as α-helices, and many of the α-helical conformations have unexpected side-chain correspondences. These observations imply that the featured minimalist mimics have more scope for disrupting PPI interfaces than previously anticipated. Finally, the same simulation method was used to match preferred conformations of minimalist mimics with actual protein/peptide structures at interfaces, providing quantitative comparisons of predicted fits of the test mimics at protein-protein interaction sites. PMID:24121516
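For reference, the goodness-of-fit value in step (iv) can be written out using the standard definition of RMSD. The notation is an assumption introduced here, not taken from the article: x_k are the Cα and Cβ positions of a mimic conformer after superposition, y_k are the matching positions in the ideal secondary structure, and N = 6 because three side-chains each contribute one Cα and one Cβ.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left\lVert \mathbf{x}_k - \mathbf{y}_k \right\rVert^{2}},
\qquad N = 6
```

A smaller RMSD (in Å) means the conformer presents its three side-chains in orientations closer to those of the chosen triad in the ideal structure.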
Evaluating minimalist mimics by exploring key orientations on secondary structures (EKOS).
Xin, Dongyue; Ko, Eunhwa; Perez, Lisa M; Ioerger, Thomas R; Burgess, Kevin
2013-11-28
Peptide mimics that display amino acid side-chains on semi-rigid scaffolds (not peptide polyamides) can be referred to as minimalist mimics. Accessible conformations of these scaffolds may overlay with secondary structures giving, for example, "minimalist helical mimics". It is difficult for researchers who want to apply minimalist mimics to decide which one to use because there is no widely accepted protocol for calibrating how closely these compounds mimic secondary structures. Moreover, it is also difficult for potential practitioners to evaluate which ideal minimalist helical mimics are preferred for a particular set of side-chains. For instance, what mimic presents i, i + 4, i + 7 side-chains in orientations that best resemble an ideal α-helix, and is a different mimic required for an i, i + 3, i + 7 helical combination? This article describes a protocol for fitting each member of an array of accessible scaffold conformations on secondary structures. The protocol involves: (i) using quenched molecular dynamics (QMD) to generate an ensemble consisting of hundreds of accessible, low energy conformers of the mimics; (ii) representing each of these as a set of Cα and Cβ coordinates corresponding to three amino acid side-chains displayed by the scaffolds; (iii) similarly representing each combination of three side-chains in each ideal secondary structure as a set of Cα and Cβ coordinates corresponding to three amino acid side-chains displayed by the scaffolds; and (iv) overlaying the Cα and Cβ coordinates of all the conformers on all the sets of side-chain "triads" in the ideal secondary structures and expressing the goodness of fit as a root mean squared deviation (RMSD, Å) for each overlay. We refer to this process as Exploring Key Orientations on Secondary structures (EKOS). Application of this procedure reveals the relative bias of a scaffold to overlay on different secondary structures, the "side-chain correspondences" (e.g. i, i + 4, i + 7 or i, i + 3, i + 4) of those overlays, and the energy of this state relative to the minimum located. This protocol was tested on some of the most widely cited minimalist α-helical mimics (1-8 in the text). The data obtained indicate that several of these compounds preferentially exist in conformations that resemble other secondary structures as well as α-helices, and many of the α-helical conformations have unexpected side-chain correspondences. These observations imply the featured minimalist mimics have more scope for disrupting PPI interfaces than previously anticipated. Finally, the same simulation method was used to match preferred conformations of minimalist mimics with actual protein/peptide structures at interfaces, providing quantitative comparisons of predicted fits of the test mimics at protein-protein interaction sites.
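The triad enumeration and RMSD scoring in steps (ii) to (iv) of the EKOS abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the window of eight residues is an assumption, and the coordinates are assumed to be already superimposed (an optimal overlay step is deliberately omitted).

```python
from itertools import combinations
from math import sqrt

def triads(window=8):
    """Enumerate side-chain triads (i, i+a, i+b) as offsets from residue i.

    `window` (an assumed parameter) limits how far apart residues may be.
    """
    return [(0, a, b) for a, b in combinations(range(1, window), 2)]

def rmsd(coords_a, coords_b):
    """Plain RMSD between two equal-length lists of (x, y, z) points.

    No superposition is performed; inputs are assumed pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (p - q) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for p, q in zip(point_a, point_b)
    )
    return sqrt(squared / len(coords_a))

def best_fit(conformers, templates):
    """Return (rmsd, conformer_name, triad) for the lowest-RMSD pairing.

    conformers: {name: six points (3 C-alpha + 3 C-beta)}
    templates:  {triad: six points taken from an ideal secondary structure}
    """
    return min(
        (rmsd(conf, tmpl), conf_name, triad)
        for conf_name, conf in conformers.items()
        for triad, tmpl in templates.items()
    )
```

Scoring every conformer against every triad, as `best_fit` does, mirrors the exhaustive overlay described in the abstract; a realistic version would first compute the optimal rigid-body superposition for each pair.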
Schmidt-Hieltjes, Yvonne; Elshof, Clemens; Roovers, Lian; Ruinemans-Koerts, Janneke
2016-05-01
The aim of our study was to analyse whether the κ/λ free light chain ratio reference range for screening for Bence Jones proteinuria should be dependent on the estimated glomerular filtration rate (eGFR). The serum κ/λ free light chain ratio, eGFR, serum M-protein and Bence Jones protein were measured in 544 patients for whom Bence Jones protein analysis was ordered. In the population of patients without Bence Jones proteinuria or an M-protein (n = 402), there was no gradual increase in κ/λ free light chain ratio with diminishing eGFR. The κ/λ free light chain ratio in this group was 0.56-1.86 (95% interval). With this diagnostic reference range of the κ/λ ratio, 105 of the 110 patients with Bence Jones protein could be identified correctly. Only five patients with Bence Jones proteinuria (<0.17 g/L) were missed, without diagnostic or therapeutic consequences. In 36 patients (6.6%), an abnormal κ/λ free light chain ratio was measured without the presence of Bence Jones proteinuria. A κ/λ free light chain ratio in serum can be used safely and efficiently to select urine samples which should be analysed for Bence Jones proteinuria with an electrophoresis/immunofixation technique. Using this diagnostic reference range, the number of urine samples which should be analysed by electrophoresis/immunofixation could be reduced by 74%. The diagnostic reference interval can be determined best in a group of patients for whom Bence Jones analysis is indicated. For calculation of this reference range, the eGFR value does not need to be taken into account. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
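The screening arithmetic in this abstract can be checked with a short Python sketch. It assumes that the urine samples still sent for electrophoresis/immunofixation are exactly those with an abnormal ratio (the 105 detected Bence Jones cases plus the 36 abnormal ratios without Bence Jones proteinuria); that reading of the counts is an assumption, and the variable names are illustrative.

```python
# Reference range reported in the abstract (95% interval).
LOWER, UPPER = 0.56, 1.86

def needs_urine_analysis(ratio, lower=LOWER, upper=UPPER):
    """Flag a serum kappa/lambda ratio that falls outside the reference range."""
    return ratio < lower or ratio > upper

# Counts taken from the abstract.
total_patients = 544
bjp_positive = 110
bjp_detected = 105
abnormal_without_bjp = 36

# Fraction of Bence Jones cases caught by the ratio screen.
sensitivity = bjp_detected / bjp_positive

# Assumed: every abnormal ratio triggers a urine analysis.
flagged = bjp_detected + abnormal_without_bjp

# Fraction of urine analyses avoided.
reduction = 1 - flagged / total_patients

print(f"sensitivity: {sensitivity:.1%}, reduction: {reduction:.0%}")
```

Under that assumption, 141 of 544 samples would still be analysed, which is consistent with the 74% reduction stated in the abstract.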
A two-dimensional proteome reference map of Herbaspirillum seropedicae proteins.
Chaves, Daniela Fojo Seixas; Ferrer, Pércio Pereira; de Souza, Emanuel Maltempi; Gruz, Leonardo Magalhães; Monteiro, Rose Adele; de Oliveira Pedrosa, Fábio
2007-10-01
Herbaspirillum seropedicae is an endophytic diazotroph associated with economically important crops such as rice, sugarcane, and wheat. Here, we present a 2-D reference map for H. seropedicae. Using MALDI-TOF-MS we identified 205 spots representing 173 different proteins with a calculated average of 1.18 proteins/gene. Seventeen hypothetical or conserved hypothetical ORFs were shown to code for true gene products. These data will support the genome annotation process and provide a basis on which to undertake comparative proteomic studies.
Genetics Home Reference: chordoma
... regions of DNA. On the basis of this action, T-box proteins are called transcription factors. The brachyury protein is ... both result in the production of excess brachyury protein. The specific mechanism by which excess brachyury protein contributes to the ...
Oftedal, O T; Eisert, R; Barrell, G K
2014-01-01
Mammalian milks may differ greatly in composition from cow milk, and these differences may affect the performance of analytical methods. High-fat, high-protein milks with a preponderance of oligosaccharides, such as those produced by many marine mammals, present a particular challenge. We compared the performance of several methods against reference procedures using Weddell seal (Leptonychotes weddellii) milk of highly varied composition (by reference methods: 27-63% water, 24-62% fat, 8-12% crude protein, 0.5-1.8% sugar). A microdrying step preparatory to carbon-hydrogen-nitrogen (CHN) gas analysis slightly underestimated water content and had a higher repeatability relative standard deviation (RSDr) than did reference oven drying at 100°C. Compared with a reference macro-Kjeldahl protein procedure, the CHN (or Dumas) combustion method had a somewhat higher RSDr (1.56 vs. 0.60%) but correlation between methods was high (0.992), means were not different (CHN: 17.2±0.46% dry matter basis; Kjeldahl 17.3±0.49% dry matter basis), there were no significant proportional or constant errors, and predictive performance was high. A carbon stoichiometric procedure based on CHN analysis failed to adequately predict fat (reference: Röse-Gottlieb method) or total sugar (reference: phenol-sulfuric acid method). Gross energy content, calculated from energetic factors and results from reference methods for fat, protein, and total sugar, accurately predicted gross energy as measured by bomb calorimetry. We conclude that the CHN (Dumas) combustion method and calculation of gross energy are acceptable analytical approaches for marine mammal milk, but fat and sugar require separate analysis by appropriate analytic methods and cannot be adequately estimated by carbon stoichiometry. 
Some other alternative methods (low-temperature drying for water determination; Bradford, Lowry, and biuret methods for protein; the Folch and the Bligh and Dyer methods for fat; and enzymatic and reducing sugar methods for total sugar) appear likely to produce substantial error in marine mammal milks. It is important that alternative analytical methods be properly validated against a reference method before being used, especially for mammalian milks that differ greatly from cow milk in analyte characteristics and concentrations. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Protein Models Docking Benchmark 2
Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.
2015-01-01
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets
Yu, Xueping; Ivanic, Joseph; Memišević, Vesna; Wallqvist, Anders; Reifman, Jaques
2011-01-01
We characterized and evaluated the functional attributes of three yeast high-confidence protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the deployed experimental technologies used to recover these interactions. This affected the total coverage of interactions and was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method was the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented among distinct and different functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weak and transient interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in their connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodology biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to rationalize how biological insights obtained by analyzing data sets originating from different sources sometimes do not agree or may even contradict each other. 
An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge. PMID:21876202
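Over- and under-representation of a functional category within an interaction data set, as discussed in this abstract, is commonly assessed with a hypergeometric test. The sketch below shows that general idea only; the abstract does not state which statistic the authors used, and the function names and example counts are hypothetical.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n proteins from N, of which K are in the category."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def over_representation_p(k, K, n, N):
    """Probability of seeing at least k category members by chance."""
    return hypergeom_tail(k, K, n, N)

def under_representation_p(k, K, n, N):
    """Probability of seeing at most k category members by chance."""
    return 1.0 - hypergeom_tail(k + 1, K, n, N)

# Hypothetical example: 6000 proteins in the proteome, 300 in a category,
# 500 proteins recovered by one method, 60 of them in that category.
p_over = over_representation_p(60, 300, 500, 6000)
```

Running the same test per category and per method would give a simple table of which functional classes each technology favours or misses.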
Tang, Hsin-Yao; Beer, Lynn A.; Barnhart, Kurt T.; Speicher, David W.
2011-01-01
Stable isotope dilution-multiple reaction monitoring-mass spectrometry (SID-MRM-MS) has emerged as a promising platform for verification of serological candidate biomarkers. However, the cost and time needed to synthesize and evaluate stable isotope peptides, optimize spike-in assays, and generate standard curves quickly become unattractive when testing many candidate biomarkers. In this study, we demonstrate that label-free multiplexed MRM-MS coupled with major protein depletion and 1-D gel separation is a time-efficient, cost-effective initial biomarker verification strategy requiring less than 100 μl serum. Furthermore, SDS gel fractionation can resolve different molecular weight forms of targeted proteins with potential diagnostic value. Because fractionation is at the protein level, consistency of peptide quantitation profiles across fractions permits rapid detection of quantitation problems for specific peptides from a given protein. Despite the lack of internal standards, the entire workflow can be highly reproducible, and long-term reproducibility of relative protein abundance can be obtained using different mass spectrometers and LC methods with external reference standards. Quantitation down to ~200 pg/mL could be achieved using this workflow. Hence, the label-free GeLC-MRM workflow enables rapid, sensitive, and economical initial screening of large numbers of candidate biomarkers prior to setting up SID-MRM assays or immunoassays for the most promising candidate biomarkers. PMID:21726088
Tang, Hsin-Yao; Beer, Lynn A; Barnhart, Kurt T; Speicher, David W
2011-09-02
Stable isotope dilution-multiple reaction monitoring-mass spectrometry (SID-MRM-MS) has emerged as a promising platform for verification of serological candidate biomarkers. However, the cost and time needed to synthesize and evaluate stable isotope peptides, optimize spike-in assays, and generate standard curves quickly become unattractive when testing many candidate biomarkers. In this study, we demonstrate that label-free multiplexed MRM-MS coupled with major protein depletion and 1D gel separation is a time-efficient, cost-effective initial biomarker verification strategy requiring less than 100 μL of serum. Furthermore, SDS gel fractionation can resolve different molecular weight forms of targeted proteins with potential diagnostic value. Because fractionation is at the protein level, consistency of peptide quantitation profiles across fractions permits rapid detection of quantitation problems for specific peptides from a given protein. Despite the lack of internal standards, the entire workflow can be highly reproducible, and long-term reproducibility of relative protein abundance can be obtained using different mass spectrometers and LC methods with external reference standards. Quantitation down to ~200 pg/mL could be achieved using this workflow. Hence, the label-free GeLC-MRM workflow enables rapid, sensitive, and economical initial screening of large numbers of candidate biomarkers prior to setting up SID-MRM assays or immunoassays for the most promising candidate biomarkers.
KoBaMIN: a knowledge-based minimization web server for protein structure refinement.
Rodrigues, João P G L M; Levitt, Michael; Chopra, Gaurav
2012-07-01
The KoBaMIN web server provides an online interface to a simple, consistent and computationally efficient protein structure refinement protocol based on minimization of a knowledge-based potential of mean force. The server can be used to refine either a single protein structure or an ensemble of proteins starting from their unrefined coordinates in PDB format. The refinement method is particularly fast and accurate due to the underlying knowledge-based potential derived from structures deposited in the PDB; as such, the energy function implicitly includes the effects of solvent and the crystal environment. Our server allows for an optional but recommended step that optimizes stereochemistry using the MESHI software. The KoBaMIN server also allows comparison of the refined structures with a provided reference structure to assess the changes brought about by the refinement protocol. The performance of KoBaMIN has been benchmarked widely on a large set of decoys, all models generated at the seventh worldwide experiments on critical assessment of techniques for protein structure prediction (CASP7) and it was also shown to produce top-ranking predictions in the refinement category at both CASP8 and CASP9, yielding consistently good results across a broad range of model quality values. The web server is fully functional and freely available at http://csb.stanford.edu/kobamin.
Acute phase proteins in healthy goats: establishment of reference intervals.
Heller, Meera C; Johns, Jennifer L
2015-03-01
Acute inflammatory processes can trigger increased production of acute phase proteins (APPs) that can be useful biomarkers of inflammation. APPs are diverse and include proteins involved in coagulation, opsonization, iron regulation, and limitation of tissue injury. Haptoglobin, serum amyloid A, and alpha-1 acid glycoprotein have been proposed as useful APPs in goats. APPs can differ markedly by species; therefore, species-specific reference intervals and studies are necessary. The objective of this study was to determine species-specific reference intervals for 4 APPs in goats. Haptoglobin, serum amyloid A, lipopolysaccharide binding protein, and alpha-1 acid glycoprotein were measured in 54 clinically normal adult goats. APPs were measured using goat-specific commercial enzyme-linked immunosorbent assay kits. Results were analyzed by 1-way analysis of variance to compare sexes and breeding status. Reference Value Advisor was used to calculate reference limits according to the IFCC-CLSI guidelines. Only 1 APP was found to vary in healthy animals; serum haptoglobin was increased in lactating animals and decreased in pregnant does in their second trimester when compared with open, nonlactating does. No sex-based differences were seen for any of the APPs measured. We report normal reference intervals for 4 serum APPs that may be useful as disease markers. Haptoglobin should be interpreted with caution in animals with unknown pregnancy status. Further studies are needed to determine whether these APPs are useful biomarkers in goat disease states. © 2015 The Author(s).
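The basic idea behind a reference interval, the central 95% of values in a healthy population, can be sketched in Python as below. This is only the percentile concept: it is not the Reference Value Advisor procedure, the IFCC-CLSI guidelines involve additional steps that are not reproduced here, and the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = (len(sorted_values) - 1) * p / 100
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index]
    return lower_value + fraction * (sorted_values[upper_index] - lower_value)

def reference_interval(values, central=95):
    """Return (lower, upper) limits enclosing the central `central` percent."""
    ordered = sorted(values)
    tail = (100 - central) / 2
    return percentile(ordered, tail), percentile(ordered, 100 - tail)
```

With a sample of 54 animals, as in this study, the outermost percentiles rest on very few observations, which is one reason dedicated tools apply more elaborate methods than this sketch.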
Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.; ...
2018-01-01
The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths (L) and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct from each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.
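A fingerprint of the kind described here can be pictured as a normalized two-dimensional histogram over ln(L) and ID. The Python sketch below is an illustration under stated assumptions: ID is taken as a fraction between 0 and 1, the bin edges are arbitrary, and the Euclidean distance between fingerprints is a stand-in, since the abstract does not say which distance was used to build the trees. All names are hypothetical.

```python
from math import log, sqrt

def _bin_index(value, edges):
    """Index of the bin containing value; the last bin includes its upper edge."""
    if value < edges[0] or value > edges[-1]:
        return None
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2

def fingerprint(proteins, ln_length_edges, disorder_edges):
    """Build a density grid from (length, disorder_fraction) pairs.

    Rows follow ln(length) bins, columns follow disorder bins; cells sum to 1.
    """
    rows, cols = len(ln_length_edges) - 1, len(disorder_edges) - 1
    grid = [[0.0] * cols for _ in range(rows)]
    counted = 0
    for length, disorder in proteins:
        r = _bin_index(log(length), ln_length_edges)
        c = _bin_index(disorder, disorder_edges)
        if r is None or c is None:
            continue  # outside the grid; skipped in this sketch
        grid[r][c] += 1
        counted += 1
    if counted == 0:
        return grid
    return [[cell / counted for cell in row] for row in grid]

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two equally shaped fingerprints (assumed metric)."""
    return sqrt(sum(
        (a - b) ** 2
        for row_a, row_b in zip(fp_a, fp_b)
        for a, b in zip(row_a, row_b)
    ))
```

A pairwise matrix of `fingerprint_distance` values across proteomes could then be passed to any standard distance-based tree-building method.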
Aleixandre-Tudo, José Luis; Nieuwoudt, Helené; Aleixandre, José Luis; Du Toit, Wessel J
2015-02-04
The validation of ultraviolet-visible (UV-vis) spectroscopy combined with partial least-squares (PLS) regression to quantify red wine tannins is reported. The methylcellulose precipitable (MCP) tannin assay and the bovine serum albumin (BSA) tannin assay were used as reference methods. To take the high variability of wine tannins into account when the calibration models were built, a diverse data set was collected from samples of South African red wines that consisted of 18 different cultivars, from regions spanning the wine grape-growing areas of South Africa with their various sites, climates, and soils, ranging in vintage from 2000 to 2012. A total of 240 wine samples were analyzed, and these were divided into a calibration set (n = 120) and a validation set (n = 120) to evaluate the predictive ability of the models. To test the robustness of the PLS calibration models, the predictive ability of the classifying variables cultivar, vintage year, and experimental versus commercial wines was also tested. In general, the statistics obtained when BSA was used as a reference method were slightly better than those obtained with MCP. Despite this, the MCP tannin assay should also be considered as a valid reference method for developing PLS calibrations. The best calibration statistics for the prediction of new samples were coefficient of correlation (R²val) = 0.89, root mean square error of prediction (RMSEP) = 0.16, and residual predictive deviation (RPD) = 3.49 for MCP and R²val = 0.93, RMSEP = 0.08, and RPD = 4.07 for BSA, when only the UV region (260-310 nm) was selected, which also led to a faster analysis time. In addition, a difference in the results obtained when the predictive ability of the classifying variables vintage, cultivar, or commercial versus experimental wines was studied suggests that tannin composition is highly affected by many factors.
This study also discusses the correlations in tannin values between the methylcellulose and protein precipitation methods.
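The validation statistics quoted in this abstract follow standard definitions, which the Python sketch below illustrates. RMSEP is computed as the root mean square difference between reference and predicted values, and RPD as the standard deviation of the reference values divided by RMSEP; whether the authors used exactly these variants (for example, bias-corrected or not) is an assumption, and the data in the usage lines are made up.

```python
from math import sqrt
from statistics import stdev

def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and equal in length")
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return sqrt(sum(squared_errors) / len(reference))

def rpd(reference, predicted):
    """Residual predictive deviation: spread of reference values per unit error."""
    return stdev(reference) / rmsep(reference, predicted)

# Made-up validation values (reference assay vs. PLS prediction).
reference_tannin = [1.0, 2.0, 3.0, 4.0]
predicted_tannin = [1.5, 2.5, 3.5, 4.5]
error = rmsep(reference_tannin, predicted_tannin)
ratio = rpd(reference_tannin, predicted_tannin)
```

Because RPD scales the error by the spread of the reference data, a model with the same RMSEP looks better on a more variable sample set, which is worth keeping in mind when comparing the MCP and BSA figures.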
Open Source High Content Analysis Utilizing Automated Fluorescence Lifetime Imaging Microscopy.
Görlitz, Frederik; Kelly, Douglas J; Warren, Sean C; Alibhai, Dominic; West, Lucien; Kumar, Sunil; Alexandrov, Yuriy; Munro, Ian; Garcia, Edwin; McGinty, James; Talbot, Clifford; Serwa, Remigiusz A; Thinon, Emmanuelle; da Paola, Vincenzo; Murray, Edward J; Stuhmeier, Frank; Neil, Mark A A; Tate, Edward W; Dunsby, Christopher; French, Paul M W
2017-01-18
We present an open source high content analysis instrument utilizing automated fluorescence lifetime imaging (FLIM) for assaying protein interactions using Förster resonance energy transfer (FRET) based readouts of fixed or live cells in multiwell plates. This provides a means to screen for cell signaling processes read out using intramolecular FRET biosensors or intermolecular FRET of protein interactions such as oligomerization or heterodimerization, which can be used to identify binding partners. We describe here the functionality of this automated multiwell plate FLIM instrumentation and present exemplar data from our studies of HIV Gag protein oligomerization and a time course of a FRET biosensor in live cells. A detailed description of the practical implementation is then provided with reference to a list of hardware components and a description of the open source data acquisition software written in µManager. The application of FLIMfit, an open source MATLAB-based client for the OMERO platform, to analyze arrays of multiwell plate FLIM data is also presented. The protocols for imaging fixed and live cells are outlined and a demonstration of an automated multiwell plate FLIM experiment using cells expressing fluorescent protein-based FRET constructs is presented. This is complemented by a walk-through of the data analysis for this specific FLIM FRET data set.
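For lifetime-based FRET readouts such as those described here, the textbook relation between donor lifetime and FRET efficiency is E = 1 - τ_DA/τ_D, where τ_D is the donor lifetime alone and τ_DA the donor lifetime in the presence of the acceptor. The Python sketch below applies that relation per well; it is not part of µManager or FLIMfit, the function names are hypothetical, and the lifetimes in the usage lines are invented values in nanoseconds.

```python
def fret_efficiency(tau_donor_acceptor, tau_donor):
    """FRET efficiency from donor lifetimes with and without acceptor."""
    if tau_donor <= 0 or tau_donor_acceptor < 0:
        raise ValueError("lifetimes must be positive")
    return 1 - tau_donor_acceptor / tau_donor

def plate_efficiencies(well_lifetimes, tau_donor):
    """Map each well name to its FRET efficiency.

    well_lifetimes: {well: mean donor lifetime measured in that well}
    tau_donor: donor-only control lifetime, in the same units
    """
    return {
        well: fret_efficiency(tau, tau_donor)
        for well, tau in well_lifetimes.items()
    }

# Invented example: donor-only control at 2.5 ns.
efficiencies = plate_efficiencies({"A1": 2.5, "A2": 2.0}, tau_donor=2.5)
```

A single mean lifetime per well is the simplest possible summary; fitted multi-exponential decays, as FLIM analysis software typically provides, carry more information about the interacting fraction.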
Alberio, Tiziana; Pieroni, Luisa; Ronci, Maurizio; Banfi, Cristina; Bongarzone, Italia; Bottoni, Patrizia; Brioschi, Maura; Caterino, Marianna; Chinello, Clizia; Cormio, Antonella; Cozzolino, Flora; Cunsolo, Vincenzo; Fontana, Simona; Garavaglia, Barbara; Giusti, Laura; Greco, Viviana; Lucacchini, Antonio; Maffioli, Elisa; Magni, Fulvio; Monteleone, Francesca; Monti, Maria; Monti, Valentina; Musicco, Clara; Petrosillo, Giuseppe; Porcelli, Vito; Saletti, Rosaria; Scatena, Roberto; Soggiu, Alessio; Tedeschi, Gabriella; Zilocchi, Mara; Roncada, Paola; Urbani, Andrea; Fasano, Mauro
2017-12-01
The Mitochondrial Human Proteome Project aims at understanding the function of the mitochondrial proteome and its crosstalk with the proteome of other organelles. Being able to choose a suitable and validated enrichment protocol of functional mitochondria, based on the specific needs of the downstream proteomics analysis, would greatly help the researchers in the field. Mitochondrial fractions from ten model cell lines were prepared using three enrichment protocols and analyzed on seven different LC-MS/MS platforms. All data were processed using neXtProt as reference database. The data are available for the Human Proteome Project purposes through the ProteomeXchange Consortium with the identifier PXD007053. The processed data sets were analyzed using a suite of R routines to perform a statistical analysis and to retrieve subcellular and submitochondrial localizations. Although the overall number of identified total and mitochondrial proteins was not significantly dependent on the enrichment protocol, specific line to line differences were observed. Moreover, the protein lists were mapped to a network representing the functional mitochondrial proteome, encompassing mitochondrial proteins and their first interactors. More than 80% of the identified proteins resulted in nodes of this network but with a different ability in coisolating mitochondria-associated structures for each enrichment protocol/cell line pair.
Malcova, Ivana; Farkasovsky, Marian; Senohrabkova, Lenka; Vasicova, Pavla; Hasek, Jiri
2016-05-01
Live-imaging analysis is performed in many laboratories all over the world. Various tools have been developed to enable protein labeling either in plasmid or genomic context in live yeast cells. Here, we introduce a set of nine integrative modules for C-terminal gene tagging that combines three fluorescent proteins (FPs), namely ymTagBFP, mCherry and yTagRFP-T, with three dominant selection markers: geneticin, nourseothricin and hygromycin. In addition, the construction of two episomal modules for Saccharomyces cerevisiae with photostable yTagRFP-T is also described. Our cassettes with orange, red and blue FPs can be combined with other fluorescent probes like green fluorescent protein to prepare double- or triple-labeled strains for multicolor live-cell imaging. Primers for PCR amplification of the cassettes were designed to be fully compatible with the existing PCR toolbox representing over 50 various integrative modules and also with deletion cassettes either for single or repeated usage, to enable a cost-effective and easy exchange of tags. New modules can also be used for biochemical analysis since antibodies are available for all three fluorescent probes. © FEMS 2016. All rights reserved.
Zawadzka, Anna M.; Schilling, Birgit; Held, Jason M.; Sahu, Alexandria K.; Cusack, Michael P.; Drake, Penelope M.; Fisher, Susan J.; Gibson, Bradford W.
2015-01-01
Human plasma contains proteins that reflect overall health and represents a rich source of proteins for identifying and understanding disease pathophysiology. However, few studies have investigated changes in plasma phosphoproteins. In addition, little is known about the normal variations in these phosphoproteins, especially with respect to specific sites of modification. To address these questions, we evaluated variability in plasma protein phosphorylation in healthy individuals using multiple reaction monitoring (MRM) and SWATH MS2 data-independent acquisition. First, we developed a discovery workflow for phosphopeptide enrichment from plasma and identified targets for MRM assays. Next, we analyzed plasma from healthy donors using an analytical workflow consisting of MRM and SWATH MS2 that targeted phosphopeptides from 58 and 68 phosphoproteins, respectively. These two methods produced similar results showing low variability in 13 phosphosites from 10 phosphoproteins (CVinter <30%) and high interpersonal variation of 16 phosphosites from 14 phosphoproteins (CVinter >30%). Moreover, these phosphopeptides originate from phosphoproteins involved in cellular processes governing homeostasis, immune response, cell-extracellular matrix interactions, lipid and sugar metabolism, and cell signaling. This limited assessment of technical and biological variability in phosphopeptides generated from plasma phosphoproteins among healthy volunteers constitutes a reference for future studies that target protein phosphorylation as biomarkers. PMID:24853916
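The split into low and high interpersonal variability at a 30% coefficient of variation can be expressed in a few lines of Python. The sketch below assumes one intensity value per donor for each phosphosite and uses the sample standard deviation; the site labels and numbers in the usage lines are invented, and the function names are hypothetical.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation across individuals, in percent."""
    return 100 * stdev(values) / mean(values)

def classify_sites(site_intensities, threshold=30.0):
    """Label each phosphosite 'low' or 'high' by inter-individual CV."""
    return {
        site: "low" if cv_percent(values) < threshold else "high"
        for site, values in site_intensities.items()
    }

# Invented intensities for two hypothetical phosphosites across three donors.
labels = classify_sites({
    "siteA": [10.0, 10.0, 10.0],
    "siteB": [1.0, 10.0, 19.0],
})
```

Separating technical from biological variability would require replicate measurements per donor, which this sketch does not model.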
Building a protein name dictionary from full text: a machine learning term extraction approach.
Shi, Lei; Campagne, Fabien
2005-04-07
The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
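The first stage of the approach, collecting high-frequency terms from an article, and the later comparison of the resulting dictionary with an existing lexicon can be sketched in Python as follows. The SVM classification step is omitted entirely, and the tokenization rule, frequency cutoff and function names are assumptions made for illustration rather than details from the paper.

```python
import re
from collections import Counter

def frequent_terms(text, min_count=3):
    """Return tokens occurring at least `min_count` times in one article.

    Tokens start with a letter and may contain letters, digits and hyphens.
    """
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]+", text)
    counts = Counter(tokens)
    return {term for term, count in counts.items() if count >= min_count}

def lexicon_overlap(dictionary, lexicon):
    """Fraction of dictionary names found in a reference lexicon (case-insensitive)."""
    if not dictionary:
        return 0.0
    known = {name.lower() for name in lexicon}
    matches = sum(1 for name in dictionary if name.lower() in known)
    return matches / len(dictionary)

article = (
    "p53 binds Mdm2. Mdm2 degrades p53. "
    "p53 is a tumor suppressor, and Mdm2 is an E3 ligase."
)
candidates = frequent_terms(article, min_count=3)
```

In the described method the candidate terms would next be passed to a trained classifier; `lexicon_overlap` corresponds to the kind of comparison behind the reported 8.3% match with SwissProt description lines.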
GENCODE: the reference human genome annotation for The ENCODE Project.
Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J
2012-09-01
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Open Source High Content Analysis Utilizing Automated Fluorescence Lifetime Imaging Microscopy
Warren, Sean C.; Alibhai, Dominic; West, Lucien; Kumar, Sunil; Alexandrov, Yuriy; Munro, Ian; Garcia, Edwin; McGinty, James; Talbot, Clifford; Serwa, Remigiusz A.; Thinon, Emmanuelle; da Paola, Vincenzo; Murray, Edward J.; Stuhmeier, Frank; Neil, Mark A. A.; Tate, Edward W.; Dunsby, Christopher; French, Paul M. W.
2017-01-01
We present an open source high content analysis instrument utilizing automated fluorescence lifetime imaging (FLIM) for assaying protein interactions using Förster resonance energy transfer (FRET) based readouts of fixed or live cells in multiwell plates. This provides a means to screen for cell signaling processes read out using intramolecular FRET biosensors or intermolecular FRET of protein interactions such as oligomerization or heterodimerization, which can be used to identify binding partners. We describe here the functionality of this automated multiwell plate FLIM instrumentation and present exemplar data from our studies of HIV Gag protein oligomerization and a time course of a FRET biosensor in live cells. A detailed description of the practical implementation is then provided with reference to a list of hardware components and a description of the open source data acquisition software written in µManager. The application of FLIMfit, an open source MATLAB-based client for the OMERO platform, to analyze arrays of multiwell plate FLIM data is also presented. The protocols for imaging fixed and live cells are outlined and a demonstration of an automated multiwell plate FLIM experiment using cells expressing fluorescent protein-based FRET constructs is presented. This is complemented by a walk-through of the data analysis for this specific FLIM FRET data set. PMID:28190060
Building a protein name dictionary from full text: a machine learning term extraction approach
Shi, Lei; Campagne, Fabien
2005-01-01
Background The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129
MIPS: analysis and annotation of proteins from whole genomes in 2005.
Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V
2006-01-01
The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph
2016-01-01
Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified. PMID:26927169
pKa values in proteins determined by electrostatics applied to molecular dynamics trajectories.
Meyer, Tim; Knapp, Ernst-Walter
2015-06-09
For a benchmark set of 194 measured pKa values in 13 proteins, electrostatic energy computations are performed in which pKa values are computed by solving the Poisson-Boltzmann equation. In contrast to the previous approach of Karlsberg(+) (KB(+)) that essentially used protein crystal structures with variations in their side chain conformations, the present approach (KB2(+)MD) uses protein conformations from four molecular dynamics (MD) simulations of 10 ns each. These MD simulations are performed with different specific but fixed protonation patterns, selected to sample the conformational space for the different protonation patterns faithfully. The root-mean-square deviation between computed and measured pKa values (pKa RMSD) is shown to be reduced from 1.17 pH units using KB(+) to 0.96 pH units using KB2(+)MD. The pKa RMSD can be further reduced to 0.79 pH units, if each conformation is energy-minimized with a dielectric constant of εmin = 4 prior to calculating the electrostatic energy. The electrostatic energy expressions upon which the computations are based have been reformulated such that they do not involve terms that mix protein and solvent environment contributions and no thermodynamic cycle is needed. As a consequence, conformations of the titratable residues can be treated independently in the protein and solvent environments. In addition, the energy terms used here avoid the so-called intrinsic pKa and can therefore be interpreted without reference to arbitrary protonation states and conformations.
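The pKa RMSD quoted above is a root-mean-square deviation between computed and measured values. A minimal sketch of that metric follows; the function name and the input values are illustrative assumptions, not part of the KB2(+)MD software.

```python
import math


def pka_rmsd(computed, measured):
    """Root-mean-square deviation between paired computed and measured pKa values."""
    if len(computed) != len(measured) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(c - m) ** 2 for c, m in zip(computed, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical pKa pairs (pH units), for illustration only.
rmsd = pka_rmsd([5.0, 3.0], [4.0, 4.0])
```

With this definition, the reported drop from 1.17 to 0.96 pH units means the typical squared error over the 194 benchmark values shrank, not that every individual residue improved.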
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.
Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen
2016-05-10
Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Choi, Yejin; Kwon, Seong Yi; Oh, Ho Jung; Shim, Sunbo; Chang, Seokkee; Chung, Hye Joo; Kim, Do Keun; Park, Younsang; Lee, Younghee
2017-09-01
The single radial immunodiffusion (SRID) assay, used to quantify hemagglutinin (HA) in influenza vaccines, requires reference reagents; however, because centralized production of reference reagents may slow the emergency deployment of vaccines, alternatives are needed. We investigated the production of HA proteins using recombinant DNA technology, rather than a traditional egg-based production process. The HA proteins were then used in an SRID assay as a reference antigen. We found that HA can be quantified in both egg-based and cell-based influenza vaccines when recombinant HAs (rHAs) are used as the reference antigen. Furthermore, we confirmed that rHAs obtained from strains with pandemic potential, such as H5N1, H7N3, H7N9, and H9N2 strains, can be utilized in the SRID assay. The rHA production process takes just one month, in contrast to the traditional process that takes three to four months. The use of rHAs may reduce the time required to produce reference reagents and facilitate timely introduction of vaccines during emergencies.
Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D
2018-01-04
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Protein structure based prediction of catalytic residues
2013-01-01
Background Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues.
We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045
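The sensitivity, precision and enrichment figures in this abstract follow directly from the four counts it reports. The sketch below recomputes them; the function name is hypothetical, and enrichment is assumed to mean precision divided by the background rate of catalytic residues.

```python
def enrichment_summary(n_total, n_true, n_predicted, n_hits):
    """Return (sensitivity, precision, fold enrichment over the background rate)."""
    sensitivity = n_hits / n_true        # fraction of catalytic residues recovered
    precision = n_hits / n_predicted     # fraction of predictions that are catalytic
    background = n_true / n_total        # catalytic fraction in the whole input set
    return sensitivity, precision, precision / background


# Counts taken from the abstract: 9262 residues, 111 catalytic, 411 returned, 70 correct.
sens, prec, fold = enrichment_summary(9262, 111, 411, 70)
```

Under that assumption the counts give about 63% sensitivity, 17% precision and a fold enrichment slightly above 14, consistent with the values stated in the abstract.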
Hassan, Syed S.; Jamal, Syed B.; Radusky, Leandro G.; Tiwari, Sandeep; Ullah, Asad; Ali, Javed; Behramand; de Carvalho, Paulo V. S. D.; Shams, Rida; Khan, Sabir; Figueiredo, Henrique C. P.; Barh, Debmalya; Ghosh, Preetam; Silva, Artur; Baumbach, Jan; Röttger, Richard; Turjanski, Adrián G.; Azevedo, Vasco A. C.
2018-01-01
Diphtheria, an acute and highly infectious disease previously regarded as endemic in nature but vaccine-preventable, is caused by Corynebacterium diphtheriae (Cd). In this work, we used an in silico approach across the 13 complete genome sequences of C. diphtheriae followed by a computational assessment of structural information of the binding sites to characterize the “pocketome druggability.” To this end, we first computed the “modelome” (3D structures of a complete genome) of a randomly selected reference strain, Cd NCTC13129, which had 13,763 open reading frames (ORFs) and resulted in 1,253 (∼9%) structure models. The amino acid sequences of these modeled structures were compared with the remaining 12 genomes and consequently, 438 conserved protein sequences were obtained. The RCSB-PDB database was consulted to check the template structures for these conserved proteins and as a result, 401 adequate 3D models were obtained. We subsequently predicted the protein pockets for the obtained set of models and kept only the conserved pockets that had highly druggable (HD) values (137 across all strains). Later, an off-target host homology analysis was performed considering the human proteome using the NCBI database. Furthermore, a gene essentiality analysis was carried out, which gave a final set of 10 conserved targets possessing highly druggable protein pockets. To check the target identification robustness of the pipeline used in this work, we crosschecked the final target list with another in-house target identification approach for C. diphtheriae, thereby obtaining three common targets: hisE-phosphoribosyl-ATP pyrophosphatase, glpX-fructose 1,6-bisphosphatase II, and rpsH-30S ribosomal protein S8. Our predicted results suggest that the in silico approach used could potentially aid in experimental polypharmacological target determination in C. diphtheriae and other pathogens, and thereby might complement existing and new drug-discovery pipelines.
PMID:29487617
Self-Complementarity within Proteins: Bridging the Gap between Binding and Folding
Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul
2012-01-01
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. PMID:22713576
Self-complementarity within proteins: bridging the gap between binding and folding.
Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul
2012-06-06
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Reference set design for relational modeling of fuzzy systems
NASA Astrophysics Data System (ADS)
Lapohos, Tibor; Buchal, Ralph O.
1994-10-01
One of the keys to the successful relational modeling of fuzzy systems is the proper design of fuzzy reference sets. This has been discussed throughout the literature. In the frame of modeling a stochastic system, we analyze the problem numerically. First, we briefly describe the relational model and present the performance of the modeling in the most trivial case: the reference sets are triangle shaped. Next, we present a known fuzzy reference set generator algorithm (FRSGA) which is based on the fuzzy c-means (Fc-M) clustering algorithm. In the second section of this chapter we improve the previous FRSGA by adding a constraint to the Fc-M algorithm (modified Fc-M or MFc-M): two cluster centers are forced to coincide with the domain limits. This is needed to obtain properly shaped extreme linguistic reference values. We apply this algorithm to uniformly discretized domains of the variables involved. The fuzziness of the reference sets produced by both Fc-M and MFc-M is determined by a parameter, which in our experiments is modified iteratively. Each time, a new model is created and its performance analyzed. For certain algorithm parameter values both of these two algorithms have shortcomings. To eliminate the drawbacks of these two approaches, we develop a completely new generator algorithm for reference sets which we call Polyline. This algorithm and its performance are described in the last section. In all three cases, the modeling is performed for a variety of operators used in the inference engine and two defuzzification methods. Therefore our results depend neither on the system model order nor the experimental setup.
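The "most trivial case" of triangle-shaped reference sets can be sketched as follows. This is an assumption-laden illustration, not the authors' FRSGA or Polyline algorithm: it assumes evenly spaced, symmetric triangles whose outermost peaks sit on the domain limits, and the function name is hypothetical.

```python
def triangular_reference_sets(lo, hi, n_sets):
    """Return membership functions for n_sets evenly spaced triangular fuzzy sets on [lo, hi]."""
    if n_sets < 2 or hi <= lo:
        raise ValueError("need at least two sets and hi > lo")
    step = (hi - lo) / (n_sets - 1)
    centers = [lo + i * step for i in range(n_sets)]

    def make_membership(center):
        def mu(x):
            # Membership is 1 at the peak and falls linearly to 0 at the neighbouring peaks.
            return max(0.0, 1.0 - abs(x - center) / step)
        return mu

    return [make_membership(c) for c in centers]


# Five reference sets on a hypothetical domain [0, 10]; evaluate memberships of x = 3.
reference_sets = triangular_reference_sets(0.0, 10.0, 5)
memberships = [mu(3.0) for mu in reference_sets]
```

Placing the first and last peaks on the domain limits mirrors the constraint the abstract adds to fuzzy c-means so that the extreme linguistic values are properly shaped.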
Prediction of virus-host protein-protein interactions mediated by short linear motifs.
Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A
2017-03-09
Short linear motifs in host organism proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Given that viral linear motif instances of host motif regular expressions can be found by chance, it is necessary to develop filtering methods of functional linear motifs. We conduct a systematic comparison of linear motif filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between human and the human immunodeficiency virus 1 (HIV-1). We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C, D, R are united and intersected. The resulting sets are compared by the number of protein-protein interactions correctly inferred with them, with experimental validation. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions makes it possible to rank the interactions by the sets used to deduce them: D∪R and C. The ordering of the sets is descending on the probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R, C∪D∪R infer all known interactions between HIV-1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method only uses protein sequences as inputs. We can extend the software developed to any other eukaryotic virus and host in order to find and rank candidate interactions. In future work we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
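The two ideas in this abstract, scanning sequences with motif regular expressions and comparing unions and intersections of filtered motif sets, can be sketched briefly. Everything named here is hypothetical: the pattern, the sequence, the motif identifiers and the scoring function are illustrative and are not taken from the authors' software or from the NIAID data.

```python
import re


def find_motif_instances(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


def score_filter_sets(filter_sets, known):
    """Count, for each named motif set, how many known functional motifs it contains."""
    return {name: len(motifs & known) for name, motifs in filter_sets.items()}


# Hypothetical motif pattern scanned against a toy sequence.
hits = find_motif_instances("AAPAAPAAPLLP", r"P..P")

# Hypothetical motif identifiers passing each filter: conserved, disordered, rare.
C, D, R = {"m1", "m2"}, {"m2", "m3"}, {"m3", "m4"}
known = {"m2", "m3"}
scores = score_filter_sets(
    {"C": C, "D|R": D | R, "C&D": C & D, "C|D|R": C | D | R},
    known,
)
```

The lookahead form is used so that overlapping motif instances are not missed, since a plain `finditer` on the pattern would skip matches that share residues with an earlier one.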
MolProbity: More and better reference data for improved all-atom structure validation.
Williams, Christopher J; Headd, Jeffrey J; Moriarty, Nigel W; Prisant, Michael G; Videau, Lizbeth L; Deis, Lindsay N; Verma, Vishal; Keedy, Daniel A; Hintze, Bradley J; Chen, Vincent B; Jain, Swati; Lewis, Steven M; Arendall, W Bryan; Snoeyink, Jack; Adams, Paul D; Lovell, Simon C; Richardson, Jane S; Richardson, David C
2018-01-01
This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore. © 2017 The Protein Society.
Proteome reference map and regulation network of neonatal rat cardiomyocyte
Li, Zi-jian; Liu, Ning; Han, Qi-de; Zhang, You-yi
2011-01-01
Aim: To study and establish a proteome reference map and regulation network of neonatal rat cardiomyocyte. Methods: Cultured cardiomyocytes of neonatal rats were used. All proteins expressed in the cardiomyocytes were separated and identified by two-dimensional polyacrylamide gel electrophoresis (2-DE) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS). Biological networks and pathways of the neonatal rat cardiomyocytes were analyzed using the Ingenuity Pathway Analysis (IPA) program (www.ingenuity.com). A 2-DE database was made accessible on-line by Make2ddb package on a web server. Results: More than 1000 proteins were separated on 2D gels, and 148 proteins were identified. The identified proteins were used for the construction of an extensible markup language-based database. Biological networks and pathways were constructed to analyze the functions associated with cardiomyocyte proteins in the database. The 2-DE database of rat cardiomyocyte proteins can be accessed at http://2d.bjmu.edu.cn. Conclusion: A proteome reference map and regulation network of the neonatal rat cardiomyocytes have been established, which may serve as an international platform for storage, analysis and visualization of cardiomyocyte proteomic data. PMID:21841810
Candidate mosaic proteins for a pan-filoviral cytotoxic T-Cell lymphocyte vaccine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fenimore, Paul W; Fischer, William M; Kuiken, Carla
The extremely high fatality rates of many filovirus (FILV) strains, the recurrent but rarely identified origin of human epidemics, the only partly identified viral reservoirs and the continuing non-human primate epizootics in Africa make a broadly-protective filovirus vaccine highly desirable. Cytotoxic T-cells (CTL) have been shown to be protective in mice, guinea pigs and non-human primates. In murine models the cytotoxic T-cell epitopes that are protective against Ebola virus have been mapped and in non-human primates CTL-mediated protection between viral strains (John Dye: specify) has been demonstrated using two filoviral proteins, nucleoprotein (NP) and glycoprotein (GP). These immunological results suggest that the CTL avenue of immunity deserves consideration for a vaccine. The poorly-understood viral reservoirs mean that it is difficult to predict what strains are likely to cause epidemics. Thus, there is a premium on developing a pan-filoviral vaccine. The genetic diversity of FILV is large, roughly the same scale as human immunodeficiency virus (HIV). This presents a serious challenge for the vaccine designer because a traditional vaccine aspiring to pan-filoviral coverage is likely to require the inclusion of many antigenic reagents. A recent method for optimizing cytotoxic T-cell lymphocyte epitope coverage with mosaic antigens was successful in improving potential CTL epitope coverage against HIV and may be useful in the context of very different viruses, such as the filoviruses discussed here. Mosaic proteins are recombinants composed of fragments of wild-type proteins joined at locations resulting in exclusively natural k-mers, 9 ≤ k ≤ 15, and having approximately the same length as the wild-type proteins.
The use of mosaic antigens is motivated by three conjectures: (1) optimizing a mosaic protein to maximize coverage of k-mers found in a set of reference proteins will give better odds of including broadly-protective CTL epitopes in a vaccine than is possible with a wild-type protein, (2) reducing the number of low-prevalence k-mers minimizes the likelihood of undesirable immunodominance, and (3) excluding exogenous k-mers will result in mosaic proteins whose processing for presentation is close to what occurs with wild-type proteins. The first and second applications of the mosaic method were to HIV and Hepatitis C Virus (HCV). HIV is the virus with the largest number of known sequences, and consequently a plethora of information for the CTL vaccine designer to incorporate into their mosaics. Experience with HIV and HCV mosaics supports the validity of the three conjectures above. The available FILV sequences are probably closer to the minimum amount of information needed to make a meaningful mosaic vaccine candidate. There were 532 protein sequences in the National Institutes of Health GenPept database in November 2007 when our reference set was downloaded. These sequences come from both Ebola and Marburg viruses (EBOV and MARV), representing transcripts of all 7 genes. The coverage of viral diversity by the 7 genes is variable, with genes 1 (nucleoprotein, NP), 4 (glycoprotein, GP; soluble glycoprotein, sGP) and 7 (polymerase, L) giving the best coverage. Broadly-protective vaccine candidates for diverse viruses, such as HIV or Hepatitis C virus (HCV) have required pools of antigens. FILV is similar in this regard. While we have designed CTL mosaic proteins using all 7 types of filoviral proteins, only NP, GP and L proteins are reported here. If it were important to include other proteins in a mosaic CTL vaccine, additional sequences would be required to cover the space of known viral diversity.
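The k-mer coverage idea behind conjectures (1) and (3) can be illustrated with a short sketch. This is not the authors' optimizer: it only scores a given candidate against a reference set. The function names and the toy sequences are hypothetical, and k=3 is used only to keep the example small (the abstract uses 9 ≤ k ≤ 15).

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping k-mers of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_coverage(mosaic, references, k=9):
    """Fraction of reference k-mer occurrences that also appear in the mosaic."""
    ref_counts = Counter(km for ref in references for km in kmers(ref, k))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    mosaic_set = set(kmers(mosaic, k))
    covered = sum(n for km, n in ref_counts.items() if km in mosaic_set)
    return covered / total

def exogenous_kmers(mosaic, references, k=9):
    """k-mers in the mosaic found in no reference; a valid mosaic has none."""
    natural = {km for ref in references for km in kmers(ref, k)}
    return [km for km in kmers(mosaic, k) if km not in natural]

# Toy data (hypothetical sequences, not filoviral proteins).
refs = ["ABCDEF", "ABCXEF"]
print(kmer_coverage("ABCDEF", refs, k=3))
print(exogenous_kmers("ABCDXEF", refs, k=3))
```

A candidate that is itself one of the references contains no exogenous k-mers, while a junction placed at an arbitrary position (as in the second call) creates k-mers that occur in no natural sequence.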
The Pfam protein families database: towards a more sustainable future.
Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex
2016-01-04
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genetics Home Reference: choroideremia
... movement of proteins and organelles within cells (intracellular trafficking). Mutations in the CHM gene lead to an ... Without the aid of Rab proteins in intracellular trafficking, cells die prematurely. The REP-1 protein is ...
Genetics Home Reference: cystinosis
... the amino acid cystine (a building block of proteins) within cells. Excess cystine damages cells and often ... gene lead to a deficiency of a transporter protein called cystinosin. Within cells, this protein normally moves ...
Ren, Xiaohu; Yang, Xifei; Hong, Wen-Xu; Huang, Peiwu; Wang, Yong; Liu, Wei; Ye, Jinbo; Huang, Haiyan; Huang, Xinfeng; Shen, Liming; Yang, Linqing; Zhuang, Zhixiong; Liu, Jianjun
2014-05-16
Trichloroethylene (TCE) is an effective solvent for a variety of organic materials. Because of its wide use in industrial metal degreasing, adhesive paints and polyvinyl chloride production, TCE has become an environmental and occupational toxicant. Exposure to TCE could cause severe hepatotoxicity; however, the toxic mechanisms of TCE remain poorly understood. Recently, we reported that SET protein mediated TCE-induced cytotoxicity in L-02 cells. Here, we further identified the proteins related to SET-mediated hepatic cytotoxicity of TCE using the techniques of DIGE (differential gel electrophoresis) and MALDI-TOF-MS/MS. Among the 20 differential proteins identified, 8 were found to be modulated by SET in TCE-induced cytotoxicity and three of them (cofilin-1, peroxiredoxin-2 and S100-A11) were validated by Western-blot analysis. The functional analysis revealed that most of the identified SET-modulated proteins are apoptosis-associated proteins. These data indicated that these proteins may be involved in SET-mediated hepatic cytotoxicity of TCE in L-02 cells. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Paul, Anna-Lisa; Liu, Li; McClung, Scott; Laughner, Beth; Chen, Sixue; Ferl, Robert J
2009-04-01
As a first step in the broad characterization of plant 14-3-3 multiprotein complexes in vivo, stringent and specific antibody affinity purification was used to capture 14-3-3s together with their interacting proteins from extracts of Arabidopsis cell suspension cultures. Approximately 120 proteins were identified as potential in vivo 14-3-3 interacting proteins by mass spectrometry of the recovered complexes. Comparison of the proteins in this data set with the 14-3-3 interacting proteins from a similar study in human embryonic kidney cell cultures revealed eight interacting proteins that likely represent reasonably abundant, fundamental 14-3-3 interaction complexes that are highly conserved across all eukaryotes. The Arabidopsis 14-3-3 interaction data set was also compared to a yeast in vivo 14-3-3 interaction data set. Four 14-3-3 interacting proteins are conserved in yeast, humans, and Arabidopsis. Comparisons of the data sets based on biochemical function revealed many additional similarities in the human and Arabidopsis data sets that represent conserved functional interactions, while also leaving many proteins uniquely identified in either Arabidopsis or human cells. In particular, the Arabidopsis interaction data set is enriched for proteins involved in metabolism.
Sanhueza, Carlos A; Cartmell, Jonathan; El-Hawiet, Amr; Szpacenko, Adam; Kitova, Elena N; Daneshfar, Rambod; Klassen, John S; Lang, Dean E; Eugenio, Luiz; Ng, Kenneth K-S; Kitov, Pavel I; Bundle, David R
2015-01-07
A focused library of virtual heterobifunctional ligands was generated in silico and a set of ligands with recombined fragments was synthesized and evaluated for binding to Clostridium difficile toxins. The position of the trisaccharide fragment was used as a reference for filtering docked poses during virtual screening to match the trisaccharide ligand in a crystal structure. The peptoid, a diversity fragment probing the protein surface area adjacent to a known binding site, was generated by a multi-component Ugi reaction. Our approach combines modular fragment-based design with in silico screening of synthetically feasible compounds and lays the groundwork for future efforts in development of composite bifunctional ligands for large clostridial toxins.
Ranking the whole MEDLINE database according to a large training set using text indexing.
Suomela, Brian P; Andrade, Miguel A
2005-03-24
The MEDLINE database contains over 12 million references to scientific literature, with about 3/4 of recent articles including an abstract of the publication. Retrieval of entries using queries with keywords is useful for human users that need to obtain small selections. However, particular analyses of the literature or database developments may need the complete ranking of all the references in the MEDLINE database as to their relevance to a topic of interest. This report describes a method that does this ranking using the differences in word content between MEDLINE entries related to a topic and the whole of MEDLINE, in a computational time appropriate for an article search query engine. We tested the capabilities of our system to retrieve MEDLINE references which are relevant to the subject of stem cells. We took advantage of the existing annotation of references with terms from the MeSH hierarchical vocabulary (Medical Subject Headings, developed at the National Library of Medicine). A training set of 81,416 references was constructed by selecting entries annotated with the MeSH term stem cells or some child in its sub tree. Frequencies of all nouns, verbs, and adjectives in the training set were computed and the ratios of word frequencies in the training set to those in the entire MEDLINE were used to score references. Self-consistency of the algorithm, benchmarked with a test set containing the training set and an equal number of references randomly selected from MEDLINE was better using nouns (79%) than adjectives (73%) or verbs (70%). The evaluation of the system with 6,923 references not used for training, containing 204 articles relevant to stem cells according to a human expert, indicated a recall of 65% for a precision of 65%. This strategy appears to be useful for predicting the relevance of MEDLINE references to a given concept. The method is simple and can be used with any user-defined training set. 
Choice of the part of speech of the words used for classification has important effects on performance. Lists of words, scripts, and additional information are available from the web address http://www.ogic.ca/projects/ks2004/.
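The scoring step described in this abstract (ratios of word frequencies in the training set to those in the whole collection) might look roughly like the sketch below. How the ratios are combined into one score is not stated in the abstract, so the mean log-ratio used here is an assumption, as are the function names and the toy documents. Part-of-speech filtering is omitted.

```python
import math
from collections import Counter

def word_freqs(docs):
    """Relative frequency of each lowercase token over a list of texts."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def score(doc, train_freq, background_freq):
    """Mean log(train/background) frequency ratio over the distinct words of doc.

    Words absent from the training set contribute nothing (assumed convention).
    """
    words = set(doc.lower().split())
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        if w in train_freq and w in background_freq:
            total += math.log(train_freq[w] / background_freq[w])
    return total / len(words)

# Toy corpus (hypothetical): the training set is a subset of the background.
training = ["stem cells differentiate", "stem cells renew"]
background = training + ["heart cells contract", "protein folding kinetics"]
tf, bf = word_freqs(training), word_freqs(background)

candidates = ["protein folding", "heart cells", "stem cells"]
ranked = sorted(candidates, key=lambda d: score(d, tf, bf), reverse=True)
print(ranked)
```

Sorting every entry by such a score gives the complete ranking the abstract refers to; precision and recall then follow from choosing a cutoff in the ranked list.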
Standard setting: comparison of two methods.
George, Sanju; Haque, M Sayeed; Oyebode, Femi
2006-09-14
The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice examination (MCQ). Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. The pass rate with the norm-reference method was 85% (66/78) and that by the Angoff method was 100% (78 out of 78). The percentage agreement between Angoff method and norm-reference was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74. There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
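The norm-reference rule quoted above (mean minus 1 SD) and the reported agreement interval can be reproduced in a few lines. The sketch assumes the sample standard deviation and a Wald (normal-approximation) interval; the abstract does not say which variants the authors used, and the example scores are made up.

```python
import math
import statistics

def norm_reference_cut(scores, n_sd=1.0):
    """Cut score = mean minus n_sd sample standard deviations (assumed variant)."""
    return statistics.mean(scores) - n_sd * statistics.stdev(scores)

def pass_rate(scores, cut):
    """Proportion of candidates scoring at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)

def wald_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

scores = [50, 60, 70, 80, 90]          # hypothetical raw scores
cut = norm_reference_cut(scores)
print(round(cut, 2), pass_rate(scores, cut))

low, high = wald_ci(0.78, 78)          # agreement and cohort size from the abstract
print(round(low, 2), round(high, 2))
```

With p = 0.78 and n = 78 the Wald interval rounds to 0.69 to 0.87, which is consistent with the 69% - 87% interval reported in the abstract.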
Development of a novel set of Gateway-compatible vectors for live imaging in insect cells.
Maroniche, G A; Mongelli, V C; Alfonso, V; Llauger, G; Taboga, O; del Vas, Mariana
2011-10-01
Insect genomics is a growing area of research. To exploit fully the genomic data that are being generated, high-throughput systems for the functional characterization of insect proteins and their interactomes are required. In this work, a Gateway-compatible vector set for expression of fluorescent fusion proteins in insect cells was developed. The vector set was designed to express a protein of interest fused to any of four different fluorescent proteins [green fluorescent protein (GFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP) and mCherry] by either the C-terminal or the N-terminal ends. Additionally, a collection of organelle-specific fluorescent markers was assembled for colocalization with fluorescent recombinant proteins of interest. Moreover, the vector set was proven to be suitable for simultaneously detecting up to three proteins by multiple labelling. The use of the vector set was exemplified by defining the subcellular distribution of Mal de Río Cuarto virus (MRCV) outer coat protein P10 and by analysing the in vivo self-interaction of the MRCV viroplasm matrix protein P9-1 in Förster resonance energy transfer (FRET) experiments. In conclusion, we have developed a valuable tool for high-throughput studies of protein subcellular localization that will aid in the elucidation of the function of newly described insect and virus proteins. © 2011 The Authors. Insect Molecular Biology © 2011 The Royal Entomological Society.
Assessment of the reliability of protein-protein interactions and protein function prediction.
Deng, Minghua; Sun, Fengzhu; Chen, Ting
2003-01-01
As more and more high-throughput protein-protein interaction data are collected, the task of estimating the reliability of different data sets becomes increasingly important. In this paper, we present our study of two groups of protein-protein interaction data, the physical interaction data and the protein complex data, and estimate the reliability of these data sets using three different measurements: (1) the distribution of gene expression correlation coefficients, (2) the reliability based on gene expression correlation coefficients, and (3) the accuracy of protein function predictions. We develop a maximum likelihood method to estimate the reliability of protein interaction data sets according to the distribution of correlation coefficients of gene expression profiles of putative interacting protein pairs. The results of the three measurements are consistent with each other. The MIPS protein complex data have the highest mean gene expression correlation coefficients (0.256) and the highest accuracy in predicting protein functions (70% sensitivity and specificity), while Ito's Yeast two-hybrid data have the lowest mean (0.041) and the lowest accuracy (15% sensitivity and specificity). Uetz's data are more reliable than Ito's data in all three measurements, and the TAP protein complex data are more reliable than the HMS-PCI data in all three measurements as well. The complex data sets generally perform better in function predictions than do the physical interaction data sets. Proteins in complexes are shown to be more highly correlated in gene expression. The results confirm that the components of a protein complex can be assigned to functions that the complex carries out within a cell. There are three interaction data sets different from the above two groups: the genetic interaction data, the in-silico data and the syn-express data. 
Their capability of predicting protein functions generally falls between that of the Y2H data and that of the MIPS protein complex data. The supplementary information is available at the following Web site: http://www-hto.usc.edu/-msms/AssessInteraction/.
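The first of the three measurements, the gene expression correlation of putative interacting pairs, can be sketched as follows. This covers only the mean correlation coefficient, not the maximum likelihood reliability estimate, and the protein names and expression profiles are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_pair_correlation(pairs, expression):
    """Mean expression correlation over interacting pairs with known profiles."""
    values = [pearson(expression[a], expression[b])
              for a, b in pairs
              if a in expression and b in expression]
    if not values:
        raise ValueError("no pair has expression data for both partners")
    return sum(values) / len(values)

# Hypothetical expression profiles across four conditions.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
print(mean_pair_correlation([("A", "B")], expression))
print(mean_pair_correlation([("A", "B"), ("A", "C")], expression))
```

A data set whose pairs are mostly true interactions would be expected to show a higher mean than one diluted by random pairs, which is the intuition behind comparing values such as 0.256 and 0.041.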
Turnover of Lipidated LC3 and Autophagic Cargoes in Mammalian Cells.
Rodríguez-Arribas, M; Yakhine-Diop, S M S; González-Polo, R A; Niso-Santano, M; Fuentes, J M
2017-01-01
Macroautophagy (usually referred to as autophagy) is the most important degradation system in mammalian cells. It is responsible for the elimination of protein aggregates, organelles, and other cellular content. During autophagy, these materials (i.e., cargo) must be engulfed by a double-membrane structure called an autophagosome, which delivers the cargo to the lysosome to complete its degradation. Autophagy is a very dynamic pathway called autophagic flux. The process involves all the steps that are implicated in cargo degradation from autophagosome formation. There are several techniques to monitor autophagic flux. Among them, the method most used experimentally to assess autophagy is the detection of LC3 protein processing and p62 degradation by Western blotting. In this chapter, we provide a detailed and straightforward protocol for this purpose in cultured mammalian cells, including a brief set of notes concerning problems associated with the Western-blotting detection of LC3 and p62. © 2017 Elsevier Inc. All rights reserved.
Automatic tracking of cells for video microscopy in patch clamp experiments
2014-01-01
Background Visualisation of neurons labeled with fluorescent proteins or compounds generally requires exposure to intense light for a relatively long period of time, often leading to bleaching of the fluorescent probe and photodamage of the tissue. Here we created a technique to drastically shorten light exposure and improve the targeting of fluorescently labeled cells that is especially useful for patch-clamp recordings. We applied image tracking and mask overlay to reduce the time of fluorescence exposure and minimise mistakes when identifying neurons. Methods Neurons are first identified according to visual criteria (e.g. fluorescence protein expression, shape, viability etc.) and a transmission microscopy image (Differential Interference Contrast (DIC) or Dodt contrast) containing the cell is used as a reference for the tracking algorithm. A fluorescence image can also be acquired later to be used as a mask (that can be overlaid on the target during live transmission video). As patch-clamp experiments require translating the microscope stage, we used pattern matching to track reference neurons in order to move the fluorescence mask to match the new position of the objective in relation to the sample. For the image processing we used the Open Source Computer Vision (OpenCV) library, including the Speeded-Up Robust Features (SURF) for tracking cells. The dataset of images (n = 720) was analyzed under normal conditions of acquisition and with influence of noise (defocusing and brightness). Results We validated the method in dissociated neuronal cultures and fresh brain slices expressing Enhanced Yellow Fluorescent Protein (eYFP) or Tandem Dimer Tomato (tdTomato) proteins, which considerably decreased the exposure to fluorescence excitation, thereby minimising photodamage. We also show that the neuron tracking can be used in differential interference contrast or Dodt contrast microscopy.
Conclusion The techniques of digital image processing used in this work are an important addition to the set of microscopy tools used in modern electrophysiology, especially in experiments with neuron cultures and brain slices. PMID:24946774
Automatic tracking of cells for video microscopy in patch clamp experiments.
Peixoto, Helton M; Munguba, Hermany; Cruz, Rossana M S; Guerreiro, Ana M G; Leao, Richardson N
2014-06-20
Visualisation of neurons labeled with fluorescent proteins or compounds generally requires exposure to intense light for a relatively long period of time, often leading to bleaching of the fluorescent probe and photodamage of the tissue. Here we created a technique to drastically shorten light exposure and improve the targeting of fluorescently labeled cells that is especially useful for patch-clamp recordings. We applied image tracking and mask overlay to reduce the time of fluorescence exposure and minimise mistakes when identifying neurons. Neurons are first identified according to visual criteria (e.g. fluorescence protein expression, shape, viability etc.) and a transmission microscopy image (Differential Interference Contrast (DIC) or Dodt contrast) containing the cell is used as a reference for the tracking algorithm. A fluorescence image can also be acquired later to be used as a mask (that can be overlaid on the target during live transmission video). As patch-clamp experiments require translating the microscope stage, we used pattern matching to track reference neurons in order to move the fluorescence mask to match the new position of the objective in relation to the sample. For the image processing we used the Open Source Computer Vision (OpenCV) library, including the Speeded-Up Robust Features (SURF) for tracking cells. The dataset of images (n = 720) was analyzed under normal conditions of acquisition and with influence of noise (defocusing and brightness). We validated the method in dissociated neuronal cultures and fresh brain slices expressing Enhanced Yellow Fluorescent Protein (eYFP) or Tandem Dimer Tomato (tdTomato) proteins, which considerably decreased the exposure to fluorescence excitation, thereby minimising photodamage. We also show that the neuron tracking can be used in differential interference contrast or Dodt contrast microscopy.
The techniques of digital image processing used in this work are an important addition to the set of microscopy tools used in modern electrophysiology, especially in experiments with neuron cultures and brain slices.
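The mask-following step (track a reference cell after the stage moves, then shift the fluorescence mask by the same offset) can be sketched without any imaging library. Note the substitution: the abstract uses SURF features from OpenCV, whereas this sketch uses exhaustive sum-of-squared-differences template matching on tiny grayscale grids, purely to show the logic. All names and data are hypothetical.

```python
def find_template(image, template):
    """Top-left (row, col) where template best matches image (minimum SSD)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_ssd, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos

def shift_mask(mask_pos, ref_old, ref_new):
    """Move the mask by the displacement measured for the reference cell."""
    return (mask_pos[0] + ref_new[0] - ref_old[0],
            mask_pos[1] + ref_new[1] - ref_old[1])

def frame_with_cell(pos, cell, size=6):
    """Build a blank size x size frame with the cell patch pasted at pos."""
    frame = [[0] * size for _ in range(size)]
    for i, row in enumerate(cell):
        for j, value in enumerate(row):
            frame[pos[0] + i][pos[1] + j] = value
    return frame

cell = [[9, 8], [7, 6]]                     # reference cell patch
before = frame_with_cell((1, 2), cell)      # frame at acquisition time
after = frame_with_cell((3, 1), cell)       # frame after a stage translation

old_pos = find_template(before, cell)
new_pos = find_template(after, cell)
print(old_pos, new_pos, shift_mask((1, 2), old_pos, new_pos))
```

Feature-based matching such as SURF is more tolerant of defocus and brightness changes than raw pixel comparison, which is presumably why the authors chose it for noisy acquisitions.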
Genetics Home Reference: Wolff-Parkinson-White syndrome
... protein that is part of an enzyme called AMP-activated protein kinase (AMPK). This enzyme helps sense ... suggests that these mutations alter the activity of AMP-activated protein kinase in the heart, although it ...
Genetics Home Reference: mitochondrial trifunctional protein deficiency
... protein deficiency Orphanet: Mitochondrial trifunctional protein deficiency Screening, Technology, and Research in Genetics Virginia Department of Health (PDF) Patient Support and Advocacy Resources (4 links) Children Living with Inherited Metabolic Diseases (CLIMB) Children's Mitochondrial ...
Genetics Home Reference: Opitz G/BBB syndrome
... of cells (cell migration). Midline-1 assists in recycling certain proteins that need to be reused instead ... decrease in midline-1 function, which prevents protein recycling. The resulting accumulation of proteins impairs microtubule function, ...
Lung Reference Set A Application: LaszloTakacs - Biosystems (2010) — EDRN Public Portal
We would like to access the NCI lung cancer Combined Pre-Validation Reference Set A in order to further validate a lung cancer diagnostic test candidate. Our test is based on a panel of antibodies which have been tested on 4 different cohorts (see below, paragraph “Preliminary Data and Methods”). This Reference Set A, whose clinical setting is “Diagnosis of lung cancer”, will be used to validate the panel of monoclonal antibodies which have been demonstrated by extensive data analysis to provide the best discrimination between control and lung cancer patient plasma samples; sensitivity and specificity values from ROC analyses exceed 85%.
Phylogenetically informed logic relationships improve detection of biological network organization
2011-01-01
Background A "phylogenetic profile" refers to the presence or absence of a gene across a set of organisms, and it has been proven valuable for understanding gene functional relationships and network organization. Despite this success, few studies have attempted to search beyond just pairwise relationships among genes. Here we search for logic relationships involving three genes, and explore its potential application in gene network analyses. Results Taking advantage of a phylogenetic matrix constructed from the large orthologs database Roundup, we invented a method to create balanced profiles for individual triplets of genes that guarantee equal weight on the different phylogenetic scenarios of coevolution between genes. When we applied this idea to LAPP, the method to search for logic triplets of genes, the balanced profiles resulted in significant performance improvement and the discovery of hundreds of thousands more putative triplets than unadjusted profiles. We found that logic triplets detected biological network organization and identified key proteins and their functions, ranging from neighbouring proteins in local pathways, to well separated proteins in the whole pathway, and to the interactions among different pathways at the system level. Finally, our case study suggested that the directionality in a logic relationship and the profile of a triplet could disclose the connectivity between the triplet and surrounding networks. Conclusion Balanced profiles are superior to the raw profiles employed by traditional methods of phylogenetic profiling in searching for high order gene sets. Gene triplets can provide valuable information in detection of biological network organization and identification of key genes at different levels of cellular interaction. PMID:22172058
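A logic relationship among three phylogenetic profiles can be illustrated with presence/absence vectors. The sketch below simply measures how often gene C's profile equals a logic function of A and B; it is a simplification and does not reproduce the LAPP scoring or the balanced-profile construction described in the abstract. The profiles are invented.

```python
LOGIC_FUNCTIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def agreement(a, b, c, func):
    """Fraction of organisms where c equals func(a, b)."""
    hits = sum(func(x, y) == z for x, y, z in zip(a, b, c))
    return hits / len(c)

def best_logic(a, b, c):
    """Name and agreement of the logic function that best explains c from a and b."""
    scores = {name: agreement(a, b, c, f) for name, f in LOGIC_FUNCTIONS.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Presence (1) or absence (0) of each gene in six hypothetical organisms.
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 0, 1, 0, 1, 0]
gene_c = [1, 0, 0, 0, 1, 0]   # present only where both A and B are present

print(best_logic(gene_a, gene_b, gene_c))
```

Pairwise comparison of C with A or with B alone would show only partial agreement here, which is why triplet-level relationships can reveal associations that pairwise profiling misses.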
Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel
2006-01-01
Background Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and regulatory proteins from bacteria and archaea included in the UniProt database. Description ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition ExtraTrain supplies a tool to explore extragenic regions, named Palinsight, oriented to detect and search palindromic patterns. This interactive visual tool is totally integrated in the database, allowing the search for regulatory signals in user defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated to regulator entries. In order to achieve a sustainable and maintainable knowledge database ExtraTrain is a platform open to the contribution of knowledge by the scientific community providing a system for the incorporation of textual knowledge. Conclusion ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. ExtraTrain database is available at . PMID:16539733
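Searching an extragenic region for palindromic patterns can be sketched as a scan for windows that equal their own reverse complement. This is only an illustration of the concept; how Palinsight itself defines or scores palindromes (spacers, mismatches) is not described in the abstract, and the length limits and sequence below are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_palindromes(seq, min_len=6, max_len=12):
    """Return (start, window) for perfect, spacer-free DNA palindromes.

    Only even lengths are scanned, since a window can equal its own
    reverse complement only if its length is even.
    """
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits

# Hypothetical extragenic fragment containing one GAATTC site.
print(find_palindromes("ACGGAATTCGTA"))
```

Many regulators in the listed families bind as dimers to sites with this kind of symmetry, which is the motivation for scanning extragenic regions for them.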
Analysis of the structure and dynamics of human serum albumin.
Guizado, T R Cuya
2014-10-01
Human serum albumin (HSA) is a biologically relevant protein that binds a variety of drugs and other small molecules. No less than 50 structures are deposited in the RCSB Protein Data Bank (PDB). Based on these structures, we first performed a clustering analysis. Despite the diversity of ligands, only two well defined conformations are detected, with a deviation of 0.46 nm between the average structures of the two clusters, while deviations within each cluster are smaller than 0.08 nm. Those two conformations are representative of the apoprotein and the HSA-myristate complex already identified in previous literature. Considering the structures within each cluster as a representative sample of the dynamical states of the corresponding conformation, we scrutinize the structural and dynamical differences between both conformations. Analysis of the fluctuations within each cluster set reveals that domain II is the most rigid one and better matches both structures. Then, taking this domain as reference, we show that the structural difference between both conformations can be expressed in terms of twist and hinge motions of domains I and III, respectively. We also characterize the dynamical difference between conformations by computing correlations and principal components for each set of dynamical states. The two conformations display different collective motions. The results are compared with those obtained from the trajectories of short molecular dynamics simulations, giving consistent outcomes. Let us remark that, beyond the relevance of the results for the structural and dynamical characterization of HSA conformations, the present methodology could be extended to other proteins in the PDB archive.
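Grouping structures by their mutual deviation can be illustrated with a root-mean-square deviation (RMSD) and a simple leader clustering. This is a sketch under strong assumptions: the coordinates are taken to be already superposed, the clustering algorithm and the 0.2 nm cutoff are chosen for illustration only (the abstract does not state the authors' method), and the two-atom "structures" are toy data.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, pre-superposed coordinate lists (nm)."""
    squared = [sum((p - q) ** 2 for p, q in zip(a, b))
               for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

def leader_clustering(structures, cutoff):
    """Assign each structure to the first cluster whose leader is within cutoff."""
    leaders, labels = [], []
    for coords in structures:
        for index, leader in enumerate(leaders):
            if rmsd(coords, leader) <= cutoff:
                labels.append(index)
                break
        else:
            leaders.append(coords)
            labels.append(len(leaders) - 1)
    return labels

# Toy structures: the second is 0.05 nm from the first, the third 0.46 nm away.
s1 = [(0.00, 0.0, 0.0), (1.00, 0.0, 0.0)]
s2 = [(0.05, 0.0, 0.0), (1.05, 0.0, 0.0)]
s3 = [(0.46, 0.0, 0.0), (1.46, 0.0, 0.0)]

print(round(rmsd(s1, s3), 2), leader_clustering([s1, s2, s3], cutoff=0.2))
```

Any cutoff between the within-cluster scale (below 0.08 nm) and the between-cluster scale (0.46 nm) would separate two conformations of the kind reported.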
Genome activation by raspberry bushy dwarf virus coat protein.
Macfarlane, Stuart A; McGavin, Wendy J
2009-03-01
Two sets of infectious cDNA clones of raspberry bushy dwarf virus (RBDV) have been constructed, enabling either the synthesis of infectious RNA transcripts or the delivery of infectious binary plasmid DNA by infiltration of Agrobacterium tumefaciens. In whole plants and in protoplasts, inoculation of RBDV RNA1 and RNA2 transcripts led to a low level of infection, which was greatly increased by the addition of RNA3, a subgenomic RNA coding for the RBDV coat protein (CP). Agroinfiltration of RNA1 and RNA2 constructs did not produce a detectable infection but, again, inclusion of a construct encoding the CP led to high levels of infection. Thus, RBDV replication is greatly stimulated by the presence of the CP, a mechanism that also operates with ilarviruses and alfalfa mosaic virus, where it is referred to as genome activation. Mutation to remove amino acids from the N terminus of the CP showed that the first 15 RBDV CP residues are not required for genome activation. Other experiments, in which overlapping regions at the CP N terminus were fused to the monomeric red fluorescent protein, showed that sequences downstream of the first 48 aa are not absolutely required for genome activation.
Hood-Degrenier, Jennifer K
2008-01-01
The movement of newly synthesized proteins through the endomembrane system of eukaryotic cells, often referred to generally as the secretory pathway, is a topic covered in most intermediate-level undergraduate cell biology courses. An article previously published in this journal described a laboratory exercise in which yeast mutants defective in two distinct steps of protein secretion were differentiated using a genetic reporter designed specifically to identify defects in the first step of the pathway, the insertion of proteins into the endoplasmic reticulum (Vallen, 2002). We have developed two versions of a Western blotting assay that serves as a second way of distinguishing the two secretory mutants, which we pair with the genetic assay in a 3-wk laboratory module. A quiz administered before and after students participated in the lab activities revealed significant postlab gains in their understanding of the secretory pathway and experimental techniques used to study it. A second survey administered at the end of the lab module assessed student perceptions of the efficacy of the lab activities; the results of this survey indicated that the experiments were successful in meeting a set of educational goals defined by the instructor.
ERIC Educational Resources Information Center
Yoder, N.; Darling-Churchill, K.; Colombi, G. D.; Ruddy, S.; Neiman, S.; Chagnon, E.; Mayo, R.
2017-01-01
This reference manual identifies five overarching sets of activities for improving school climate, with the goal of improving student outcomes (e.g., achievement, attendance, behaviors, and skills). These sets of activities help to initiate, implement, and sustain school climate improvements. For each activity set, the manual presents a clear…
Wang, Hongbin; Zhang, Yongqian; Gui, Shuqi; Zhang, Yong; Lu, Fuping; Deng, Yulin
2017-08-15
Comparisons across large numbers of samples are frequently necessary in quantitative proteomics. Many quantitative methods used in proteomics are based on stable isotope labeling, but most of these are only useful for comparing two samples. For up to eight samples, the iTRAQ labeling technique can be used. For greater numbers of samples, the label-free method has been used, but this method has been criticized for low reproducibility and accuracy. An ingenious strategy has been introduced that compares each sample against an 18O-labeled reference sample created by pooling equal amounts of all samples. However, it is necessary to use protein mixtures of known proportions to investigate and evaluate this new strategy. Another problem for comparative proteomics of multiple samples is the poor agreement and reproducibility of protein identification results across samples. In the present study, a method combining the 18O-reference strategy with a strategy that decouples quantitation from identification was investigated with protein mixtures of known proportions. The results clearly demonstrated that the 18O-reference strategy had greater accuracy and reliability than previously used comparison methods based on transferring comparison or label-free strategies. In the decoupling strategy, the quantification data acquired by LC-MS and the identification data acquired by LC-MS/MS are matched and correlated, according to retention time and accurate mass, to identify differentially expressed proteins. This strategy made protein identification possible for all samples using a single pooled sample, and therefore gave good reproducibility in protein identification across multiple samples and allowed peptide identification to be optimized separately so as to identify more proteins.
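The matching step in the abstract above, pairing LC-MS quantification features with LC-MS/MS identifications by retention time and accurate mass, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name, the tuple layouts, the 10 ppm mass tolerance, the 0.5 min retention-time window, and the toy values are all assumptions.

```python
def match_features(ms1_features, identifications, ppm_tol=10.0, rt_tol=0.5):
    """Pair quantified LC-MS features with LC-MS/MS identifications.

    ms1_features:    list of (mz, retention_time, ratio_vs_reference)
    identifications: list of (mz, retention_time, peptide)
    Tolerances are hypothetical defaults, not values from the abstract.
    Returns a list of (peptide, ratio_vs_reference).
    """
    matches = []
    for mz, rt, ratio in ms1_features:
        best = None  # (ppm_error, peptide) of the closest candidate so far
        for id_mz, id_rt, peptide in identifications:
            ppm = abs(mz - id_mz) / id_mz * 1e6
            if ppm <= ppm_tol and abs(rt - id_rt) <= rt_tol:
                if best is None or ppm < best[0]:
                    best = (ppm, peptide)
        if best is not None:
            matches.append((best[1], ratio))
    return matches


# Hypothetical toy data: three quantified features, two identified peptides.
features = [(500.2500, 30.1, 1.9), (650.3000, 45.0, 1.0), (720.4000, 50.0, 0.5)]
ids = [(500.2510, 30.3, "PEPTIDEA"), (650.3500, 45.1, "PEPTIDEB")]
matched = match_features(features, ids)
```

Only the first feature lies within both tolerances of an identification, so the other two stay quantified but unidentified. A real pipeline would also need charge-state handling and retention-time alignment, which this sketch omits.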
Explicit reference governor for linear systems
NASA Astrophysics Data System (ADS)
Garone, Emanuele; Nicotra, Marco; Ntogramatzidis, Lorenzo
2018-06-01
The explicit reference governor is a constrained control scheme that was originally introduced for generic nonlinear systems. This paper presents two explicit reference governor strategies that are specifically tailored for the constrained control of linear time-invariant systems subject to linear constraints. Both strategies are based on the idea of maintaining the system states within an invariant set which is entirely contained in the constraints. This invariant set can be constructed by exploiting either the Lyapunov inequality or modal decomposition. To improve the performance, we show that the two strategies can be combined by choosing at each time instant the least restrictive set. Numerical simulations illustrate that the proposed scheme achieves performances that are comparable to optimisation-based reference governors.
Liver Rapid Reference Set Application ( #2): Lubman - Univ of Michigan (2010) — EDRN Public Portal
We are requesting the reference set, which includes 50 HCC cases and 50 cirrhotic controls. In our preliminary study, AFP had an AUROC of 0.66 while the AUROC for the 5 glycoproteins was 0.81. The sensitivity and specificity for the 5 glycoproteins were 79% and 72% at the point that maximizes sensitivity+specificity in the ROC curve, and they were 79% and 35%, respectively, for AFP at the same point in the ROC curve. The reference set will allow us to determine the best performance of the 5 glycoproteins by themselves or whether their combination has a better sensitivity and/or specificity and AUROC. While a direct comparison with AFP will be made, the reference set will not allow a robust comparison due to the low sample size. If the glycoproteins are complementary or have better performance than AFP, then the next step would be to test them in the entire phase 2 hepatocellular carcinoma set.
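The operating point mentioned above, the cutoff that maximizes sensitivity + specificity on the ROC curve (often called the Youden index), can be located with a simple scan over candidate cutoffs. The sketch below is a minimal illustration under stated assumptions: higher marker values are assumed to indicate a case, and the scores and labels are hypothetical toy data, not values from the application.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    scores: marker values; a sample is called positive when score >= cutoff
            (direction is an assumption for this sketch).
    labels: 1 for a case, 0 for a control.
    Ties keep the lowest cutoff that reaches the maximum.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Hypothetical toy data: four controls followed by four cases.
scores = [0.2, 0.4, 0.5, 0.7, 0.3, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, sens, spec = youden_optimal_cutoff(scores, labels)
```

With 50 cases and 50 controls, each sensitivity or specificity estimate moves in steps of 2%, which is one way to see why the application cautions that the reference set is too small for a robust comparison.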
Protein and Genetic Composition of Four Chromatin Types in Drosophila melanogaster Cell Lines.
Boldyreva, Lidiya V; Goncharov, Fyodor P; Demakova, Olga V; Zykova, Tatyana Yu; Levitsky, Victor G; Kolesnikov, Nikolay N; Pindyurin, Alexey V; Semeshin, Valeriy F; Zhimulev, Igor F
2017-04-01
Recently, we analyzed genome-wide protein binding data for the Drosophila cell lines S2, Kc, BG3 and Cl.8 (modENCODE Consortium) and identified a set of 12 proteins enriched in the regions corresponding to interbands of salivary gland polytene chromosomes. Using these data, we developed a bioinformatic pipeline that partitioned the Drosophila genome into four chromatin types that we hereby refer to as aquamarine, lazurite, malachite and ruby. Here, we describe the properties of these chromatin types across different cell lines. We show that aquamarine chromatin tends to harbor transcription start sites (TSSs) and 5' untranslated regions (5'UTRs) of the genes, is enriched in diverse "open" chromatin proteins, histone modifications, nucleosome remodeling complexes and transcription factors. It encompasses most of the tRNA genes and shows enrichment for non-coding RNAs and miRNA genes. Lazurite chromatin typically encompasses gene bodies. It is rich in proteins involved in transcription elongation. Frequency of both point mutations and natural deletion breakpoints is elevated within lazurite chromatin. Malachite chromatin shows higher frequency of insertions of natural transposons. Finally, ruby chromatin is enriched for proteins and histone modifications typical for the "closed" chromatin. Ruby chromatin has a relatively low frequency of point mutations and is essentially devoid of miRNA and tRNA genes. Aquamarine and ruby chromatin types are highly stable across cell lines and have contrasting properties. Lazurite and malachite chromatin types also display characteristic protein composition, as well as enrichment for specific genomic features. We found that two types of chromatin, aquamarine and ruby, retain their complementary protein patterns in four Drosophila cell lines.
2015-01-01
The rapidly expanding availability of high-resolution mass spectrometry has substantially enhanced the ion-current-based relative quantification techniques. Despite the increasing interest in ion-current-based methods, quantitative sensitivity, accuracy, and false discovery rate remain the major concerns; consequently, comprehensive evaluation and development in these regards are urgently needed. Here we describe an integrated, new procedure for data normalization and protein ratio estimation, termed ICan, for improved ion-current-based analysis of data generated by high-resolution mass spectrometry (MS). ICan achieved significantly better accuracy and precision, and a lower false-positive rate for discovering altered proteins, than currently popular pipelines. A spiked-in experiment was used to evaluate the performance of ICan in detecting small changes. In this study, E. coli extracts were spiked with moderate-abundance proteins from human plasma (MAP, enriched by the IgY14-SuperMix procedure) at two different levels to set a small change of 1.5-fold. Forty-five (92%, with an average ratio of 1.71 ± 0.13) of 49 identified MAP proteins (i.e., the true positives) and none of the reference proteins (1.0-fold) were determined to be significantly altered, with cutoff thresholds of ≥1.3-fold change and p ≤ 0.05. This is the first study to evaluate and prove competitive performance of the ion-current-based approach for assigning significance to proteins with small changes. By comparison, other methods showed remarkably inferior performance. ICan can be broadly applied to reliable and sensitive proteomic surveys of multiple biological samples with the use of high-resolution MS. Moreover, many key features evaluated and optimized here, such as normalization, protein ratio determination, and statistical analyses, are also valuable for data analysis by isotope-labeling methods. PMID:25285707
Automated de novo phasing and model building of coiled-coil proteins.
Rämisch, Sebastian; Lizatović, Robert; André, Ingemar
2015-03-01
Models generated by de novo structure prediction can be very useful starting points for molecular replacement for systems where suitable structural homologues cannot be readily identified. Protein-protein complexes and de novo-designed proteins are examples of systems that can be challenging to phase. In this study, the potential of de novo models of protein complexes for use as starting points for molecular replacement is investigated. The approach is demonstrated using homomeric coiled-coil proteins, which are excellent model systems for oligomeric systems. Despite the stereotypical fold of coiled coils, initial phase estimation can be difficult and many structures have to be solved with experimental phasing. A method was developed for automatic structure determination of homomeric coiled coils from X-ray diffraction data. In a benchmark set of 24 coiled coils, ranging from dimers to pentamers with resolutions down to 2.5 Å, 22 systems were automatically solved, 11 of which had previously been solved by experimental phasing. The generated models contained 71-103% of the residues present in the deposited structures, had the correct sequence and had free R values that deviated on average by 0.01 from those of the respective reference structures. The electron-density maps were of sufficient quality that only minor manual editing was necessary to produce final structures. The method, named CCsolve, combines methods for de novo structure prediction, initial phase estimation and automated model building into one pipeline. CCsolve is robust against errors in the initial models and can readily be modified to make use of alternative crystallographic software. The results demonstrate the feasibility of de novo phasing of protein-protein complexes, an approach that could also be employed for other small systems beyond coiled coils.
Can multi-subpopulation reference sets improve the genomic predictive ability for pigs?
Fangmann, A; Bergfelder-Drüing, S; Tholen, E; Simianer, H; Erbe, M
2015-12-01
In most countries and for most livestock species, genomic evaluations are obtained from within-breed analyses. To achieve reliable breeding values, however, a sufficient reference sample size is essential. To increase this size, the use of multibreed reference populations for small populations is considered a suitable option in other species. Over decades, the separate breeding work of different pig breeding organizations in Germany has led to stratified subpopulations in the breed German Large White. Due to this fact and the limited number of Large White animals available in each organization, there was a pressing need for ascertaining if multi-subpopulation genomic prediction is superior compared with within-subpopulation prediction in pigs. Direct genomic breeding values were estimated with genomic BLUP for the trait "number of piglets born alive" using genotype data (Illumina Porcine 60K SNP BeadChip) from 2,053 German Large White animals from five different commercial pig breeding companies. To assess the prediction accuracy of within- and multi-subpopulation reference sets, a random 5-fold cross-validation with 20 replications was performed. The five subpopulations considered were only slightly differentiated from each other. However, the prediction accuracy of the multi-subpopulations approach was not better than that of the within-subpopulation evaluation, for which the predictive ability was already high. Reference sets composed of closely related multi-subpopulation sets performed better than sets of distantly related subpopulations but not better than the within-subpopulation approach. Despite the low differentiation of the five subpopulations, the genetic connectedness between these different subpopulations seems to be too small to improve the prediction accuracy by applying multi-subpopulation reference sets. Consequently, resources should be used for enlarging the reference population within subpopulation, for example, by adding genotyped females.
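The validation scheme described above, a random 5-fold cross-validation repeated 20 times over 2,053 genotyped animals, can be sketched as an index-splitting routine. This covers only the partitioning step; the genomic BLUP model itself is not shown. The function name, the seed, and the round-robin fold assignment are assumptions made for illustration.

```python
import random


def repeated_kfold(n_items, k=5, replicates=20, seed=0):
    """Yield (replicate, fold, train_idx, test_idx) for repeated random k-fold CV.

    Each replicate reshuffles all indices, then deals them into k folds;
    every fold serves once as the validation set.
    """
    rng = random.Random(seed)
    for rep in range(replicates):
        idx = list(range(n_items))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most 1
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield rep, f, train, test


# 2,053 animals, 5 folds, 20 replicates -> 100 train/validation splits.
splits = list(repeated_kfold(2053))
```

For a within-subpopulation versus multi-subpopulation comparison, one would presumably restrict or extend `train` according to each animal's breeding organization while keeping the same validation animals; that bookkeeping is left out here.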
Effect of defuzzification method of fuzzy modeling
NASA Astrophysics Data System (ADS)
Lapohos, Tibor; Buchal, Ralph O.
1994-10-01
Imprecision can arise in fuzzy relational modeling as a result of fuzzification, inference and defuzzification. These three sources of imprecision are difficult to separate. We have determined through numerical studies that an important source of imprecision is the defuzzification stage. This imprecision adversely affects the quality of the model output. The most widely used defuzzification algorithm is known by the name of 'center of area' (COA) or 'center of gravity' (COG). In this paper, we show that this algorithm not only maps the near limit values of the variables improperly but also introduces errors for middle domain values of the same variables. Furthermore, the behavior of this algorithm is a function of the shape of the reference sets. We compare the COA method to the weighted average of cluster centers (WACC) procedure in which the transformation is carried out based on the values of the cluster centers belonging to each of the reference membership functions instead of using the functions themselves. We show that this procedure is more effective and computationally much faster than the COA. The method is tested for a family of reference sets satisfying certain constraints, that is, for any support value the sum of reference membership function values equals one and the peak values of the two marginal membership functions project to the boundaries of the universe of discourse. For all the member sets of this family of reference sets the defuzzification errors do not get bigger as the linguistic variables tend to their extreme values. In addition, the more reference sets that are defined for a certain linguistic variable, the smaller the average defuzzification error becomes. In the case of triangle-shaped reference sets there is no defuzzification error at all. Finally, an alternative solution is provided that improves the performance of the COA method.
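The contrast between the two defuzzification procedures can be illustrated with a small numerical sketch. The code below reflects one reading of the abstract rather than the authors' code: it assumes three triangular reference sets on the interval [0, 10] whose memberships sum to one, min-clipping with max aggregation for COA, and a midpoint-rule centroid. All names and values are hypothetical.

```python
def tri(x, left, peak, right):
    """Triangular membership; left == peak or peak == right gives an edge set."""
    if x == peak:
        return 1.0
    if left < x < peak:
        return (x - left) / (peak - left)
    if peak < x < right:
        return (right - x) / (right - peak)
    return 0.0


# Assumed reference sets on [0, 10]; the two marginal peaks sit on the boundaries.
SETS = [(0.0, 0.0, 5.0), (0.0, 5.0, 10.0), (5.0, 10.0, 10.0)]


def fuzzify(x):
    """Membership of x in each reference set."""
    return [tri(x, *s) for s in SETS]


def wacc(mu):
    """Weighted average of cluster centers: uses only the peak positions."""
    centers = [peak for _, peak, _ in SETS]
    return sum(m * c for m, c in zip(mu, centers)) / sum(mu)


def coa(mu, steps=10000):
    """Center of area: centroid of the clipped, max-aggregated reference sets."""
    lo, hi = 0.0, 10.0
    dx = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th slice
        agg = max(min(m, tri(x, *s)) for m, s in zip(mu, SETS))
        num += x * agg
        den += agg
    return num / den


# Round trip at the lower boundary of the universe of discourse.
wacc_at_edge = wacc(fuzzify(0.0))  # recovers the boundary value
coa_at_edge = coa(fuzzify(0.0))    # pulled inward to the triangle's centroid, 5/3
wacc_mid = wacc(fuzzify(2.0))      # recovers 2.0 up to rounding
```

Under these assumptions COA cannot return a value closer to the boundary than the centroid of the marginal set, which matches the abstract's remark about near-limit values, while WACC reproduces the input for these triangular sets and avoids the numerical integration.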
Lee, Sun Eun; Stewart, Christine P; Schulze, Kerry J; Cole, Robert N; Wu, Lee S-F; Yager, James D; Groopman, John D; Khatry, Subarna K; Adhikari, Ramesh Kant; Christian, Parul; West, Keith P
2017-03-01
Background: Malnutrition affects body growth, size, and composition of children. Yet, few functional biomarkers are known to be associated with childhood morphology. Objective: This cross-sectional study examined associations of anthropometric indicators of height, musculature, and fat mass with plasma proteins by using proteomics in a population cohort of school-aged Nepalese children. Methods: Height, weight, midupper arm circumference (MUAC), triceps and subscapular skinfolds, upper arm muscle area (AMA), and arm fat area (AFA) were assessed in 500 children 6-8 y of age. Height-for-age z scores (HAZs), weight-for-age z scores (WAZs), and body mass index-for-age z scores (BAZs) were derived from the WHO growth reference. Relative protein abundance was quantified by using tandem mass spectrometry. Protein-anthropometry associations were evaluated by linear mixed-effects models and identified as having a false discovery rate (q) <5%. Results: Among 982 proteins, 1, 10, 14, and 17 proteins were associated with BAZ, HAZ, MUAC, and AMA, respectively (q < 0.05). Insulin-like growth factor (IGF)-I, 2 IGF-binding proteins, and carnosinase-1 were associated with both HAZ and AMA. Proteins involved in nutrient transport, activation of innate immunity, and bone mineralization were associated with HAZ. Several extracellular matrix proteins were positively associated with AMA alone. The proteomes of MUAC and AMA substantially overlapped, whereas no proteins were associated with AFA or triceps and subscapular skinfolds. Myosin light-chain kinase, possibly reflecting leakage from muscle, was inversely associated with BAZ. The proteome of WAZ was the largest (n = 33) and most comprehensive, including proteins involved in neural development and oxidative stress response, among others. Conclusions: Plasma proteomics confirmed known biomarkers of childhood growth and revealed novel proteins associated with lean mass in chronically undernourished children.
Identified proteins may serve as candidates for assessing growth and nutritional status of children in similar undernourished settings. The antenatal micronutrient supplementation trial yielding the study cohort of children was registered at clinicaltrials.gov as NCT00115271.
Protein quality and growth in malnourished children
USDA-ARS's Scientific Manuscript database
Protein quality refers to the amounts and ratios of essential amino acids in a food. The two methods most commonly used for determining protein quality are the protein digestibility-corrected amino acid score (PDCAAS) and the digestible indispensable amino acid score (DIAAS). To use existing literature ...
Imai, Takashi; Kovalenko, Andriy; Hirata, Fumio
2005-04-14
The three-dimensional reference interaction site model (3D-RISM) theory is applied to the analysis of hydration effects on the partial molar volume of proteins. For the native structure of some proteins, the partial molar volume is decomposed into geometric and hydration contributions using the 3D-RISM theory combined with the geometric volume calculation. The hydration contributions are correlated with the surface properties of the protein. The thermal volume, which is the volume of voids around the protein induced by the thermal fluctuation of water molecules, is directly proportional to the accessible surface area of the protein. The interaction volume, which is the contribution of electrostatic interactions between the protein and water molecules, is apparently governed by the charged atomic groups on the protein surface. The polar atomic groups do not make any contribution to the interaction volume. The volume differences between low- and high-pressure structures of lysozyme are also analyzed by the present method.
Genetics Home Reference: Sheldon-Hall syndrome
... proteins that are involved in muscle tensing (contraction). Muscle contraction occurs when thick filaments made of proteins called ... early development of the muscles. The process of muscle contraction is controlled (regulated) by other proteins called troponins ...
Saha, Sudipto; Dazard, Jean-Eudes; Xu, Hua; Ewing, Rob M.
2013-01-01
Large-scale protein–protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific “bait” protein and its associated “prey” proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait–prey and cocomplexed preys (prey–prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein–protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait–prey and prey–prey interactions can be used to refine network topology and extend known protein networks. PMID:22845868
Cankorur-Cetinkaya, Ayca; Dereli, Elif; Eraslan, Serpil; Karabekmez, Erkan; Dikicioglu, Duygu; Kirdar, Betul
2012-01-01
Background: Understanding the dynamic mechanism behind the transcriptional organization of genes in response to varying environmental conditions requires time-dependent data. The dynamic transcriptional response obtained by real-time RT-qPCR experiments could only be correctly interpreted if suitable reference genes are used in the analysis. The lack of available studies on the identification of candidate reference genes in dynamic gene expression studies necessitates the identification and the verification of a suitable gene set for the analysis of transient gene expression response. Principal Findings: In this study, a candidate reference gene set for RT-qPCR analysis of dynamic transcriptional changes in Saccharomyces cerevisiae was determined using 31 different publicly available time series transcriptome datasets. Ten of the twelve candidates (TPI1, FBA1, CCW12, CDC19, ADH1, PGK1, GCN4, PDC1, RPS26A and ARF1) we identified were not previously reported as potential reference genes. Our method also identified the commonly used reference genes ACT1 and TDH3. The most stable reference genes from this pool were determined as TPI1, FBA1, CDC19 and ACT1 in response to a perturbation in the amount of available glucose and as FBA1, TDH3, CCW12 and ACT1 in response to a perturbation in the amount of available ammonium. The use of these newly proposed gene sets outperformed the use of common reference genes in the determination of dynamic transcriptional response of the target genes, HAP4 and MEP2, in response to relaxation from glucose and ammonium limitations, respectively. Conclusions: A candidate reference gene set to be used in dynamic real-time RT-qPCR expression profiling in yeast was proposed for the first time in the present study. Suitable pools of stable reference genes to be used under different experimental conditions could be selected from this candidate set in order to successfully determine the expression profiles for the genes of interest. PMID:22675547
How many atoms are required to characterize accurately trajectory fluctuations of a protein?
NASA Astrophysics Data System (ADS)
Cukier, Robert I.
2010-06-01
Large molecules, whose thermal fluctuations sample a complex energy landscape, exhibit motions on an extended range of space and time scales. Principal component analysis (PCA) is often used to extract dominant motions that in proteins are typically domain motions. These motions are captured in the large eigenvalue (leading) principal components. There is also information in the small eigenvalues, arising from approximate linear dependencies among the coordinates. These linear dependencies suggest that instead of using all the atom coordinates to represent a trajectory, it should be possible to use a reduced set of coordinates with little loss in the information captured by the large eigenvalue principal components. In this work, methods that can monitor the correlation (overlap) between a reduced set of atoms and any number of retained principal components are introduced. For application to trajectory data generated by simulations, where the overall translational and rotational motion needs to be eliminated before PCA is carried out, some difficulties with the overlap measures arise and methods are developed to overcome them. The overlap measures are evaluated for a trajectory generated by molecular dynamics for the protein adenylate kinase, which consists of a stable, core domain, and two more mobile domains, referred to as the LID domain and the AMP-binding domain. The use of reduced sets corresponding, for the smallest set, to one-eighth of the alpha carbon (CA) atoms relative to using all the CA atoms is shown to predict the dominant motions of adenylate kinase. The overlap between using all the CA atoms and all the backbone atoms is essentially unity for a sum over PCA modes that effectively capture the exact trajectory. A reduction to a few atoms (three in the LID and three in the AMP-binding domain) shows that at least the first principal component, characterizing a large part of the LID-binding and AMP-binding motion, is well described. 
Based on these results, the overlap criterion should be applicable as a guide to postulating and validating coarse-grained descriptions of generic biomolecular assemblies.
Yu, Jingkai; Finley, Russell L
2009-01-01
High-throughput experimental and computational methods are generating a wealth of protein-protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space. We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almeida, Luciana O.; Goto, Renata N.; Neto, Marinaldo P.C.
We hypothesized that SET, a protein accumulated in some cancer types and Alzheimer disease, is involved in cell death through mitochondrial mechanisms. We addressed the mRNA and protein levels of the mitochondrial uncoupling proteins UCP1, UCP2 and UCP3 (S and L isoforms) by quantitative real-time PCR and immunofluorescence as well as other mitochondrial involvements, in HEK293 cells overexpressing the SET protein (HEK293/SET), either in the presence or absence of oxidative stress induced by the pro-oxidant t-butyl hydroperoxide (t-BHP). SET overexpression in HEK293 cells decreased UCP1 and increased UCP2 and UCP3 (S/L) mRNA and protein levels, whilst also preventing lipid peroxidation and decreasing the content of cellular ATP. SET overexpression also (i) decreased the area of mitochondria and increased the number of organelles and lysosomes, (ii) increased mitochondrial fission, as demonstrated by increased FIS1 mRNA and FIS-1 protein levels, an apparent accumulation of DRP-1 protein, and an increase in the VDAC protein level, and (iii) reduced autophagic flux, as demonstrated by a decrease in LC3B lipidation (LC3B-II) in the presence of chloroquine. Therefore, SET overexpression in HEK293 cells promotes mitochondrial fission and reduces autophagic flux in apparent association with up-regulation of UCP2 and UCP3; this implies a potential involvement in cellular processes that are deregulated such as in Alzheimer's disease and cancer. - Highlights: • SET, UCPs and autophagy prevention are correlated. • SET action has mitochondrial involvement. • UCP2/3 may reduce ROS and prevent autophagy. • SET protects cell from ROS via UCP2/3.
Sitterlé, E; Giraud, S; Leto, J; Bouchara, J P; Rougeron, A; Morio, F; Dauphin, B; Angebault, C; Quesne, G; Beretti, J L; Hassouni, N; Nassif, X; Bougnoux, M E
2014-09-01
An increasing number of infections due to Pseudallescheria/Scedosporium species has been reported during the past decades, both in immunocompromised and immunocompetent patients. Additionally, these fungi are now recognized worldwide as common agents of fungal colonization of the airways in cystic fibrosis patients, which represents a risk factor for disseminated infections after lung transplantation. Currently six species are described within the Pseudallescheria/Scedosporium genus, including Scedosporium prolificans and species of the Pseudallescheria/Scedosporium apiospermum complex (i.e. S. apiospermum sensu stricto, Pseudallescheria boydii, Scedosporium aurantiacum, Pseudallescheria minutispora and Scedosporium dehoogii). Precise identification of clinical isolates at the species level is required because these species differ in their antifungal drug susceptibility patterns. Matrix-assisted laser desorption ionization (MALDI)-time of flight (TOF)/mass spectrometry (MS) is a powerful tool to rapidly identify moulds at the species level. We investigated the potential of this technology to discriminate Pseudallescheria/Scedosporium species. Forty-seven reference strains were used to build a reference database library. Profiles from 3-, 5- and 7-day-old cultures of each reference strain were analysed to identify species-specific discriminating profiles. The database was tested for accuracy using a set of 64 clinical or environmental isolates previously identified by multilocus sequencing. All isolates were unequivocally identified at the species level by MALDI-TOF/MS. Our results, obtained using a simple protocol, without prior protein extraction or standardization of the culture, demonstrate that MALDI-TOF/MS is a powerful tool for rapid identification of Pseudallescheria/Scedosporium species that cannot be currently identified by morphological examination in the clinical setting. 
© 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.
Hyatt, Michael W; Field, Cara L; Clauss, Tonya M; Arheart, Kristopher L; Cray, Carolyn
2016-12-01
Preventative health care of elasmobranchs is an important but understudied field of aquatic veterinary medicine. Evaluation of inflammation through the acute phase response is a valuable tool in health assessments. To better assess the health of bonnethead sharks (Sphyrna tiburo) under managed care, normal reference intervals of protein electrophoresis (EPH) and the acute phase proteins, C-reactive protein (CRP) and haptoglobin (HP), were established. Blood was collected from wild caught, captive raised bonnethead sharks housed at public aquaria. Lithium heparinized plasma was either submitted fresh or stored at -80°C prior to submission. Electrophoresis identified protein fractions with migration characteristics similar to other animals with albumin, α-1 globulin, α-2 globulin, β globulin, and γ globulin. These fractions were classified as fractions 1-5 as fractional contents are unknown in this species. Commercial reagents for CRP and HP were validated for use in bonnethead sharks. Reference intervals were established using the robust method recommended by the American Society for Veterinary Clinical Pathology for the calculation of 90% reference intervals. Once established, the diagnostic and clinical applicability of these reference intervals was used to assess blood from individuals with known infectious diseases that resulted in systemic inflammation and eventual death. Unhealthy bonnethead sharks had significantly decreased fraction 2, fraction 3, and fraction 3:4 ratio and significantly increased fraction 5, CRP, and HP. These findings advance our understanding of elasmobranch acute phase inflammatory response and health and aid clinicians in the diagnosis of inflammatory disease in bonnethead sharks.
Nutrient reference value: non-communicable disease endpoints--a conference report.
Lupton, J R; Blumberg, J B; L'Abbe, M; LeDoux, M; Rice, H B; von Schacky, C; Yaktine, A; Griffiths, J C
2016-03-01
Nutrition is complex, and seemingly getting more complicated. Most consumers are familiar with "essential nutrients," e.g., vitamins and minerals, and more recently protein and important amino acids. These essential nutrients have nutrient reference values, referred to as dietary reference intakes (DRIs) developed by consensus committees of scientific experts convened by the Institute of Medicine of the National Academies of Sciences, Engineering, and Medicine and carried out by the Food and Nutrition Board. The DRIs comprise a set of four nutrient-based reference values, the estimated average requirements, the recommended dietary allowances (RDAs), the adequate intakes and the tolerable upper intake levels for micronutrient intakes and an acceptable macronutrient distribution range for macronutrient intakes. From the RDA, the US Food and Drug Administration (FDA) derives a labeling value called the daily value (DV), which appears on the nutrition label of all foods for sale in the US. The DRI reports do not make recommendations about whether the DV labeling values can be set only for what have been defined to date as "essential nutrients." For example, the FDA set a labeling value for "dietary fiber" without having the DV. Nutrient reference values-requirements are set by Codex Alimentarius for essential nutrients, and regulatory bodies in many countries use these Codex values in setting national policy for recommended dietary intakes. However, the focus of this conference is not on essential nutrients, but on the "nonessential nutrients," also termed dietary bioactive components. They can be defined as "Constituents in foods or dietary supplements, other than those needed to meet basic human nutritional needs, which are responsible for changes in health status (Office of Disease Prevention and Health Promotion, Office of Public Health and Science, Department of Health and Human Services in Fed Regist 69:55821-55822, 2004)."
Substantial and often persuasive scientific evidence does exist to confirm a relationship between the intake of a specific bioactive constituent and enhanced health conditions or reduced risk of a chronic disease. Further, research on the putative mechanisms of action of various classes of bioactives is supported by national and pan-national government agencies, and academic institutions, as well as functional food and dietary supplement manufacturers. Consumers are becoming educated and are seeking to purchase products containing bioactives, yet there is no evaluative process in place to let the public know how strong the science is behind the benefits or the quantitative amounts needed to achieve these beneficial health effects or to avoid exceeding the upper level (UL). When one lacks an essential nutrient, overt deficiency with concomitant physiological detriments and eventually death are expected. The absence of bioactive substances from the diet results in suboptimal health, e.g., poor cellular and/or physiological function, which is relative and not absolute. Regrettably at this time, there is no DRI process to evaluate bioactives, although a recent workshop convened by the National Institutes of Health (Options for Consideration of Chronic Disease Endpoints for Dietary Reference Intakes (DRIs); March 10-11, 2015; http://health.gov/dietaryguidelines/dri/ ) did explore the process to develop DVs for nutrients, the lack of which results in increased risk of chronic disease (non-communicable disease) endpoints. A final report is expected soon.
This conference (CRN-International Scientific Symposium; "Nutrient Reference Value-Non-Communicable Disease (NRV-NCD) Endpoints," 20 November in Kronberg, Germany; http://www.crn-i.ch/2015symposium/ ) explores concepts related to the Codex NRV process, the public health opportunities in setting NRVs for bioactive constituents, and further research and details on the specific class of bioactives, n-3 long-chain polyunsaturated fatty acids (also termed omega-3 fatty acids) and their constituents, specifically docosahexaenoic acid and eicosapentaenoic acid.
Josman, Nicky; Tee, Nancy W S; Maiwald, Matthias; Loo, Liat Hui; Ho, Clement K M
2018-06-15
It is often impractical for each laboratory to establish its own paediatric reference intervals. This is particularly true for specimen types collected using invasive procedures, for example, cerebrospinal fluid (CSF). Published CSF reference intervals for white cell count, and concentrations of total protein and glucose were reviewed by stakeholders in a paediatric hospital. Consensus reference intervals for the three CSF parameters were then subjected to verification using guidelines from the Clinical Laboratory Standards Institute and residual CSF specimens. Consensus paediatric reference intervals adapted from published studies with minor modifications were locally verified as follows. White cell count (×10^6 cells/L): 0-20 (<1 month); 0-10 (1-2 months); 0-5 (>2 months). Total protein (g/L): 0.3-1.2 (<1 month); 0.2-0.6 (1-3 months); 0.1-0.4 (>3 months). Glucose (mmol/L): 2.0-5.6 (<6 months); 2.4-4.3 (6 months or older). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
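The age-banded intervals listed in this abstract lend themselves to a small lookup table. The following is a minimal Python sketch, not the authors' software: the function names, the dictionary keys and the handling of boundary ages (for example, treating an age of exactly 2 months as falling in the "1-2 months" band) are assumptions made for illustration.

```python
def csf_reference_intervals(age_months):
    """Return {analyte: (low, high)} for a child's age in months.

    Values are the consensus intervals quoted in the abstract; the
    treatment of boundary ages is an assumption of this sketch.
    """
    if age_months < 0:
        raise ValueError("age must be non-negative")

    # White cell count, x10^6 cells/L
    if age_months < 1:
        wcc = (0, 20)
    elif age_months <= 2:
        wcc = (0, 10)
    else:
        wcc = (0, 5)

    # Total protein, g/L
    if age_months < 1:
        protein = (0.3, 1.2)
    elif age_months <= 3:
        protein = (0.2, 0.6)
    else:
        protein = (0.1, 0.4)

    # Glucose, mmol/L
    glucose = (2.0, 5.6) if age_months < 6 else (2.4, 4.3)

    return {"wcc": wcc, "total_protein": protein, "glucose": glucose}


def flag(value, interval):
    """Classify a result as 'low', 'within' or 'high' for an interval."""
    low, high = interval
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "within"
```

For example, `flag(0.7, csf_reference_intervals(4)["total_protein"])` would classify a total protein of 0.7 g/L in a 4-month-old as above the quoted interval.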
Generic comparison of protein inference engines.
Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi
2012-04-01
Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context.
Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning
2007-10-18
Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr. PMID:17945017
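The idea of cross-referencing identifiers through 100% sequence identity can be illustrated with a small in-memory index. This is a hedged Python sketch of the general technique only; it does not call or reproduce the PICR service, and the function names, the use of a SHA-256 digest as the grouping key and the toy identifiers are all assumptions.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Digest of a whitespace-stripped, upper-cased protein sequence.

    Identical sequences give identical keys; SHA-256 is an arbitrary
    choice made for this sketch.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_index(records):
    """Group (database, identifier) pairs by exact sequence.

    records: iterable of (database, identifier, sequence) tuples.
    """
    index = defaultdict(set)
    for database, identifier, sequence in records:
        index[sequence_key(sequence)].add((database, identifier))
    return index


def cross_references(index, sequence, databases=None):
    """Return sorted (database, identifier) pairs sharing this exact sequence.

    databases: optional collection restricting the source databases.
    """
    hits = index.get(sequence_key(sequence), set())
    if databases is not None:
        hits = {hit for hit in hits if hit[0] in databases}
    return sorted(hits)
```

Restricting by database here stands in for the limits by source database described in the abstract; limits by taxonomic ID or activity status would need extra fields on each record.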
The Processing Cost of Reference Set Computation: Acquisition of Stress Shift and Focus
ERIC Educational Resources Information Center
Reinhart, Tanya
2004-01-01
Reference set computation -- the construction of a (global) comparison set to determine whether a given derivation is appropriate in context -- comes with a processing cost. I argue that this cost is directly visible at the acquisition stage: In those linguistic areas in which it has been independently established that such computation is indeed…
Genetics Home Reference: aniridia
... PAX6 protein attaches (binds) to specific regions of DNA and regulates the activity of other genes. On the basis of this role, the PAX6 protein is called a transcription factor. Following birth, the PAX6 protein regulates several ...
Genetics Home Reference: spinal muscular atrophy with respiratory distress type 1
... a protein involved in copying (replicating) DNA ; producing RNA, a chemical cousin of DNA; and producing proteins. ... aid in DNA replication and the production of RNA and proteins. These problems particularly affect alpha-motor ...
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Taylor, Candy M; Jost, Ricarda; Erskine, William; Nelson, Matthew N
2016-01-01
Quantitative Reverse Transcription PCR (qRT-PCR) is currently one of the most popular, high-throughput and sensitive technologies available for quantifying gene expression. Its accurate application depends heavily upon normalisation of gene-of-interest data with reference genes that are uniformly expressed under experimental conditions. The aim of this study was to provide the first validation of reference genes for Lupinus angustifolius (narrow-leafed lupin, a significant grain legume crop) using a selection of seven genes previously trialed as reference genes for the model legume, Medicago truncatula. In a preliminary evaluation, the seven candidate reference genes were assessed on the basis of primer specificity for their respective targeted region, PCR amplification efficiency, and ability to discriminate between cDNA and gDNA. Following this assessment, expression of the three most promising candidates [Ubiquitin C (UBC), Helicase (HEL), and Polypyrimidine tract-binding protein (PTB)] was evaluated using the NormFinder and RefFinder statistical algorithms in two narrow-leafed lupin lines, both with and without vernalisation treatment, and across seven organ types (cotyledons, stem, leaves, shoot apical meristem, flowers, pods and roots) encompassing three developmental stages. UBC was consistently identified as the most stable candidate and has sufficiently uniform expression that it may be used as a sole reference gene under the experimental conditions tested here. However, as organ type and developmental stage were associated with greater variability in relative expression, it is recommended using UBC and HEL as a pair to achieve optimal normalisation. These results highlight the importance of rigorously assessing candidate reference genes for each species across a diverse range of organs and developmental stages. 
With emerging technologies, such as RNAseq, and the completion of valuable transcriptome data sets, it is possible that other potentially more suitable reference genes will be identified for this species in future. PMID:26872362
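Normalising a gene of interest against a pair of reference genes such as UBC and HEL can be sketched as follows. The abstract does not state which quantification model was used; this Python sketch assumes the common 2^-ΔΔCt approach with 100% amplification efficiency, and the function names, the gene-of-interest label and all Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target minus the mean Ct of the reference genes.

    Averaging Ct values is equivalent to taking the geometric mean of
    the reference quantities when efficiency is assumed to be 100%.
    """
    return target_ct - mean(reference_cts)


def fold_change(sample, calibrator, target, references):
    """Relative expression of `target` in `sample` versus `calibrator`.

    sample, calibrator: dicts mapping gene name to Ct value.
    """
    d_sample = delta_ct(sample[target], [sample[r] for r in references])
    d_calibrator = delta_ct(calibrator[target], [calibrator[r] for r in references])
    return 2 ** -(d_sample - d_calibrator)


# Hypothetical Ct values for one gene of interest (GOI) and two reference genes.
vernalised = {"GOI": 22.0, "UBC": 18.0, "HEL": 20.0}
control = {"GOI": 24.0, "UBC": 18.0, "HEL": 20.0}
print(fold_change(vernalised, control, "GOI", ["UBC", "HEL"]))  # prints 4.0
```

Using two reference genes instead of one dampens the effect of organ- or stage-specific drift in either gene, which is the motivation the abstract gives for pairing UBC with HEL.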
Falkner, Jayson; Andrews, Philip
2005-05-15
Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
Estimating clinical chemistry reference values based on an existing data set of unselected animals.
Dimauro, Corrado; Bonelli, Piero; Nicolussi, Paola; Rassu, Salvatore P G; Cappio-Borlino, Aldo; Pulina, Giuseppe
2008-11-01
In an attempt to standardise the determination of biological reference values, the International Federation of Clinical Chemistry (IFCC) has published a series of recommendations on developing reference intervals. The IFCC recommends the use of an a priori sampling of at least 120 healthy individuals. However, such a high number of samples and laboratory analysis is expensive, time-consuming and not always feasible, especially in veterinary medicine. In this paper, an alternative (a posteriori) method is described and is used to determine reference intervals for biochemical parameters of farm animals using an existing laboratory data set. The method used was based on the detection and removal of outliers to obtain a large sample of animals likely to be healthy from the existing data set. This allowed the estimation of reliable reference intervals for biochemical parameters in Sarda dairy sheep. This method may also be useful for the determination of reference intervals for different species, ages and gender.
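An a posteriori estimate of this kind can be sketched as outlier removal followed by percentile estimation. The abstract does not name the outlier rule or the interval estimator, so this Python sketch assumes iterated Tukey fences (1.5 times the interquartile range) and a central 95% interval from linearly interpolated percentiles; all function names are illustrative.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no data")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def remove_outliers(values, factor=1.5):
    """Repeatedly drop values outside the Tukey fences until none remain."""
    data = sorted(values)
    while True:
        q1, q3 = percentile(data, 25), percentile(data, 75)
        spread = q3 - q1
        low, high = q1 - factor * spread, q3 + factor * spread
        kept = [v for v in data if low <= v <= high]
        if len(kept) == len(data):
            return kept
        data = kept


def reference_interval(values, coverage=95.0):
    """Central `coverage`% interval of the data after outlier removal."""
    kept = remove_outliers(values)
    tail = (100.0 - coverage) / 2.0
    return percentile(kept, tail), percentile(kept, 100.0 - tail)
```

The design choice is to let the filtered laboratory records stand in for a prospectively sampled healthy population; how well that holds depends on the proportion of diseased animals in the original data set.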
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.
Kaplan, Noam; Vaaknin, Avishay; Linial, Michal
2003-10-01
Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.
Rivas, Manuel A.; Avila, Brandon E.; Koskela, Jukka; Stevens, Christine; Pirinen, Matti; Neale, Benjamin M.; Ganna, Andrea; Graham, Daniel; Glaser, Benjamin; Peter, Inga; Atzmon, Gil; Barzilai, Nir; Levine, Adam P.; Schiff, Elena; Weisburd, Ben; Lek, Monkol; Bloom, Jonathan; Minikel, Eric V.; Petersen, Britt-Sabina; Beaugerie, Laurent; Seksik, Philippe; Cosnes, Jacques; Schreiber, Stefan; Bokemeyer, Bernd; Bethge, Johannes; Ahmad, Tariq; Plagnol, Vincent; Segal, Anthony W.; Targan, Stephan; Turner, Dan; Saavalainen, Paivi; Farkkila, Martti; Kontula, Kimmo; Palotie, Aarno; Brant, Steven R.; Duerr, Richard H.; Silverberg, Mark S.; Weersma, Rinse K.; Franke, Andre; Jostins, Luke; Barrett, Jeffrey C.; MacArthur, Daniel G.; Jalas, Chaim; Sokol, Harry; Xavier, Ramnik J.; Pulver, Ann; Cho, Judy H.; McGovern, Dermot P. B.; Daly, Mark J.
2018-01-01
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 148 AJ enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10–100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two to four-fold excess prevalence in AJ. 
We specifically attempt to evaluate whether strong acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder-effect, contribute excess genetic risk to Crohn's disease in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2 that are enriched in AJ (p < 0.005), including several novel contributing alleles, show evidence of association to CD. Independently, we find that genomewide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10^-16). Taken together, the results suggest coordinated selection in AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations. PMID:29795570
NASA Astrophysics Data System (ADS)
Gampe, D.; Ludwig, R.
2017-12-01
Regional Climate Models (RCMs) that downscale General Circulation Models (GCMs) are the primary tool to project future climate and serve as input to many impact models to assess the related changes and impacts under such climate conditions. Such RCMs are made available through the Coordinated Regional climate Downscaling Experiment (CORDEX). The ensemble of models provides a range of possible future climate changes around the ensemble mean climate change signal. The model outputs, however, are prone to biases compared to regional observations. A bias correction of these deviations is a crucial step in the impact modelling chain to allow the reproduction of historic conditions of, e.g., river discharge. However, the detection and quantification of model biases are highly dependent on the selected regional reference data set. Additionally, in practice due to computational constraints it is usually not feasible to consider the entire ensembles of climate simulations with all members as input for impact models which provide information to support decision-making. Although more and more studies focus on model selection based on the preservation of the climate model spread, a selection based on validity, i.e. the representation of the historic conditions, is still a widely applied approach. In this study, several available reference data sets for precipitation are selected to detect the model bias for the reference period 1989-2008 over the alpine catchment of the Adige River located in Northern Italy. The reference data sets originate from various sources, such as station data or reanalysis. These data sets are remapped to the common RCM grid at 0.11° resolution and several indicators, such as dry and wet spells, extreme precipitation and general climatology, are calculated to evaluate the capability of the RCMs to reproduce the historical conditions.
The resulting RCM spread is compared against the spread of the reference data set to determine the related uncertainties and detect potential model biases with respect to each reference data set. The RCMs are then ranked based on various statistical measures for each indicator and a score matrix is derived to select a subset of RCMs. We show the impact and importance of the reference data set with respect to the resulting climate change signal on the catchment scale.
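The ranking step described above can be sketched as a rank-sum over indicators, repeated per reference data set. The abstract does not specify the statistical measures or how the score matrix is aggregated, so this Python sketch assumes absolute deviation from the reference as the only measure and an unweighted sum of ranks; the model names, indicator names and values are hypothetical.

```python
def rank_models(model_values, reference_values):
    """Order models from best to worst agreement with one reference data set.

    model_values: {model: {indicator: value}}
    reference_values: {indicator: value}
    Each indicator contributes the model's rank (1 = smallest absolute
    deviation); ranks are summed and models sorted by the total.
    """
    scores = {model: 0 for model in model_values}
    for indicator, reference in reference_values.items():
        ordered = sorted(
            model_values,
            key=lambda m: abs(model_values[m][indicator] - reference),
        )
        for rank, model in enumerate(ordered, start=1):
            scores[model] += rank
    return sorted(scores, key=lambda m: scores[m])


# Hypothetical indicator values: wet days per year and 95th-percentile daily precipitation (mm).
models = {
    "RCM_A": {"wet_days": 120, "p95": 30.0},
    "RCM_B": {"wet_days": 100, "p95": 40.0},
    "RCM_C": {"wet_days": 150, "p95": 55.0},
}
station_reference = {"wet_days": 118, "p95": 31.0}
reanalysis_reference = {"wet_days": 145, "p95": 50.0}

print(rank_models(models, station_reference))
print(rank_models(models, reanalysis_reference))
```

With these toy numbers the best-ranked model differs between the two references, which is the kind of dependence on the reference data set that the study sets out to quantify.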
Genetics Home Reference: progressive pseudorheumatoid dysplasia
... caused by mutations in the WISP3 gene. The function of the protein produced from this gene is not well understood, ... protein that may not function. Loss of WISP3 protein function likely disrupts normal cartilage maintenance and bone growth, ...
Genetics Home Reference: nail-patella syndrome
... protein that attaches (binds) to specific regions of DNA and regulates the activity of other genes. On the basis of this role, the LMX1B protein is called a transcription factor. The LMX1B protein appears to be particularly ...
Rice proteome database: a step toward functional analysis of the rice genome.
Komatsu, Setsuko
2005-09-01
The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a beta-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.
Woo, Jongmin; Han, Dohyun; Park, Joonho; Kim, Sang Jeong; Kim, Youngsoo
2015-11-01
Microglia, astrocytes, and neurons, which have important functions in the central nervous system (CNS), communicate mutually to generate a signal through secreted proteins or small molecules, many of which have not been identified. Because establishing a reference for the secreted proteins from CNS cells could be invaluable in examining cell-to-cell communication in the brain, we analyzed the secretome of three murine CNS cell lines without prefractionation by high-resolution mass spectrometry. In this study, 2795 proteins were identified from conditioned media of the three cell lines, and 2125 proteins were annotated as secreted proteins by bioinformatics analysis. Further, approximately 500 secreted proteins were quantifiable as differentially expressed proteins by label-free quantitation. As a result, our secretome references are useful datasets for the future study of neuronal diseases. All MS data have been deposited in the ProteomeXchange with identifier PXD001597 (http://proteomecentral.proteomexchange.org/dataset/PXD001597). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
DeBlasio, Dan
2013-01-01
Abstract We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. PMID:23489379
Adenovirus Core Protein VII Downregulates the DNA Damage Response on the Host Genome
Avgousti, Daphne C.; Della Fera, Ashley N.; Otter, Clayton J.; Herrmann, Christin; Pancholi, Neha J.
2017-01-01
ABSTRACT Viral manipulation of cellular proteins allows viruses to suppress host defenses and generate infectious progeny. Due to the linear double-stranded DNA nature of the adenovirus genome, the cellular DNA damage response (DDR) is considered a barrier to successful infection. The adenovirus genome is packaged with protein VII, a virally encoded histone-like core protein that is suggested to protect incoming viral genomes from detection by the cellular DNA damage machinery. We showed that protein VII localizes to host chromatin during infection, leading us to hypothesize that protein VII may affect DNA damage responses on the cellular genome. Here we show that protein VII at cellular chromatin results in a significant decrease in accumulation of phosphorylated H2AX (γH2AX) following irradiation, indicating that protein VII inhibits DDR signaling. The oncoprotein SET was recently suggested to modulate the DDR by affecting access of repair proteins to chromatin. Since protein VII binds SET, we investigated a role for SET in DDR inhibition by protein VII. We show that knockdown of SET partially rescues the protein VII-induced decrease in γH2AX accumulation on the host genome, suggesting that SET is required for inhibition. Finally, we show that knockdown of SET also allows ATM to localize to incoming viral genomes bound by protein VII during infection with a mutant lacking early region E4. Together, our data suggest that the protein VII-SET interaction contributes to DDR evasion by adenovirus. Our results provide an additional example of a strategy used by adenovirus to abrogate the host DDR and show how viruses can modify cellular processes through manipulation of host chromatin. IMPORTANCE The DNA damage response (DDR) is a cellular network that is crucial for maintaining genome integrity. DNA viruses replicating in the nucleus challenge the resident genome and must overcome cellular responses, including the DDR. 
Adenoviruses are prevalent human pathogens that can cause a multitude of diseases, such as respiratory infections and conjunctivitis. Here we describe how a small adenovirus core protein that localizes to host chromatin during infection can globally downregulate the DDR. Our study focuses on key players in the damage signaling pathway and highlights how viral manipulation of chromatin may influence access of DDR proteins to the host genome. PMID:28794020
Casu, Fabio; Watson, Aaron M; Yost, Justin; Leffler, John W; Gaylord, Thomas Gibson; Barrows, Frederic T; Sandifer, Paul A; Denson, Michael R; Bearden, Daniel W
2017-07-07
We investigated the metabolic effects of four different commercial soy-based protein products on red drum fish (Sciaenops ocellatus) using nuclear magnetic resonance (NMR) spectroscopy-based metabolomics along with unsupervised principal component analysis (PCA) to evaluate metabolic profiles in liver, muscle, and plasma tissues. Specifically, during a 12 week feeding trial, juvenile red drum maintained in an indoor recirculating aquaculture system were fed four different commercially available soy formulations, containing the same amount of crude protein, and two reference diets as performance controls: a 60% soybean meal diet that had been used in a previous trial in our lab and a natural diet. Red drum liver, muscle, and plasma tissues were sampled at multiple time points to provide a more accurate snapshot of specific metabolic states during the grow-out. PCA score plots derived from NMR spectroscopy data sets showed significant differences between fish fed the natural diet and the soy-based diets, in both liver and muscle tissues. While red drum tolerated the inclusion of soy with good feed conversion ratios, a comparison to fish fed the natural diet revealed that the soy-fed fish in this study displayed a distinct metabolic signature characterized by increased protein and lipid catabolism, suggesting an energetic imbalance. Furthermore, among the soy-based formulations, one diet showed a more pronounced catabolic signature.
Jin, Liang; Zhang, Kai; Sternglanz, Rolf; Neiman, Aaron M
2017-05-01
In response to starvation, diploid cells of Saccharomyces cerevisiae undergo meiosis and form haploid spores, a process collectively referred to as sporulation. The differentiation into spores requires extensive changes in gene expression. The transcriptional activator Ndt80 is a central regulator of this process, which controls many genes essential for sporulation. Ndt80 induces ∼300 genes coordinately during meiotic prophase, but different mRNAs within the NDT80 regulon are translated at different times during sporulation. The protein kinase Ime2 and RNA binding protein Rim4 are general regulators of meiotic translational delay, but how differential timing of individual transcripts is achieved was not known. This report describes the characterization of two related NDT80-induced genes, PES4 and MIP6, encoding predicted RNA binding proteins. These genes are necessary to regulate the steady-state expression, translational timing, and localization of a set of mRNAs that are transcribed by NDT80 but not translated until the end of meiosis II. Mutations in the predicted RNA binding domains within PES4 alter the stability of target mRNAs. PES4 and MIP6 affect only a small portion of the NDT80 regulon, indicating that they act as modulators of the general Ime2/Rim4 pathway for specific transcripts. Copyright © 2017 American Society for Microbiology.
Liu, Juntai; Friebe, Vincent M; Swainsbury, David J K; Crouch, Lucy I; Szabo, David A; Frese, Raoul N; Jones, Michael R
2018-04-17
Reaction centre/light harvesting proteins such as the RCLH1X complex from Rhodobacter sphaeroides carry out highly quantum-efficient conversion of solar energy through ultrafast energy transfer and charge separation, and these pigment-proteins have been incorporated into biohybrid photoelectrochemical cells for a variety of applications. In this work we demonstrate that, despite not being able to support normal photosynthetic growth of Rhodobacter sphaeroides, an engineered variant of this RCLH1X complex lacking the PufX protein and with an enlarged light harvesting antenna is unimpaired in its capacity for photocurrent generation in two types of bio-photoelectrochemical cells. Removal of PufX also did not impair the ability of the RCLH1 complex to act as an acceptor of energy from synthetic light harvesting quantum dots. Unexpectedly, the removal of PufX led to a marked improvement in the overall stability of the RCLH1 complex under heat stress. We conclude that PufX-deficient RCLH1 complexes are fully functional in solar energy conversion in a device setting and that their enhanced structural stability could make them a preferred choice over their native PufX-containing counterpart. Our findings on the competence of RCLH1 complexes for light energy conversion in vitro are discussed with reference to the reason why these PufX-deficient proteins are not capable of light energy conversion in vivo.
Lee, Insuk; Li, Zhihua; Marcotte, Edward M.
2007-01-01
Background Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. PMID:17912365
Mark, Tomer; Jayabalan, David; Coleman, Morton; Pearse, Roger N; Wang, Y Lynn; Lent, Richard; Christos, Paul J; Lee, Joong W; Agrawal, Yash P; Matthew, Susan; Ely, Scott; Mazumdar, Madhu; Cesarman, Ethel; Leonard, John P; Furman, Richard R; Chen-Kiang, Selina; Niesvizky, Ruben
2008-12-01
The M-protein is the major reference measure for response in multiple myeloma (MM) and its correct interpretation is key to clinical management. The emergence of oligoclonal banding is recognized as a benign finding in the postautologous stem cell transplantation setting (ASCT) for MM but its significance during non-myeloablative therapy is unknown. In a study of the immunomodulatory combination BiRD, (lenalidomide and dexamethasone with clarithromycin), we frequently detected the emergence of mono- and oligo-clonal immunoglobulins unrelated to the baseline diagnostic M-protein. The new M-proteins seen on serum immunofixation electrophoresis were clearly different in either heavy or light chain component(s) from the original M-spike protein and were termed atypical serum immunofixation patterns (ASIPs). Overall, 24/72 (33%) patients treated with BiRD developed ASIPs. Patients who developed ASIPs compared with patients treated with BiRD without ASIPs, had a significantly greater overall response (100% vs. 85%) and complete response rates (71% vs. 23%). ASIPs were not associated with new clonal plasma cells or other lymphoproliferative processes, and molecular remissions were documented. This is the first time this phenomenon has been seen with regularity in non-myeloablative therapy for MM. Analogous to the ASCT experience, ASIPs do not signal incipient disease progression, but rather herald robust response.
Musungu, Bryan; Bhatnagar, Deepak; Brown, Robert L.; Fakhoury, Ahmad M.; Geisler, Matt
2015-01-01
Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize. PMID:26089837
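The interolog rule described in the abstract above (predict an interaction in the target species when both partners of a known reference interaction have orthologs) can be illustrated with a minimal sketch. This is not the PiZeaM pipeline: the function name `predict_interologs`, the protein identifiers, and the ortholog table are hypothetical and chosen only for illustration.

```python
from itertools import product


def predict_interologs(reference_interactions, orthologs):
    """Transfer reference-species interactions to a target species.

    reference_interactions: iterable of (ref_a, ref_b) protein pairs.
    orthologs: dict mapping a reference protein to a set of target orthologs
               (one-to-one or many-to-many).
    """
    predicted = set()
    for ref_a, ref_b in reference_interactions:
        targets_a = orthologs.get(ref_a, set())
        targets_b = orthologs.get(ref_b, set())
        # A prediction needs orthologs for BOTH partners; many-to-many
        # orthology expands into every pairwise combination.
        for a, b in product(targets_a, targets_b):
            predicted.add(tuple(sorted((a, b))))
    return predicted


# Hypothetical toy data: YFG3 has no ortholog, so the second
# reference interaction yields no prediction.
reference = [("YFG1", "YFG2"), ("YFG2", "YFG3")]
orthologs = {"YFG1": {"Zm1"}, "YFG2": {"Zm2a", "Zm2b"}}
interactions = predict_interologs(reference, orthologs)
```

Storing each pair as a sorted tuple treats interactions as undirected, which is an assumption of this sketch rather than something stated in the abstract.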
Metzger, Fabian; Mischek, Daniel; Stoffers, Frédéric
2017-01-01
Here we show that the hydrodynamic radii-dependent entry of blood proteins into cerebrospinal fluid (CSF) can best be modeled with a diffusional system of consecutive interdependent steady states between barrier-restricted molecular flux and bulk flow of CSF. The connected steady state model fits precisely to experimental results and provides the theoretical backbone to calculate the in-vivo hydrodynamic radii of blood-derived proteins as well as individual barrier characteristics. As the experimental reference set we used a previously published large-scale patient cohort of CSF to serum quotient ratios of immunoglobulins in relation to the respective albumin quotients. We related the inter-individual variances of these quotient relationships to the individual CSF flow time and barrier characteristics. We claim that this new concept allows the diagnosis of inflammatory processes with Reibergrams derived from population-based thresholds to be shifted to individualized judgment, thereby improving diagnostic sensitivity. We further use the source-dependent gradient patterns of proteins in CSF as intrinsic tracers for CSF flow characteristics. We assume that the rostrocaudal gradient of blood-derived proteins is a consequence of CSF bulk flow, whereas the slope of the gradient is a consequence of the unidirectional bulk flow and bidirectional pulsatile flow of CSF. Unlike blood-derived proteins, the influence of CSF flow characteristics on brain-derived proteins in CSF has been insufficiently discussed to date. By critically reviewing existing experimental data and by reassessing their conformity to CSF flow assumptions we conclude that the biomarker potential of brain-derived proteins in CSF can be improved by considering individual subproteomic dynamics of the CSF system.
Thermal motion in proteins: Large effects on the time-averaged interaction energies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goethe, Martin, E-mail: martingoethe@ub.edu; Rubi, J. Miguel; Fita, Ignacio
As a consequence of thermal motion, inter-atomic distances in proteins fluctuate strongly around their average values, and hence, also interaction energies (i.e. the pair-potentials evaluated at the fluctuating distances) are not constant in time but exhibit pronounced fluctuations. These fluctuations cause that time-averaged interaction energies do generally not coincide with the energy values obtained by evaluating the pair-potentials at the average distances. More precisely, time-averaged interaction energies behave typically smoother in terms of the average distance than the corresponding pair-potentials. This averaging effect is referred to as the thermal smoothing effect. Here, we estimate the strength of the thermal smoothing effect on the Lennard-Jones pair-potential for globular proteins at ambient conditions using x-ray diffraction and simulation data of a representative set of proteins. For specific atom species, we find a significant smoothing effect where the time-averaged interaction energy of a single atom pair can differ by various tens of cal/mol from the Lennard-Jones potential at the average distance. Importantly, we observe a dependency of the effect on the local environment of the involved atoms. The effect is typically weaker for bulky backbone atoms in beta sheets than for side-chain atoms belonging to other secondary structure on the surface of the protein. The results of this work have important practical implications for protein software relying on free energy expressions. We show that the accuracy of free energy expressions can largely be increased by introducing environment specific Lennard-Jones parameters accounting for the fact that the typical thermal motion of protein atoms depends strongly on their local environment.
Thermal motion in proteins: Large effects on the time-averaged interaction energies
NASA Astrophysics Data System (ADS)
Goethe, Martin; Fita, Ignacio; Rubi, J. Miguel
2016-03-01
As a consequence of thermal motion, inter-atomic distances in proteins fluctuate strongly around their average values, and hence, also interaction energies (i.e. the pair-potentials evaluated at the fluctuating distances) are not constant in time but exhibit pronounced fluctuations. These fluctuations cause that time-averaged interaction energies do generally not coincide with the energy values obtained by evaluating the pair-potentials at the average distances. More precisely, time-averaged interaction energies behave typically smoother in terms of the average distance than the corresponding pair-potentials. This averaging effect is referred to as the thermal smoothing effect. Here, we estimate the strength of the thermal smoothing effect on the Lennard-Jones pair-potential for globular proteins at ambient conditions using x-ray diffraction and simulation data of a representative set of proteins. For specific atom species, we find a significant smoothing effect where the time-averaged interaction energy of a single atom pair can differ by various tens of cal/mol from the Lennard-Jones potential at the average distance. Importantly, we observe a dependency of the effect on the local environment of the involved atoms. The effect is typically weaker for bulky backbone atoms in beta sheets than for side-chain atoms belonging to other secondary structure on the surface of the protein. The results of this work have important practical implications for protein software relying on free energy expressions. We show that the accuracy of free energy expressions can largely be increased by introducing environment specific Lennard-Jones parameters accounting for the fact that the typical thermal motion of protein atoms depends strongly on their local environment.
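The central point of the abstract above, that a time-averaged interaction energy need not equal the pair-potential evaluated at the average distance, can be checked with a toy calculation. The sketch below assumes reduced units (epsilon = sigma = 1) and a hypothetical two-point fluctuation of ±0.1 around the Lennard-Jones minimum; it is not the authors' estimation procedure, which uses x-ray diffraction and simulation data.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair-potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Average distance placed at the potential minimum, r = 2^(1/6) * sigma.
r_mean = 2 ** (1 / 6)
delta = 0.1  # assumed fluctuation amplitude (toy value)

# Toy "trajectory": the pair spends equal time at r_mean - delta and r_mean + delta.
distances = [r_mean - delta, r_mean + delta]

time_averaged = sum(lennard_jones(r) for r in distances) / len(distances)
at_average = lennard_jones(sum(distances) / len(distances))

print(f"time-averaged energy:       {time_averaged:.3f}")
print(f"energy at average distance: {at_average:.3f}")
```

Because the potential is convex near its minimum, the time-averaged energy lies above the value at the average distance, so the well appears shallower once fluctuations are taken into account.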
Influence of Protein Abundance on High-Throughput Protein-Protein Interaction Detection
2009-06-05
the interaction data sets we determined, via comparisons with strict randomized simulations, the propensity for essential proteins to selectively...and analysis of high-quality PPI data sets. Materials and Methods We analyzed protein interaction networks for yeast and E. coli determined from Y2H...we reinvestigated the centrality-lethality rule, which implies that proteins having more interactions are more likely to be essential. From analysis
Patterns of HIV-1 Protein Interaction Identify Perturbed Host-Cellular Subsystems
MacPherson, Jamie I.; Dickerson, Jonathan E.; Pinney, John W.; Robertson, David L.
2010-01-01
Human immunodeficiency virus type 1 (HIV-1) exploits a diverse array of host cell functions in order to replicate. This is mediated through a network of virus-host interactions. A variety of recent studies have catalogued this information. In particular the HIV-1, Human Protein Interaction Database (HHPID) has provided a unique depth of protein interaction detail. However, as a map of HIV-1 infection, the HHPID is problematic, as it contains curation error and redundancy; in addition, it is based on a heterogeneous set of experimental methods. Based on identifying shared patterns of HIV-host interaction, we have developed a novel methodology to delimit the core set of host-cellular functions and their associated perturbation from the HHPID. Initially, using biclustering, we identify 279 significant sets of host proteins that undergo the same types of interaction. The functional cohesiveness of these protein sets was validated using a human protein-protein interaction network, gene ontology annotation and sequence similarity. Next, using a distance measure, we group host protein sets and identify 37 distinct higher-level subsystems. We further demonstrate the biological significance of these subsystems by cross-referencing with global siRNA screens that have been used to detect host factors necessary for HIV-1 replication, and investigate the seemingly small intersect between these data sets. Our results highlight significant host-cell subsystems that are perturbed during the course of HIV-1 infection. Moreover, we characterise the patterns of interaction that contribute to these perturbations. Thus, our work disentangles the complex set of HIV-1-host protein interactions in the HHPID, reconciles these with siRNA screens and provides an accessible and interpretable map of infection. PMID:20686668
Deorphanizing the human transmembrane genome: A landscape of uncharacterized membrane proteins.
Babcock, Joseph J; Li, Min
2014-01-01
The sequencing of the human genome has fueled the last decade of work to functionally characterize genome content. An important subset of genes encodes membrane proteins, which are the targets of many drugs. They reside in lipid bilayers, restricting their endogenous activity to a relatively specialized biochemical environment. Without a reference phenotype, the application of systematic screens to profile candidate membrane proteins is not immediately possible. Bioinformatics has begun to show its effectiveness in focusing the functional characterization of orphan proteins of a particular functional class, such as channels or receptors. Here we discuss integration of experimental and bioinformatics approaches for characterizing the orphan membrane proteome. By analyzing the human genome, a landscape reference for the human transmembrane genome is provided.
Chao, Jinquan; Yang, Shuguang; Chen, Yueyi; Tian, Wei-Min
2016-01-01
Latex flow caused by latex exploitation is effective in enhancing latex regeneration in laticifer cells of the rubber tree. It should therefore be suitable for screening appropriate reference genes for analysis of the expression of latex regeneration-related genes by quantitative real-time PCR (qRT-PCR). In the present study, the expression stability of 23 candidate reference genes was evaluated on the basis of latex flow by using the geNorm and NormFinder algorithms. Ubiquitin-protein ligase 2a (UBC2a) and ubiquitin-protein ligase 2b (UBC2b) were the two most stable genes among the selected candidate references in rubber tree clones with differential duration of latex flow. The two genes were also ranked highly in previous reference gene screening across different tissues and experimental conditions. By contrast, the transcripts of latex regeneration-related genes fluctuated significantly during latex flow. The results suggest that screening reference genes during latex flow should be an efficient and effective approach for selecting reference genes in qRT-PCR. PMID:27524995
Nonlinear interferometric vibrational imaging
NASA Technical Reports Server (NTRS)
Boppart, Stephen A. (Inventor); Marks, Daniel L. (Inventor)
2009-01-01
A method of examining a sample, which includes: exposing a reference to a first set of electromagnetic radiation, to form a second set of electromagnetic radiation scattered from the reference; exposing a sample to a third set of electromagnetic radiation to form a fourth set of electromagnetic radiation scattered from the sample; and interfering the second set of electromagnetic radiation and the fourth set of electromagnetic radiation. The first set and the third set of electromagnetic radiation are generated from a source; at least a portion of the second set of electromagnetic radiation is of a frequency different from that of the first set of electromagnetic radiation; and at least a portion of the fourth set of electromagnetic radiation is of a frequency different from that of the third set of electromagnetic radiation.
Foster, Joseph M; Moreno, Pablo; Fabregat, Antonio; Hermjakob, Henning; Steinbeck, Christoph; Apweiler, Rolf; Wakelam, Michael J O; Vizcaíno, Juan Antonio
2013-01-01
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available yet in some other "omics" fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lies significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community agreed upon chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
Phylogenetic Origin and Diversification of RNAi Pathway Genes in Insects.
Dowling, Daniel; Pauli, Thomas; Donath, Alexander; Meusemann, Karen; Podsiadlowski, Lars; Petersen, Malte; Peters, Ralph S; Mayer, Christoph; Liu, Shanlin; Zhou, Xin; Misof, Bernhard; Niehuis, Oliver
2016-12-01
RNA interference (RNAi) refers to the set of molecular processes found in eukaryotic organisms in which small RNA molecules mediate the silencing or down-regulation of target genes. In insects, RNAi serves a number of functions, including regulation of endogenous genes, anti-viral defense, and defense against transposable elements. Despite being well studied in model organisms, such as Drosophila, the distribution of core RNAi pathway genes and their evolution in insects is not well understood. Here we present the most comprehensive overview of the distribution and diversity of core RNAi pathway genes across 100 insect species, encompassing all currently recognized insect orders. We inferred the phylogenetic origin of insect-specific RNAi pathway genes and also identified several hitherto unrecorded gene expansions using whole-body transcriptome data from the international 1KITE (1000 Insect Transcriptome Evolution) project as well as other resources such as i5K (5000 Insect Genome Project). Specifically, we traced the origin of the double stranded RNA binding protein R2D2 to the last common ancestor of winged insects (Pterygota), the loss of Sid-1/Tag-130 orthologs in Antliophora (fleas, flies and relatives, and scorpionflies in a broad sense), and confirm previous evidence for the splitting of the Argonaute proteins Aubergine and Piwi in Brachyceran flies (Diptera, Brachycera). Our study offers new reference points for future experimental research on RNAi-related pathway genes in insects. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
... the energy-producing centers in cells. While most protein synthesis occurs in the fluid surrounding the nucleus ( cytoplasm ), some proteins are synthesized in the mitochondria. During protein synthesis, in either the mitochondria or the cytoplasm, building ...
... the energy-producing centers in cells. While most protein synthesis occurs in the fluid surrounding the cell nucleus ( ... some proteins are synthesized in the mitochondria. During protein synthesis , in either the mitochondria or the cytoplasm, building ...
Routine development of objectively derived search strategies.
Hausner, Elke; Waffenschmidt, Siw; Kaiser, Thomas; Simon, Michael
2012-02-29
Over the past few years, information retrieval has become more and more professionalized, and information specialists are considered full members of a research team conducting systematic reviews. Research groups preparing systematic reviews and clinical practice guidelines have been the driving force in the development of search strategies, but open questions remain regarding the transparency of the development process and the available resources. An empirically guided approach to the development of a search strategy provides a way to increase transparency and efficiency. Our aim in this paper is to describe the empirically guided development process for search strategies as applied by the German Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen, or "IQWiG"). This strategy consists of the following steps: generation of a test set, as well as the development, validation and standardized documentation of the search strategy. We illustrate our approach by means of an example, that is, a search for literature on brachytherapy in patients with prostate cancer. For this purpose, a test set was generated, including a total of 38 references from 3 systematic reviews. The development set for the generation of the strategy included 25 references. After application of textual analytic procedures, a strategy was developed that included all references in the development set. To test the search strategy on an independent set of references, the remaining 13 references in the test set (the validation set) were used. The validation set was also completely identified. Our conclusion is that an objectively derived approach similar to that used in search filter development is a feasible way to develop and validate reliable search strategies. 
Besides creating high-quality strategies, the widespread application of this approach will result in a substantial increase in the transparency of the development process of search strategies.
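The development and validation steps described above amount to splitting a test set of known relevant references and checking how many of them a search retrieves. The sketch below illustrates that check under stated assumptions: the record IDs and the retrieved set are hypothetical, and the deterministic split stands in for whatever allocation the authors used, which would normally be random.

```python
def recall(retrieved_ids, reference_ids):
    """Fraction of the reference records that the search retrieved."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(retrieved_ids)) / len(reference)


def split_test_set(reference_ids, n_development):
    """Split a test set into development and validation parts (deterministic)."""
    ordered = sorted(reference_ids)
    return ordered[:n_development], ordered[n_development:]


# Hypothetical IDs standing in for the 38 references of the test set.
test_set = [f"ref{i:02d}" for i in range(1, 39)]
development, validation = split_test_set(test_set, 25)

# Hypothetical search result: all test-set records plus two unrelated hits.
retrieved = set(test_set) | {"other01", "other02"}

print(len(development), len(validation))
print(recall(retrieved, development), recall(retrieved, validation))
```

A validation recall of 1.0 corresponds to the abstract's statement that the validation set was completely identified.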
2014-01-01
Background Gene expression analysis using quantitative reverse transcription PCR (qRT-PCR) is a robust method wherein the expression levels of target genes are normalised using internal control genes, known as reference genes, to derive changes in gene expression levels. Although reference genes have recently been suggested for olive tissues, combined/independent analysis on different cultivars has not yet been tested. Therefore, an assessment of reference genes was required to validate the recent findings and select stably expressed genes across different olive cultivars. Results A total of eight candidate reference genes [glyceraldehyde 3-phosphate dehydrogenase (GAPDH), serine/threonine-protein phosphatase catalytic subunit (PP2A), elongation factor 1 alpha (EF1-alpha), polyubiquitin (OUB2), aquaporin tonoplast intrinsic protein (TIP2), tubulin alpha (TUBA), 60S ribosomal protein L18-3 (60S RBP L18-3) and polypyrimidine tract-binding protein homolog 3 (PTB)] were chosen based on their stability in olive tissues as well as in other plants. Expression stability was examined by qRT-PCR across 12 biological samples, representing mesocarp tissues at various developmental stages in three different olive cultivars, Barnea, Frantoio and Picual, independently and together during the 2009 season with two software programs, GeNorm and BestKeeper. Both software packages identified GAPDH, EF1-alpha and PP2A as the three most stable reference genes across the three cultivars and in the cultivar, Barnea. GAPDH, EF1-alpha and 60S RBP L18-3 were found to be most stable reference genes in the cultivar Frantoio while 60S RBP L18-3, OUB2 and PP2A were found to be most stable reference genes in the cultivar Picual. 
Conclusions The analyses of expression stability of reference genes using qRT-PCR revealed that GAPDH, EF1-alpha, PP2A, 60S RBP L18-3 and OUB2 are suitable reference genes for expression analysis in developing Olea europaea mesocarp tissues, displaying the highest level of expression stability across three different olive cultivars, Barnea, Frantoio and Picual; however, the combination of the three most stable reference genes does vary amongst individual cultivars. This study will provide guidance to other researchers to select reference genes for normalization against target genes by qPCR across tissues obtained from the mesocarp region of the olive fruit in the cultivars Barnea, Frantoio and Picual. PMID:24884716
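The stability ranking reported above can be illustrated with the pairwise-variation measure commonly associated with geNorm: for each gene, the average standard deviation of its log2 expression ratios against every other candidate gene (lower means more stable). This is a sketch under assumptions, not the GeNorm software itself: the gene names and relative quantities are made up, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def stability_m(expr):
    """Average pairwise variation per gene.

    expr maps gene name -> list of relative expression quantities,
    one value per sample (same sample order for every gene).
    """
    m_values = {}
    for gene_j, values_j in expr.items():
        variations = []
        for gene_k, values_k in expr.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            variations.append(stdev(log_ratios))
        m_values[gene_j] = mean(variations)
    return m_values


# Hypothetical relative quantities for three candidate genes in three samples.
expr = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],   # constant ratio to geneA across samples
    "geneC": [1.0, 4.0, 1.0],   # ratio to the others varies strongly
}

for gene, m in sorted(stability_m(expr).items(), key=lambda kv: kv[1]):
    print(gene, round(m, 3))
```

Here geneA and geneB tie as the most stable pair because their ratio never changes, while geneC ranks last.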
Ferreira da Costa, Joana; Silva, David; Caamaño, Olga; Brea, José M; Loza, Maria Isabel; Munteanu, Cristian R; Pazos, Alejandro; García-Mera, Xerardo; González-Díaz, Humbert
2018-06-25
Predicting drug-protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for large and complex big data sets of preclinical assays reported in public databases. This includes multiple conditions of assays, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known case of reference. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50000 cases with accuracy of 70-91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model but at the cost of a notable increase in the complexity of the model. We illustrated the practical use of the new model with a proof-of-concept theoretical-experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the AutoDock Vina software. The work ends with a PTML model predictive study of the outcomes of the new compounds in a large number of assays. 
Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method focuses on the prediction with a simple linear model of multiple pharmacological parameters (IC50, EC50, Ki, etc.) for compounds in assays involving different cell lines used, organisms of the protein target, or organism of assay for proteins in the dopamine pathway.
Biomarker Reference Sets for Cancers in Women — EDRN Public Portal
The purpose of this study is to develop a standard reference set of specimens for use by investigators participating in the National Cancer Institute's Early Detection Research Network (EDRN) in defining false positive rates for new cancer biomarkers in women.
Background | Office of Cancer Clinical Proteomics Research
The term "proteomics" refers to a large-scale comprehensive study of a specific proteome resulting from its genome, including abundances of proteins, their variations and modifications, and interacting partners and networks in order to understand cellular processes involved. Similarly, “Cancer proteomics” refers to comprehensive analyses of proteins and their derivatives translated from a specific cancer genome using a human biospecimen or a preclinical model (e.g., cultured cell or animal model).
Meeting the nutrient reference values on a vegetarian diet.
Reid, Michelle A; Marsh, Kate A; Zeuschner, Carol L; Saunders, Angela V; Baines, Surinder K
2013-08-19
Surveys over the past 10 years have shown that Australians are increasingly consuming more plant-based vegetarian meals. Many studies demonstrate the health benefits of vegetarian diets. As with any type of eating plan, vegetarian diets must be well planned to ensure nutritional needs are being met. This clinical focus project shows that well planned vegetarian diets can meet almost all the nutritional needs of children and adults of all ages. Sample single-day lacto-ovo-vegetarian meal plans were developed to comply with the nutrient reference values - including the increased requirements for iron and zinc at 180% and 150%, respectively, for vegetarians - for both sexes and all age groups set by Australia's National Health and Medical Research Council and the New Zealand Ministry of Health. With the exception of vitamin D, long-chain omega-3 fatty acids and extended iron requirements in pregnancy for vegetarians, the meal plans meet key requirements with respect to energy; protein; carbohydrate; total fat; saturated, poly- and monounsaturated fats; α-linolenic acid; fibre; iron; zinc; calcium; folate; and vitamins A, C, E and B₁₂.
Unified View of Backward Backtracking in Short Read Mapping
NASA Astrophysics Data System (ADS)
Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku
Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit the Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen as a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. The same mechanism can also be used for mapping mate-pair reads.
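The backward search that these tools build on can be sketched for the exact-match case. The code below is a toy illustration, not the pruning mechanism proposed in the paper and not how bowtie or bwa are implemented: it builds the suffix array naively, stores full occurrence tables, and omits the backtracking over mismatches. All function names are assumptions.

```python
def build_fm_index(text):
    """Build a naive suffix array, C table and occurrence table for text."""
    text += "$"                                    # unique terminator, sorts first
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)         # i == 0 wraps around to "$"
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):                      # C[ch] = number of smaller chars
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):                    # occ[ch][i] = count of ch in bwt[:i]
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):                   # extend the match leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("GATTACATTA")
print(backward_search("TTA", *index))
```

Approximate mapping extends this loop by also trying alternative characters at each step and abandoning branches that exceed the error budget, which is where pruning rules such as those discussed in the abstract apply.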
21 CFR 102.23 - Peanut spreads.
Code of Federal Regulations, 2012 CFR
2012-04-01
... equivalent to peanut butter if it meets all of the following conditions: (1) Protein. (i) The protein content... mixed protein sources a nitrogen conversion factor of 6.25 may be used. (1) Protein quantity: “Official... incorporation by reference is given in paragraph (c)(1) of this section. (3) Niacin: AOAC, 13th Ed. (1980...
21 CFR 102.23 - Peanut spreads.
Code of Federal Regulations, 2014 CFR
2014-04-01
... equivalent to peanut butter if it meets all of the following conditions: (1) Protein. (i) The protein content... mixed protein sources a nitrogen conversion factor of 6.25 may be used. (1) Protein quantity: “Official... incorporation by reference is given in paragraph (c)(1) of this section. (3) Niacin: AOAC, 13th Ed. (1980...
21 CFR 102.23 - Peanut spreads.
Code of Federal Regulations, 2013 CFR
2013-04-01
... equivalent to peanut butter if it meets all of the following conditions: (1) Protein. (i) The protein content... mixed protein sources a nitrogen conversion factor of 6.25 may be used. (1) Protein quantity: “Official... incorporation by reference is given in paragraph (c)(1) of this section. (3) Niacin: AOAC, 13th Ed. (1980...
21 CFR 102.23 - Peanut spreads.
Code of Federal Regulations, 2010 CFR
2010-04-01
... equivalent to peanut butter if it meets all of the following conditions: (1) Protein. (i) The protein content... mixed protein sources a nitrogen conversion factor of 6.25 may be used. (1) Protein quantity: “Official... incorporation by reference is given in paragraph (c)(1) of this section. (3) Niacin: AOAC, 13th Ed. (1980...
21 CFR 102.23 - Peanut spreads.
Code of Federal Regulations, 2011 CFR
2011-04-01
... equivalent to peanut butter if it meets all of the following conditions: (1) Protein. (i) The protein content... mixed protein sources a nitrogen conversion factor of 6.25 may be used. (1) Protein quantity: “Official... incorporation by reference is given in paragraph (c)(1) of this section. (3) Niacin: AOAC, 13th Ed. (1980...
Protein Identification Using Top-Down Spectra*
Liu, Xiaowen; Sirotkin, Yakov; Shen, Yufeng; Anderson, Gordon; Tsai, Yihsuan S.; Ting, Ying S.; Goodlett, David R.; Smith, Richard D.; Bafna, Vineet; Pevzner, Pavel A.
2012-01-01
In the last two years, because of advances in protein separation and mass spectrometry, top-down mass spectrometry moved from analyzing single proteins to analyzing complex samples and identifying hundreds and even thousands of proteins. However, computational tools for database search of top-down spectra against protein databases are still in their infancy. We describe MS-Align+, a fast algorithm for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. We also propose a method for evaluating statistical significance of top-down protein identifications and further benchmark various software tools on two top-down data sets from Saccharomyces cerevisiae and Salmonella typhimurium. We demonstrate that MS-Align+ significantly increases the number of identified spectra as compared with MASCOT and OMSSA on both data sets. Although MS-Align+ and ProSightPC have similar performance on the Salmonella typhimurium data set, MS-Align+ outperforms ProSightPC on the (more complex) Saccharomyces cerevisiae data set. PMID:22027200
Mekonnen, Zewdie; Amuamuta, Asmare; Mulu, Wondemagegn; Yimer, Mulat; Zenebe, Yohannes; Adem, Yesuf; Abera, Bayeh; Gebeyehu, Wondemu; Gebregziabher, Yakob
2017-01-01
Reference intervals are crucial for disease screening, diagnosis, monitoring, progression and treatment efficacy. Due to a lack of locally derived reference values for the parameters, clinicians use reference intervals derived from western populations. However, studies conducted in different African countries have indicated differences between locally and western derived reference values. Different studies also indicated considerable variation in clinical chemistry reference intervals by several variables such as age, sex, geographical location, environment, lifestyle and genetic variation. This study aimed to determine the reference intervals of common clinical chemistry parameters of the community of Gojjam Zones, Northwest Ethiopia. A population-based cross-sectional study was conducted from November 2015 to December 2016 in healthy adult populations of Gojjam zone. Data such as medical history, physical examination and socio-demographic data were collected. In addition, laboratory investigations were undertaken to screen the population. Clinical chemistry parameters were measured using a Mindray BS 200 clinical chemistry autoanalyzer as per the manufacturer's instructions. Descriptive statistics were used to calculate mean, median and 95th percentiles. Independent sample t-test and one-way ANOVA were used to assess associations between variables. After careful screening of a total of 799 apparently healthy adults who consented to this study, complete data from 446 (224 females and 222 males) were included in the analysis. The mean age of the study participants was 28.8 years. Males had higher (P<0.05) mean and 2.5th-97.5th percentile ranges of ALT, AST, ALP, creatinine and direct bilirubin. The reference intervals of amylase, LDH, total protein and total bilirubin were not significantly different between the two sex groups (P>0.05). 
Mean, median and 95th percentile values of AST, ALP, amylase, LDH, creatinine, total protein, total bilirubin, and direct bilirubin across all age groups of participants were similar (P>0.05). However, there was a significant difference in the value of ALT (P<0.05). The reference intervals of ALT, total protein and creatinine were significantly (P<0.05) higher in people with a monthly income >1500 ETB compared to those with low monthly income. Significantly (P<0.05) higher values of ALT, ALP and total protein were observed in people living in highland compared to lowland residences. The study showed that some of the common clinical chemistry reference intervals of healthy adults in Gojjam zones were higher than the reference intervals generated from developed countries. Therefore, strict adherence to the reference values generated in developed countries could lead to inappropriate diagnosis and treatment of patients. There was also variation of reference interval values based on climate, gender, age, monthly income and geographical locations. Therefore, further study is required to establish reference intervals for the Ethiopian population.
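The 2.5th-97.5th percentile ranges mentioned above correspond to a nonparametric reference interval covering the central 95% of healthy-population values. The sketch below shows one way to compute it, assuming linear interpolation between order statistics; the study's exact percentile method is not stated, and the measurement values are synthetic.

```python
def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    if not sorted_values:
        raise ValueError("no values supplied")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentile limits of the measurements."""
    ordered = sorted(values)
    return percentile(ordered, 2.5), percentile(ordered, 97.5)


# Synthetic measurements 0, 1, ..., 200 (not data from the study).
measurements = list(range(201))
print(reference_interval(measurements))
```

Separate intervals per sex, age group or altitude would be obtained by calling `reference_interval` on each subgroup's values.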
Electrostatic contribution to the binding stability of protein-protein complexes.
Dong, Feng; Zhou, Huan-Xiang
2006-10-01
To investigate roles of electrostatic interactions in protein binding stability, electrostatic calculations were carried out on a set of 64 mutations over six protein-protein complexes. These mutations alter polar interactions across the interface and were selected for putative dominance of electrostatic contributions to the binding stability. Three protocols of implementing the Poisson-Boltzmann model were tested. In vdW4 the dielectric boundary between the protein low dielectric and the solvent high dielectric is defined as the protein van der Waals surface and the protein dielectric constant is set to 4. In SE4 and SE20, the dielectric boundary is defined as the surface of the protein interior inaccessible to a 1.4-Å solvent probe, and the protein dielectric constant is set to 4 and 20, respectively. In line with earlier studies on the barnase-barstar complex, the vdW4 results on the large set of mutations showed the closest agreement with experimental data. The agreement between vdW4 and experiment supports the contention of dominant electrostatic contributions for the mutations, but their differences also suggest van der Waals and hydrophobic contributions. The results presented here will serve as a guide for future refinement in electrostatic calculation and inclusion of nonelectrostatic effects. Proteins 2006. (c) 2006 Wiley-Liss, Inc.
Surflex-Dock: Docking benchmarks and real-world application
NASA Astrophysics Data System (ADS)
Spitzer, Russell; Jain, Ajay N.
2012-06-01
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different than previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
Automated crystallographic system for high-throughput protein structure determination.
Brunzelle, Joseph S; Shafaee, Padram; Yang, Xiaojing; Weigand, Steve; Ren, Zhong; Anderson, Wayne F
2003-07-01
High-throughput structural genomic efforts require software that is highly automated, distributive and requires minimal user intervention to determine protein structures. Preliminary experiments were set up to test whether automated scripts could utilize a minimum set of input parameters and produce a set of initial protein coordinates. From this starting point, a highly distributive system was developed that could determine macromolecular structures at a high throughput rate, warehouse and harvest the associated data. The system uses a web interface to obtain input data and display results. It utilizes a relational database to store the initial data needed to start the structure-determination process as well as generated data. A distributive program interface administers the crystallographic programs which determine protein structures. Using a test set of 19 protein targets, 79% were determined automatically.
A reference system for animal biometrics: application to the northern leopard frog
Petrovska-Delacretaz, D.; Edwards, A.; Chiasson, J.; Chollet, G.; Pilliod, D.S.
2014-01-01
Reference systems and public databases are available for human biometrics, but to our knowledge nothing is available for animal biometrics. This is surprising because animals are not required to give their agreement to be in a database. This paper proposes a reference system and database for the northern leopard frog (Lithobates pipiens). Both are available for reproducible experiments. Results of both open set and closed set experiments are given.
Engert, Christoph G; Droste, Rita; van Oudenaarden, Alexander; Horvitz, H Robert
2018-04-01
To better understand the tissue-specific regulation of chromatin state in cell-fate determination and animal development, we defined the tissue-specific expression of all 36 C. elegans presumptive lysine methyltransferase (KMT) genes using single-molecule fluorescence in situ hybridization (smFISH). Most KMTs were expressed in only one or two tissues. The germline was the tissue with the broadest KMT expression. We found that the germline-expressed C. elegans protein SET-17, which has a SET domain similar to that of the PRDM9 and PRDM7 SET-domain proteins, promotes fertility by regulating gene expression in primary spermatocytes. SET-17 drives the transcription of spermatocyte-specific genes from four genomic clusters to promote spermatid development. SET-17 is concentrated in stable chromatin-associated nuclear foci at actively transcribed msp (major sperm protein) gene clusters, which we term msp locus bodies. Our results reveal the function of a PRDM9/7-family SET-domain protein in spermatocyte transcription. We propose that the spatial intranuclear organization of chromatin factors might be a conserved mechanism in tissue-specific control of transcription.
Sader, John E; Friend, James R
2015-05-01
Overall precision of the simplified calibration method in J. E. Sader et al., Rev. Sci. Instrum. 83, 103705 (2012), Sec. III D, is dominated by the spring constant of the reference cantilever. The question arises: How does one take measurements from multiple reference cantilevers, and combine these results, to improve uncertainty of the reference cantilever's spring constant and hence the overall precision of the method? This question is addressed in this note. Its answer enables manufacturers to specify a single set of data for the spring constant, resonant frequency, and quality factor, from measurements on multiple reference cantilevers. With this data set, users can trivially calibrate cantilevers of the same type.
Proteomic analysis of Chromobacterium violaceum and its adaptability to stress.
Castro, Diogo; Cordeiro, Isabelle Bezerra; Taquita, Paula; Eberlin, Marcos Nogueira; Garcia, Jerusa Simone; Souza, Gustavo Henrique M F; Arruda, Marco Aurélio Zezzi; Andrade, Edmar V; Filho, Spartaco A; Crainey, J Lee; Lozano, Luis Lopez; Nogueira, Paulo A; Orlandi, Patrícia P
2015-12-01
Chromobacterium violaceum (C. violaceum) occurs abundantly in a variety of ecosystems, including ecosystems that place the bacterium under stress. This study assessed the adaptability of C. violaceum by submitting it to nutritional and pH stresses and then analyzing protein expression using bi-dimensional electrophoresis (2-DE) and MALDI mass spectrometry. Chromobacterium violaceum grew best in pH-neutral, nutrient-rich medium (reference conditions); however, the total protein mass recovered from stressed bacteria cultures was always higher than the total protein mass recovered from our reference culture. The diversity of proteins expressed (represented by the number of identifiable 2-DE spots) was seen to be highest in the reference cultures, suggesting that stress reduces the overall range of proteins expressed by C. violaceum. Database comparisons allowed 43 of the 55 spots subjected to MALDI mass spectrometry to be characterized as containing a single identifiable protein. Stress-related expression changes were noted for C. violaceum proteins related to the previously characterized bacterial proteins: DnaK, GroEL-2, Rhs, EF-Tu, EF-P; MCP, homogentisate 1,2-dioxygenase, Arginine deiminase and the ATP synthase β-subunit protein as well as for the ribosomal protein subunits L1, L3, L5 and L6. The ability of C. violaceum to adapt its cellular mechanics to sub-optimal growth and protein production conditions was well illustrated by its regulation of ribosomal protein subunits. With the exception of the ribosomal subunit L3, which plays a role in protein folding and may therefore be more useful in stressful conditions, all the other ribosomal subunit proteins were seen to have reduced expression in stressed cultures. Curiously, C. violaceum cultures were also observed to lose their violet color under stress, which suggests that the violacein pigment biosynthetic pathway is affected by stress. 
Analysis of the proteomic signatures of stressed C. violaceum indicates that nutrient-starvation and pH stress can cause changes in the expression of the C. violaceum receptors, transporters, and proteins involved with biosynthetic pathways, molecule recycling, and energy production. Our findings complement the recent publication of the C. violaceum genome sequence and could help with the future commercial exploitation of C. violaceum.
qPIPSA: Relating enzymatic kinetic parameters and interaction fields
Gabdoulline, Razif R; Stein, Matthias; Wade, Rebecca C
2007-01-01
Background The simulation of metabolic networks in quantitative systems biology requires the assignment of enzymatic kinetic parameters. Experimentally determined values are often not available and therefore computational methods to estimate these parameters are needed. It is possible to use the three-dimensional structure of an enzyme to perform simulations of a reaction and derive kinetic parameters. However, this is computationally demanding and requires detailed knowledge of the enzyme mechanism. We have therefore sought to develop a general, simple and computationally efficient procedure to relate protein structural information to enzymatic kinetic parameters that allows consistency between the kinetic and structural information to be checked and estimation of kinetic constants for structurally and mechanistically similar enzymes. Results We describe qPIPSA: quantitative Protein Interaction Property Similarity Analysis. In this analysis, molecular interaction fields, for example, electrostatic potentials, are computed from the enzyme structures. Differences in molecular interaction fields between enzymes are then related to the ratios of their kinetic parameters. This procedure can be used to estimate unknown kinetic parameters when enzyme structural information is available and kinetic parameters have been measured for related enzymes or were obtained under different conditions. The detailed interaction of the enzyme with substrate or cofactors is not modeled and is assumed to be similar for all the proteins compared. The protein structure modeling protocol employed ensures that differences between models reflect genuine differences between the protein sequences, rather than random fluctuations in protein structure. Conclusion Provided that the experimental conditions and the protein structural models refer to the same protein state or conformation, correlations between interaction fields and kinetic parameters can be established for sets of related enzymes. 
Outliers may arise due to variation in the importance of different contributions to the kinetic parameters, such as protein stability and conformational changes. The qPIPSA approach can assist in the validation as well as estimation of kinetic parameters, and provide insights into enzyme mechanism. PMID:17919319
dbPAF: an integrative database of protein phosphorylation in animals and fungi.
Ullah, Shahid; Lin, Shaofeng; Xu, Yang; Deng, Wankun; Ma, Lili; Zhang, Ying; Liu, Zexian; Xue, Yu
2016-03-24
Protein phosphorylation is one of the most important post-translational modifications (PTMs) and regulates a broad spectrum of biological processes. Recent progress in phosphoproteomic identifications has generated a flood of phosphorylation sites, and the integration of these sites is an urgent need. In this work, we developed a curated database, dbPAF, containing known phosphorylation sites in H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe and S. cerevisiae. From the scientific literature and public databases, we collected and integrated a total of 54,148 phosphoproteins with 483,001 phosphorylation sites. Multiple options were provided for accessing the data, and original references and other annotations were also present for each phosphoprotein. Based on the new data set, we computationally detected significantly over-represented sequence motifs around phosphorylation sites, predicted potential kinases that are responsible for the modification of collected phospho-sites, and evolutionarily analyzed phosphorylation conservation states across different species. Besides being largely consistent with previous reports, our results also suggested new features of phospho-regulation. Taken together, our database can be useful for further analyses of protein phosphorylation in human and other model organisms. The dbPAF database was implemented in PHP + MySQL and is freely available at http://dbpaf.biocuckoo.org.
MIPS: analysis and annotation of proteins from whole genomes in 2005
Mewes, H. W.; Frishman, D.; Mayer, K. F. X.; Münsterkötter, M.; Noubibou, O.; Pagel, P.; Rattei, T.; Oesterheld, M.; Ruepp, A.; Stümpflen, V.
2006-01-01
The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (). PMID:16381839
The Protein Information Resource: an integrated public resource of functional annotation of proteins
Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.
2002-01-01
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Adaptive Local Realignment of Protein Sequences.
DeBlasio, Dan; Kececioglu, John
2018-06-11
While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully-chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
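The three steps named in the abstract above (find regions of low estimated accuracy, realign them under several parameter settings, keep a realignment only if its estimated accuracy is higher) can be sketched as follows. This is a minimal illustration, not the Opal/Facet implementation: the per-column scores, the `realign` callback and the `estimate` callback are hypothetical stand-ins supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = List[str]  # rows of equal length, '-' marks a gap


def low_accuracy_regions(col_scores: Sequence[float],
                         threshold: float) -> List[Tuple[int, int]]:
    """Return maximal half-open column ranges whose score is below threshold."""
    regions, start = [], None
    for i, score in enumerate(col_scores):
        if score < threshold and start is None:
            start = i
        elif score >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(col_scores)))
    return regions


def adaptive_local_realign(alignment: Alignment,
                           col_scores: Sequence[float],
                           threshold: float,
                           settings: Sequence[object],
                           realign: Callable[[List[str], object], Alignment],
                           estimate: Callable[[Alignment], float]) -> Alignment:
    """Replace each low-accuracy region by its best-scoring realignment."""
    out = list(alignment)
    # Work right to left so column indices of earlier regions stay valid.
    for lo, hi in reversed(low_accuracy_regions(col_scores, threshold)):
        region = [row[lo:hi] for row in out]
        best, best_acc = region, estimate(region)
        for params in settings:
            # Hypothetical aligner call: takes ungapped subsequences.
            candidate = realign([r.replace("-", "") for r in region], params)
            acc = estimate(candidate)
            if acc > best_acc:  # keep only strict improvements
                best, best_acc = candidate, acc
        out = [row[:lo] + new + row[hi:] for row, new in zip(out, best)]
    return out
```

Processing regions from right to left is a design choice of this sketch: a realigned region may change length, and working backwards means the column ranges still to be processed are unaffected.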
Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
Daberdaku, Sebastian; Ferrari, Carlo
2018-02-06
The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
Ferreira, Ari J. S.; Siam, Rania; Setubal, João C.; Moustafa, Ahmed; Sayed, Ahmed; Chambergo, Felipe S.; Dawe, Adam S.; Ghazy, Mohamed A.; Sharaf, Hazem; Ouf, Amged; Alam, Intikhab; Abdel-Haleem, Alyaa M.; Lehvaslaiho, Heikki; Ramadan, Eman; Antunes, André; Stingl, Ulrich; Archer, John A. C.; Jankovic, Boris R.; Sogin, Mitchell; Bajic, Vladimir B.; El-Dorry, Hamza
2014-01-01
Metagenomics-based functional profiling analysis is an effective means of gaining deeper insight into the composition of marine microbial populations and developing a better understanding of the interplay between the functional genome content of microbial communities and abiotic factors. Here we present a comprehensive analysis of 24 datasets covering surface and depth-related environments at 11 sites around the world's oceans. The complete dataset comprises approximately 12 million sequences, totaling 5,358 Mb. Based on profiling patterns of Clusters of Orthologous Groups (COGs) of proteins, a core set of reference photic and aphotic depth-related COGs, and a collection of COGs that are associated with extreme oxygen limitation were defined. Their inferred functions were utilized as indicators to characterize the distribution of light- and oxygen-related biological activities in marine environments. The results reveal that, while light level in the water column is a major determinant of phenotypic adaptation in marine microorganisms, oxygen concentration in the aphotic zone has a significant impact only in extremely hypoxic waters. Phylogenetic profiling of the reference photic/aphotic gene sets revealed a greater variety of source organisms in the aphotic zone, although the majority of individual photic and aphotic depth-related COGs are assigned to the same taxa across the different sites. This increase in phylogenetic and functional diversity of the core aphotic related COGs most probably reflects selection for the utilization of a broad range of alternate energy sources in the absence of light. PMID:24921648
Raue, R; Hess, M
1998-08-01
Three different polymerase chain reactions (PCRs), two of them combined with restriction enzyme analysis (REA), were developed for detection and differentiation of all 12 fowl adenovirus (FAV) serotypes and the egg drop syndrome (EDS) virus. For primer construction, FAV1, FAV10 and EDS virus hexon proteins were aligned, and conserved and variable regions were determined. Two primer sets (H1/H2 and H3/H4) for single use were constructed which hybridize in three conserved regions of hexon genes. Each primer pair amplifies approximately half of the hexon gene including two loop regions. An amplification product was detected with both primer sets using purified DNA from all FAV1-12 reference strains. Viral EDS DNA was negative using the H1/H2 or H3/H4 primer pair. HaeII digestion of the H1/H2 amplification products differentiates between all viruses except FAV4 and FAV5. In comparison, much more clustering among genomically closely related FAV serotypes was seen after HpaII digestion of the H3/H4 PCR products. Oligonucleotides H5/H6 located in the variable regions of the EDS virus hexon gene do not detect any of the FAV serotypes. The PCRs and REA described are suitable to detect all avian adenoviruses infecting chickens, to distinguish all 12 FAV reference strains and to differentiate FAVs from the EDS virus.
Rudnick, Paul A; Markey, Sanford P; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V; Edwards, Nathan J; Thangudu, Ratna R; Ketchum, Karen A; Kinsinger, Christopher R; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E
2016-03-04
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these data sets were produced from 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data according to the following: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the data sets is specific to the sample processing, with PSM and protein reports containing the spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free or reporter ion log-ratios for 4plex iTRAQ. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.
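Step (4) of the pipeline above, false-discovery-rate-based filtering, can be illustrated with a short sketch. The abstract does not say how the CDAP estimates FDR; the version below assumes a simple target-decoy estimate (decoys divided by targets above a score cutoff), and the function name, the `(score, is_decoy)` tuple layout and the `max_fdr` value are all hypothetical.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); higher score means better match


def filter_psms_by_fdr(psms: List[PSM], max_fdr: float) -> List[PSM]:
    """Keep target PSMs above the lowest score cutoff whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs to accept
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: decoy hits approximate false target hits.
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p[1]]
```

For example, with scores 10, 9, 7 for targets and 8, 6 for decoys, a threshold of `max_fdr=0.0` accepts only the two targets ranked above the first decoy.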
Ashford, Paul; Moss, David S; Alex, Alexander; Yeap, Siew K; Povia, Alice; Nobeli, Irene; Williams, Mark A
2012-03-14
Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures.
We demonstrate the benefits of using multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active and regulatory sites; ii) a simulated ensemble of unliganded Bcl2 structures reveals extensions of a known ligand-binding pocket not apparent in the apo crystal structure; iii) visualisations of interleukin-2 and its homologues highlight conserved pockets at the known receptor interfaces and regions whose conformation is known to change on inhibitor binding. Through post-processing of the output of a variety of pocket prediction software, Provar provides a flexible approach to the analysis and visualization of the persistence or variability of pockets in sets of related protein structures.
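The idea of a per-residue probability of bordering a predicted pocket can be sketched as a simple frequency over an ensemble. This is an assumption about the general approach, not Provar's actual calculation: the function name and the representation of each structure as a set of pocket-lining residue labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def pocket_lining_probabilities(lining_sets: List[Set[str]],
                                residues: Iterable[str]) -> Dict[str, float]:
    """Fraction of structures in which each residue lines any predicted pocket.

    lining_sets: one set of residue labels per structure in the ensemble,
                 as produced by some pocket-prediction program.
    """
    n = len(lining_sets)
    if n == 0:
        raise ValueError("at least one structure is required")
    counts = Counter(res for lining in lining_sets for res in lining)
    return {res: counts[res] / n for res in residues}
```

A residue that lines a pocket in three of four conformers would receive 0.75 here; values of this kind could then be written to a per-residue column for display in molecular graphics software.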
Blank-Landeshammer, Bernhard; Kollipara, Laxmikanth; Biß, Karsten; Pfenninger, Markus; Malchow, Sebastian; Shuvaev, Konstantin; Zahedi, René P; Sickmann, Albert
2017-09-01
Complex mass spectrometry-based proteomics data sets are mostly analyzed by protein database searches. While this approach performs considerably well for sequenced organisms, direct inference of peptide sequences from tandem mass spectra, i.e., de novo peptide sequencing, is often the only way to obtain information when protein databases are absent. However, available algorithms suffer from drawbacks such as lack of validation and often high rates of false positive hits (FP). Here we present a simple method of combining results from commonly available de novo peptide sequencing algorithms, which in conjunction with minor tweaks in data acquisition yields a lower empirical FDR than analysis using single algorithms. Results were validated using state-of-the-art database search algorithms as well as specifically synthesized reference peptides. Thus, we could increase the number of PSMs meeting a stringent FDR of 5% more than 3-fold compared to the single best de novo sequencing algorithm alone, accounting for an average of 11 120 PSMs (combined) instead of 3476 PSMs (alone) in triplicate 2 h LC-MS runs of a tryptic HeLa digest.
Quantum-mechanics-derived 13Cα chemical shift server (CheShift) for protein structure validation
Vila, Jorge A.; Arnautova, Yelena A.; Martin, Osvaldo A.; Scheraga, Harold A.
2009-01-01
A server (CheShift) has been developed to predict 13Cα chemical shifts of protein structures. It is based on the generation of 696,916 conformations as a function of the φ, ψ, ω, χ1 and χ2 torsional angles for all 20 naturally occurring amino acids. Their 13Cα chemical shifts were computed at the DFT level of theory with a small basis set and extrapolated, with an empirically-determined linear regression formula, to reproduce the values obtained with a larger basis set. Analysis of the accuracy and sensitivity of the CheShift predictions, in terms of both the correlation coefficient R and the conformational-averaged rmsd between the observed and predicted 13Cα chemical shifts, was carried out for 3 sets of conformations: (i) 36 x-ray-derived protein structures solved at 2.3 Å or better resolution, for which sets of 13Cα chemical shifts were available; (ii) 15 pairs of x-ray and NMR-derived sets of protein conformations; and (iii) a set of decoys for 3 proteins showing an rmsd with respect to the x-ray structure from which they were derived of up to 3 Å. Comparative analysis carried out with 4 popular servers, namely SHIFTS, SHIFTX, SPARTA, and PROSHIFT, for these 3 sets of conformations demonstrated that CheShift is the most sensitive server with which to detect subtle differences between protein models and, hence, to validate protein structures determined by either x-ray or NMR methods, if the observed 13Cα chemical shifts are available. CheShift is available as a web server. PMID:19805131
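The extrapolation step described above, mapping shifts computed with a small basis set onto values expected from a larger one through a linear regression, can be sketched with ordinary least squares. The function names are hypothetical, and the regression coefficients used by CheShift are not given in the abstract, so none are assumed here.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(small_basis_shift: float, slope: float, intercept: float) -> float:
    """Map a small-basis-set shift onto the large-basis-set scale."""
    return slope * small_basis_shift + intercept
```

In use, `fit_line` would be calibrated once on conformations for which both the small- and large-basis-set shifts have been computed, and `extrapolate` would then be applied to the remaining small-basis-set values.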
Evaluating Question, Persuade, Refer (QPR) Suicide Prevention Training in a College Setting
ERIC Educational Resources Information Center
Mitchell, Sharon L.; Kader, Mahrin; Darrow, Sherri A.; Haggerty, Melinda Z.; Keating, Niki L.
2013-01-01
This study assesses short-term and long-term learning outcomes of Question, Persuade, Refer (QPR) suicide prevention training in a college setting. Two hundred seventy-three participants completed pretest, posttest, and follow-up surveys regarding suicide prevention knowledge, attitudes, and skills. Results indicated: (a) increases in suicide…
Development of a Reference Coastal Wetland set in Southern New England (USA)
Various measures of plants, soils, and invertebrates were described for a reference set of tidal coastal wetlands in southern New England in order to provide a framework for assessing the condition of other similar wetlands in the region. The condition of the ten coastal wetland...
The reference frame for encoding and retention of motion depends on stimulus set size.
Huynh, Duong; Tripathy, Srimant P; Bedell, Harold E; Öğmen, Haluk
2017-04-01
The goal of this study was to investigate the reference frames used in perceptual encoding and storage of visual motion information. In our experiments, observers viewed multiple moving objects and reported the direction of motion of a randomly selected item. Using a vector-decomposition technique, we computed performance during smooth pursuit with respect to a spatiotopic (nonretinotopic) and to a retinotopic component and compared them with performance during fixation, which served as the baseline. For the stimulus encoding stage, which precedes memory, we found that the reference frame depends on the stimulus set size. For a single moving target, the spatiotopic reference frame had the most significant contribution with some additional contribution from the retinotopic reference frame. When the number of items increased (Set Sizes 3 to 7), the spatiotopic reference frame was able to account for the performance. Finally, when the number of items became larger than 7, the distinction between reference frames vanished. We interpret this finding as a switch to a more abstract nonmetric encoding of motion direction. We found that the retinotopic reference frame was not used in memory. Taken together with other studies, our results suggest that, whereas a retinotopic reference frame may be employed for controlling eye movements, perception and memory use primarily nonretinotopic reference frames. Furthermore, the use of nonretinotopic reference frames appears to be capacity limited. In the case of complex stimuli, the visual system may use perceptual grouping in order to simplify the complexity of stimuli or resort to a nonmetric abstract coding of motion information.
Monoclonal protein reference change value as determined by gel-based serum protein electrophoresis.
Salamatmanesh, Mina; McCudden, Christopher R; McCurdy, Arleigh; Booth, Ronald A
2018-01-01
The International Myeloma Working Group recommendations for monitoring disease progression or response include quantitation of the involved monoclonal immunoglobulin. They have defined minimum change criteria of ≥25% with an absolute change of no less than 5 g/L for either minimal response or progression. Limited evidence is available to accurately determine the magnitude of change in a monoclonal protein that reflects a true change in clinical status. Here we determined the analytical and biological variability of monoclonal proteins in stable monoclonal gammopathy of undetermined significance (MGUS) patients. Analytical variability (CVa) of normal protein fractions and monoclonal proteins was assessed by agarose gel-based serum protein electrophoresis. Sixteen clinically stable MGUS patients were identified from our clinical hematology database. Individual biological variability (CVi) was determined and used to calculate a monoclonal protein reference change value (RCV). Analytical variability of the normal protein fractions (albumin, alpha-1, alpha-2, beta, total gamma) ranged from 1.3% for albumin to 5.8% for the alpha-1 globulins. CVa of low (5.6 g/L) and high (32.2 g/L) concentration monoclonal proteins were 3.1% and 22.2%, respectively. Individual CVi of stable patients ranged from 3.5% to 24.5% with a CVi of 12.9%. The reference change value (RCV) at a 95% probability was determined to be 36.7% (low) and 39.6% (high) using our CVa and CVi. Serial monitoring of monoclonal protein concentration is important for MGUS and multiple myeloma patients. Accurate criteria for interpreting a change in monoclonal protein concentration are required for appropriate decision making. We used QC results and real-world conditions to assess imprecision of serum protein fractions including low and high monoclonal protein fractions and clinically stable MGUS patients to determine CVi and RCV.
The calculated RCVs of 36.7% (low) and 39.6% (high) in this study were greater than those reported previously and greater than the established criteria for relapse. Response criteria may be reassessed to increase sensitivity and specificity for detection of response. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
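A reference change value is commonly computed from the analytical and within-subject variation as RCV = √2 · Z · √(CVa² + CVi²). The sketch below assumes this standard two-sided form with Z = 1.96 for 95% probability; the abstract does not state the exact formula or rounding the authors used, and the function name is hypothetical.

```python
import math


def reference_change_value(cv_a: float, cv_i: float, z: float = 1.96) -> float:
    """Reference change value in percent.

    cv_a: analytical coefficient of variation, in percent
    cv_i: within-subject biological coefficient of variation, in percent
    z:    z-score for the chosen probability (1.96 assumed for 95%, two-sided)
    """
    return math.sqrt(2) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)


# With the low-concentration CVa (3.1%) and the CVi (12.9%) quoted above,
# this form gives about 36.8%, close to the 36.7% reported in the abstract.
rcv_low = reference_change_value(3.1, 12.9)
```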
Benchmark data sets for structure-based computational target prediction.
Schomburg, Karen T; Rarey, Matthias
2014-08-25
Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking so far face many challenges, the greatest of which is probably the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, rendering comparison and demonstration of improvements of methods cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods, i.e., a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies and a large data set consisting of 7992 protein structures and 72 drug-like ligands allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and any information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable of measuring early recognition of enrichment, such as AUC, BEDROC, and NSLR, are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used for method evaluation of our new inverse screening method iRAISE. The small data set reveals the method's capability and limitations to selectively distinguish between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves excellent or good enrichment in 55% of cases, a median AUC of 0.67, and RMSDs below 2.0 Å for 74%, and was able to predict the first true target in 59 out of 72 cases in the top 2% of the protein data set of about 8000 structures.
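Of the metrics named above, AUC is the simplest to illustrate: it is the probability that a randomly chosen true target is ranked above a randomly chosen non-target. The sketch below is a generic pairwise implementation, not the evaluation code used for iRAISE; the function name and input layout are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_true_target: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    scores: predicted score per protein structure (higher = more likely target)
    is_true_target: True where the structure is a known target of the ligand
    Ties between a target and a non-target count as half a win.
    """
    pos = [s for s, t in zip(scores, is_true_target) if t]
    neg = [s for s, t in zip(scores, is_true_target) if not t]
    if not pos or not neg:
        raise ValueError("need one or more targets and non-targets")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of structures, which is acceptable for a data set of about 8000 proteins; a rank-based formula would be preferable for much larger screens.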
PLMD: An updated data resource of protein lysine modifications.
Xu, Haodong; Zhou, Jiaqi; Lin, Shaofeng; Deng, Wankun; Zhang, Ying; Xue, Yu
2017-05-20
Post-translational modifications (PTMs) occurring at protein lysine residues, or protein lysine modifications (PLMs), play critical roles in regulating biological processes. Owing to the rapid growth in the number of PLM substrates and the discovery of novel PLM types, we have greatly updated our previous work and present a much more integrative resource, the protein lysine modification database (PLMD). In PLMD, we collected and integrated a total of 284,780 modification events in 53,501 proteins across 176 eukaryotes and prokaryotes for up to 20 types of PLMs, including ubiquitination, acetylation, sumoylation, methylation, succinylation, malonylation, glutarylation, glycation, formylation, hydroxylation, butyrylation, propionylation, crotonylation, pupylation, neddylation, 2-hydroxyisobutyrylation, phosphoglycerylation, carboxylation, lipoylation and biotinylation. Using the data set, a motif-based analysis was performed for each PLM type, and the results demonstrated that different PLM types preferentially recognize distinct sequence motifs for the modifications. Moreover, various PLMs synergistically orchestrate specific cellular biological processes through mutual crosstalk, and we found a total of 65,297 PLM events involved in 90 types of PLM co-occurrences on the same lysine residues. Finally, various options are provided for accessing the data, and original references and other annotations are given for each PLM substrate. Taken together, we anticipate that the PLMD database can serve as a useful resource for further research on PLMs. PLMD 3.0 was implemented in PHP + MySQL and is freely available at http://plmd.biocuckoo.org. Copyright © 2017 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Serang, Oliver; MacCoss, Michael J.; Noble, William Stafford
2010-01-01
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors. PMID:20712337
2007-04-01
optimization methodology we introduce. State-of-the-art protein-protein docking approaches start by identifying conformations with good surface/chemical com...side-chains on the interface). The protein-protein docking literature (e.g., [8] and the references therein) is predominantly treating the docking...mations by various measures of surface complementarity which can be efficiently computed using fast Fourier correlation techniques (FFTs). However, when
Atlas - a data warehouse for integrative bioinformatics.
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis
2005-02-21
We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
Atlas – a data warehouse for integrative bioinformatics
Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis
2005-01-01
Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/ PMID:15723693
Serum fructosamine concentrations in dogs with hypothyroidism.
Reusch, C E; Gerber, B; Boretti, F S
2002-10-01
Serum fructosamine concentrations were measured in 11 untreated hypothyroid dogs with normal serum glucose and serum protein concentrations. The fructosamine level ranged between 276 and 441 micromol/L (median 376 micromol/L; reference range 207-340 micromol/L). Nine of the 11 dogs had fructosamine levels above the reference range. The fructosamine levels decreased significantly during treatment with levothyroxine. It is suggested that serum fructosamine concentrations may be high in hypothyroid dogs because of decelerated protein turnover, independent of the blood glucose concentration.
The yeast protein extract (RM8323) developed by the National Institute of Standards and Technology (NIST) under the auspices of NCI's CPTC initiative is currently available to the public at https://www-s.nist.gov/srmors/view_detail.cfm?srm=8323. The yeast proteome offers researchers a unique biological reference material. RM8323 is the most extensively characterized complex biological proteome and the only one associated with several large-scale studies to estimate protein abundance across a wide concentration range.
Vanderperre, Benoît; Lucier, Jean-François; Bissonnette, Cyntia; Motard, Julie; Tremblay, Guillaume; Vanderperre, Solène; Wisztorski, Maxence; Salzet, Michel; Boisvert, François-Michel; Roucou, Xavier
2013-01-01
A fully mature mRNA is usually associated with a reference open reading frame encoding a single protein. Yet, mature mRNAs contain unconventional alternative open reading frames (AltORFs) located in untranslated regions (UTRs) or overlapping the reference ORFs (RefORFs) in non-canonical +2 and +3 reading frames. Although recent ribosome profiling and footprinting approaches have suggested the significant use of unconventional translation initiation sites in mammals, direct evidence of large-scale alternative protein expression at the proteome level is still lacking. To determine the contribution of alternative proteins to the human proteome, we generated a database of predicted human AltORFs revealing a new proteome mainly composed of small proteins with a median length of 57 amino acids, compared to 344 amino acids for the reference proteome. We experimentally detected a total of 1,259 alternative proteins by mass spectrometry analyses of human cell lines, tissues and fluids. In plasma and serum, alternative proteins represent up to 55% of the proteome and may be a potential unsuspected new source for biomarkers. We observed constitutive co-expression of RefORFs and AltORFs from endogenous genes and from transfected cDNAs, including tumor suppressor p53, and provide evidence that out-of-frame clones representing AltORFs are mistakenly rejected as false positives in cDNA screening assays. Functional importance of alternative proteins is strongly supported by significant evolutionary conservation in vertebrates, invertebrates, and yeast. Our results imply that coding of multiple proteins in a single gene by the use of AltORFs may be a common feature in eukaryotes, and confirm that translation of unconventional ORFs generates an as yet unexplored proteome. PMID:23950983
Amyloidosis on the fingers (image)
Amyloidosis refers to the extracellular deposition of a protein called amyloid. This protein deposition can affect multiple ... other conditions. In this picture, we see how amyloidosis can affect the skin as nodular deposits on ...
Genetics Home Reference: Walker-Warburg syndrome
... also involved in development of this condition. The proteins produced from the genes listed above and others involved in Walker-Warburg syndrome modify a protein called alpha (α)-dystroglycan; this modification, called glycosylation, ...
Estimation of reference intervals from small samples: an example using canine plasma creatinine.
Geffré, A; Braun, J P; Trumel, C; Concordet, D
2009-12-01
According to international recommendations, reference intervals should be determined from at least 120 reference individuals, which is often impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of this data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
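The comparison of estimation methods described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the percentile convention (rank p(n + 1) with linear interpolation) is an assumption, and the Box-Cox parameter is passed in as a fixed value rather than estimated from the data. The robust method mentioned in the abstract is not reproduced.

```python
import math
import statistics


def nonparametric_limits(values, lower=0.025, upper=0.975):
    """Percentile limits by linear interpolation between order statistics."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Assumed convention: rank p * (n + 1), clamped to the observed range.
        rank = min(max(p * (n + 1), 1), n)
        lo = math.floor(rank)
        hi = math.ceil(rank)
        frac = rank - lo
        return ordered[lo - 1] + frac * (ordered[hi - 1] - ordered[lo - 1])

    return percentile(lower), percentile(upper)


def box_cox(x, lam):
    """Box-Cox transform of a positive value for a fixed parameter lam."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inverse_box_cox(y, lam):
    """Back-transform to the native scale (assumes lam * y + 1 > 0)."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def parametric_limits(values, lam=None):
    """Mean +/- 2 SD on native values, or on Box-Cox values when lam is given."""
    if lam is None:
        m, s = statistics.mean(values), statistics.stdev(values)
        return m - 2 * s, m + 2 * s
    transformed = [box_cox(v, lam) for v in values]
    m, s = statistics.mean(transformed), statistics.stdev(transformed)
    return inverse_box_cox(m - 2 * s, lam), inverse_box_cox(m + 2 * s, lam)
```

Running both functions on the same small sample and comparing the resulting limits mirrors the recommendation in the abstract that estimates from different methods be compared rather than trusted individually.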
Ligand placement based on prior structures: the guided ligand-replacement method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klei, Herbert E.; Bristol-Myers Squibb, Princeton, NJ 08543-4000; Moriarty, Nigel W., E-mail: nwmoriarty@lbl.gov
2014-01-01
A new module, Guided Ligand Replacement (GLR), has been developed in Phenix to increase the ease and success rate of ligand placement when prior protein-ligand complexes are available. The process of iterative structure-based drug design involves the X-ray crystal structure determination of upwards of 100 ligands with the same general scaffold (i.e. chemotype) complexed with very similar, if not identical, protein targets. In conjunction with insights from computational models and assays, this collection of crystal structures is analyzed to improve potency, to achieve better selectivity and to reduce liabilities such as absorption, distribution, metabolism, excretion and toxicology. Current methods for modeling ligands into electron-density maps typically do not utilize information on how similar ligands bound in related structures. Even if the electron density is of sufficient quality and resolution to allow de novo placement, the process can take considerable time as the size, complexity and torsional degrees of freedom of the ligands increase. A new module, Guided Ligand Replacement (GLR), was developed in Phenix to increase the ease and success rate of ligand placement when prior protein–ligand complexes are available. At the heart of GLR is an algorithm based on graph theory that associates atoms in the target ligand with analogous atoms in the reference ligand. Based on this correspondence, a set of coordinates is generated for the target ligand. GLR is especially useful in two situations: (i) modeling a series of large, flexible, complicated or macrocyclic ligands in successive structures and (ii) modeling ligands as part of a refinement pipeline that can automatically select a reference structure. Even in those cases for which no reference structure is available, if there are multiple copies of the bound ligand per asymmetric unit GLR offers an efficient way to complete the model after the first ligand has been placed. In all of these applications, GLR leverages prior knowledge from earlier structures to facilitate ligand placement in the current structure.
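The idea of associating atoms in a target ligand with analogous atoms in a reference ligand can be illustrated with a small Python sketch. This is not the Phenix GLR algorithm, whose details the abstract does not give: it is a brute-force backtracking search, under the assumption that atoms correspond when their elements match and the bonds between matched atoms agree. The function name and the input layout (dictionaries of atom name to element, lists of bonded pairs) are hypothetical.

```python
def match_atoms(target_elements, target_bonds, ref_elements, ref_bonds):
    """Return the largest target-to-reference atom mapping that preserves
    element types and bonding between matched atoms (exhaustive search)."""
    t_bonds = {frozenset(b) for b in target_bonds}
    r_bonds = {frozenset(b) for b in ref_bonds}
    t_atoms = list(target_elements)
    best = {}

    def consistent(t_atom, r_atom, mapping):
        # A new pair is allowed only if its bonds to already matched atoms agree.
        for t_other, r_other in mapping.items():
            t_bonded = frozenset((t_atom, t_other)) in t_bonds
            r_bonded = frozenset((r_atom, r_other)) in r_bonds
            if t_bonded != r_bonded:
                return False
        return True

    def extend(index, mapping):
        nonlocal best
        # Prune branches that cannot beat the best mapping found so far.
        if len(mapping) + (len(t_atoms) - index) <= len(best):
            return
        if index == len(t_atoms):
            best = dict(mapping)
            return
        t_atom = t_atoms[index]
        used = set(mapping.values())
        for r_atom, element in ref_elements.items():
            if (r_atom not in used and element == target_elements[t_atom]
                    and consistent(t_atom, r_atom, mapping)):
                mapping[t_atom] = r_atom
                extend(index + 1, mapping)
                del mapping[t_atom]
        extend(index + 1, mapping)  # also try leaving this atom unmatched

    extend(0, {})
    return best
```

Given such a mapping, coordinates of matched reference atoms could seed the target ligand, which is the step the abstract describes as generating coordinates from the correspondence. The exhaustive search is only practical for very small molecules.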
Amuamuta, Asmare; Mulu, Wondemagegn; Yimer, Mulat; Zenebe, Yohannes; Adem, Yesuf; Abera, Bayeh; Gebeyehu, Wondemu; Gebregziabher, Yakob
2017-01-01
Background Reference intervals are crucial for disease screening, diagnosis, monitoring, progression and treatment efficacy. Due to the lack of locally derived reference values for the parameters, clinicians use reference intervals derived from Western populations. But studies conducted in different African countries have indicated differences between locally and Western-derived reference values. Different studies also indicated considerable variation in clinical chemistry reference intervals by several variables such as age, sex, geographical location, environment, lifestyle and genetic variation. Objective This study aimed to determine the reference intervals of common clinical chemistry parameters of the community of Gojjam Zones, Northwest Ethiopia. Method A population-based cross-sectional study was conducted from November 2015 to December 2016 in healthy adult populations of Gojjam zone. Data such as medical history, physical examination and socio-demographic data were collected. In addition, laboratory investigations were undertaken to screen the population. Clinical chemistry parameters were measured using a Mindray BS 200 clinical chemistry autoanalyzer as per the manufacturer’s instructions. Descriptive statistics were used to calculate mean, median and 95th percentiles. Independent sample T-test and one way ANOVA were used to see association between variables. Results After careful screening of a total of 799 apparently healthy adults who consented to this study, complete data from 446 (224 females and 222 males) were included for the analysis. The mean age of the study participants was 28.8 years. Males had higher (P<0.05) mean and 2.5th-97.5th percentile ranges of ALT, AST, ALP, creatinine and direct bilirubin. The reference intervals of amylase, LDH, total protein and total bilirubin were not significantly different between the two sex groups (P>0.05).
Mean, median and 95th percentile values of AST, ALP, amylase, LDH, creatinine, total protein, total bilirubin, and direct bilirubin across all age groups of participants were similar (P>0.05). But there was a significant difference in the value of ALT (P<0.05). The reference intervals of ALT, total protein and creatinine were significantly (P<0.05) higher in people having monthly income >1500 ETB compared to those with low monthly income. Significantly (P<0.05) higher values of ALT, ALP and total protein were observed in people living in highland compared to lowland residences. Conclusion The study showed that some of the common clinical chemistry parameter reference intervals of healthy adults in Gojjam zones were higher than the reference intervals generated from developed countries. Therefore, strict adherence to the reference values generated in developed countries could lead to inappropriate diagnosis and treatment of patients. There was also variation of reference interval values based on climate, gender, age, monthly income and geographical locations. Therefore, further study is required to establish reference intervals for the Ethiopian population. PMID:28886191
Reference hematologic and plasma chemistry values of brown tree snakes (Boiga irregularis).
Lamirande, E W; Bratthauer, A D; Fischer, D C; Nichols, D K
1999-12-01
Reference hematologic and plasma chemistry values were determined from 103 blood samples collected from 53 clinically healthy brown tree snakes (Boiga irregularis). Female snakes had significantly higher mean plasma values for total solids, total protein, calcium (Ca), phosphorus (P), uric acid, and blood monocyte percentage than did males, whereas males had significantly higher mean plasma fibrinogen values. The variances for hematocrit, monocyte percentage, azurophil percentage, plasma total solids, plasma total protein, albumin, Ca, and P also differed significantly between sexes. The higher mean values and greater variances for plasma total protein, plasma total solids, Ca, and P in the female snakes were probably associated with yolk synthesis and accumulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sader, John E., E-mail: jsader@unimelb.edu.au; Friend, James R.
2015-05-15
Overall precision of the simplified calibration method in J. E. Sader et al., Rev. Sci. Instrum. 83, 103705 (2012), Sec. III D, is dominated by the spring constant of the reference cantilever. The question arises: How does one take measurements from multiple reference cantilevers, and combine these results, to improve uncertainty of the reference cantilever’s spring constant and hence the overall precision of the method? This question is addressed in this note. Its answer enables manufacturers to specify a single set of data for the spring constant, resonant frequency, and quality factor, from measurements on multiple reference cantilevers. With this data set, users can trivially calibrate cantilevers of the same type.
Yan, Qian; Liu, Hou-Sheng; Yao, Dan; Li, Xin; Chen, Han; Dou, Yang; Wang, Yi; Pei, Yan; Xiao, Yue-Hua
2015-01-01
Basic/helix-loop-helix (bHLH) proteins comprise one of the largest transcription factor families and play important roles in diverse cellular and molecular processes. Comprehensive analyses of the composition and evolution of the bHLH family in cotton are essential to elucidate their functions and the molecular basis of cotton development. By searching bHLH homologous genes in sequenced diploid cotton genomes (Gossypium raimondii and G. arboreum), a set of cotton bHLH reference genes containing 289 paralogs were identified and named as GobHLH001-289. Based on their phylogenetic relationships, these cotton bHLH proteins were clustered into 27 subfamilies. Compared to those in Arabidopsis and cacao, cotton bHLH proteins generally increased in number, but unevenly in different subfamilies. To further uncover evolutionary changes of bHLH genes during tetraploidization of cotton, all genes of S5a and S5b subfamilies in upland cotton and its diploid progenitors were cloned and compared, and their transcript profiles were determined in upland cotton. A total of 10 genes of S5a and S5b subfamilies (doubled from A- and D-genome progenitors) maintained in tetraploid cottons. The major sequence changes in upland cotton included a 15-bp in-frame deletion in GhbHLH130D and a long terminal repeat retrotransposon inserted in GhbHLH062A, which eliminated GhbHLH062A expression in various tissues. The S5a and S5b bHLH genes of A and D genomes (except GobHLH062) showed similar transcription patterns in various tissues including roots, stems, leaves, petals, ovules, and fibers, while the A- and D-genome genes of GobHLH110 and GobHLH130 displayed clearly different transcript profiles during fiber development. In total, this study represented a genome-wide analysis of cotton bHLH family, and revealed significant changes in sequence and expression of these genes in tetraploid cottons, which paved the way for further functional analyses of bHLH genes in the cotton genus. PMID:25992947
Ottaway, Josh; Farrell, Jeremy A; Kalivas, John H
2013-02-05
An essential part of calibration is establishing the analyte calibration reference samples. These samples must characterize the sample matrix and measurement conditions (chemical, physical, instrumental, and environmental) of any sample to be predicted. Calibration usually requires measuring spectra for numerous reference samples in addition to determining the corresponding analyte reference values. Both tasks are typically time-consuming and costly. This paper reports on a method named pure component Tikhonov regularization (PCTR) that does not require laboratory prepared or determined reference values. Instead, an analyte pure component spectrum is used in conjunction with nonanalyte spectra for calibration. Nonanalyte spectra can be from different sources including pure component interference samples, blanks, and constant analyte samples. The approach is also applicable to calibration maintenance when the analyte pure component spectrum is measured in one set of conditions and nonanalyte spectra are measured in new conditions. The PCTR method balances the trade-offs between calibration model shrinkage and the degree of orthogonality to the nonanalyte content (model direction) in order to obtain accurate predictions. Using visible and near-infrared (NIR) spectral data sets, the PCTR results are comparable to those obtained using ridge regression (RR) with reference calibration sets. The flexibility of PCTR also allows including reference samples if such samples are available.
Genetics Home Reference: Nager syndrome
... cousin of DNA that serves as a genetic blueprint for making proteins. The spliceosomes recognize and then ... mRNA molecules that are not used in the blueprint (which are called introns ). The SAP49 protein may ...
Genetics Home Reference: myosin storage myopathy
... proteins accumulate in type I skeletal muscle fibers, forming the protein clumps characteristic of the disorder. It ... Epub 2007 Mar 2. Citation on PubMed Tajsharghi H, Oldfors A. Myosinopathies: pathology and mechanisms. Acta Neuropathol. ...
Genetics Home Reference: autoimmune Addison disease
... is the most common form in developed countries, accounting for up to 90 percent of cases. Related ... HLA) complex . The HLA complex helps the immune system distinguish the body's own proteins from proteins made ...
Genetics Home Reference: glycoprotein VI deficiency
... protein called glycoprotein VI (GPVI). This protein is embedded in the outer membrane of blood cell fragments ... erythematosus (SLE). Autoimmune disorders occur when the immune system malfunctions and attacks the body's own cells and ...
Genetics Home Reference: MDA5 deficiency
... the protein recognizes a molecule called double-stranded RNA (a chemical cousin of DNA), which certain viruses, ... When the MDA5 protein recognizes pieces of viral RNA inside the cell, it helps turn on the ...
Paker, Ilgin; Matak, Kristen E
2016-01-15
Gelation conditions affect the setting of myofibrillar fish protein gels. Therefore the impact of widely applied pre-cooking gelation time/temperature strategies and post-cooking period on the texture and color of final protein gels was determined. Four pre-cooking gelation strategies (no setting time, 30 min at 25 °C, 1 h at 40 °C or 24 h at 4 °C) were applied to protein pastes (fish protein concentrate and standard functional additives). After cooking, texture and color were analyzed either directly or after 24 h at 4 °C on gels adjusted to 25 °C. No-set gels were harder, gummier and chewier (P < 0.05) when analyzed immediately after cooling; however, gel chewiness, cohesiveness and firmness indicated by Kramer force benefited from 24 h at 4 °C gel setting when stored post-cooking. Gel-setting conditions had a greater (P < 0.05) effect on texture when directly analyzed and most changes occurred in no-set gels. There were significant (P < 0.05) changes between directly analyzed and post-cooking stored gels in texture and color, depending on the pre-cooking gelation strategy. Pre-cooking gelation conditions will affect final protein gel texture and color, with gel stability benefiting from a gel-setting period. However, post-cooking storage may have a greater impact on final gels, with textural attributes becoming more consistent between all samples. © 2015 Society of Chemical Industry.
Stavri, Henriette; Bucurenci, Nadia; Ulea, Irina; Costache, Adriana; Popa, Loredana; Popa, Mircea Ioan
2012-11-01
Purified protein derivative (PPD) is currently the only available skin test reagent used worldwide for the diagnosis of tuberculosis (TB). The aim of this study was to develop a Mycobacterium tuberculosis specific skin test reagent, without false positive results due to Bacillus Calmette-Guerin (BCG) vaccination using recombinant antigens. Proteins in PPD IC-65 were analyzed by tandem mass spectrometry and compared to proteins in M. tuberculosis culture filtrate; 54 proteins were found in common. Top candidates MPT64, ESAT 6, and CFP 10 were overexpressed in Escherichia coli expression strains and purified as recombinant proteins. To formulate optimal immunodiagnostic PPD cocktails, the antigens were evaluated by skin testing guinea pigs sensitized with M. tuberculosis H37Rv and BCG. For single antigens and a cocktail mixture of these antigens, best results were obtained using 3 μg/0.1 ml, equivalent to 105 TU (tuberculin units). Each animal was simultaneously tested with PPD IC-65, 2 TU/0.1 ml, as reference. Reactivity of the multi-antigen cocktail was greater than that of any single antigen. The skin test results were between 34.3 and 76.6 per cent the level of reactivity compared to that of the reference when single antigens were tested and 124 per cent the level of reactivity compared to the reference for the multi-antigen cocktail. Our results showed that this specific cocktail could represent a potential candidate for a new skin diagnostic test for TB.
Clinical Chemistry Reference Intervals for C57BL/6J, C57BL/6N, and C3HeB/FeJ Mice (Mus musculus)
Otto, Gordon P; Rathkolb, Birgit; Oestereicher, Manuela A; Lengger, Christoph J; Moerth, Corinna; Micklich, Kateryna; Fuchs, Helmut; Gailus-Durner, Valérie; Wolf, Eckhard; de Angelis, Martin Hrabě
2016-01-01
Although various mouse inbred strains are widely used to investigate disease mechanisms and to establish new therapeutic strategies, sex-specific reference intervals for laboratory diagnostic analytes that are generated from large numbers of animals have been unavailable. In this retrospective study, we screened data from more than 12,000 mice phenotyped in the German Mouse Clinic from January 2006 through June 2014 and selected animals with the genetic background of C57BL/6J, C57BL/6N, or C3HeB/FeJ. In addition, we distinguished between the C57BL/6NTac substrain and C57BL/6N mice received from other vendors. The corresponding data sets of electrolytes (sodium, potassium, calcium, chloride, inorganic phosphate), lipids (cholesterol, triglyceride), and enzyme activities (ALT, AST, ALP, α-amylase) and urea, albumin, and total protein levels were analyzed. Significant effects of age and sex on these analytes were identified, and strain- or substrain- and sex-specific reference intervals for 90- to 135-d-old mice were calculated. In addition, we include an overview of the literature that reports clinical chemistry values for wild-type mice of different strains. Our results support researchers interpreting clinical chemistry values from various mouse mutants and corresponding wild-type controls based on the examined strains and substrains. PMID:27423143
Borowska, D; Rothwell, L; Bailey, R A; Watson, K; Kaiser, P
2016-02-01
Quantitative polymerase chain reaction (qPCR) is a powerful technique for quantification of gene expression, especially genes involved in immune responses. Although qPCR is a very efficient and sensitive tool, variations in the enzymatic efficiency, quality of RNA and the presence of inhibitors can lead to errors. Therefore, qPCR needs to be normalised to obtain reliable results and allow comparison. The most common approach is to use reference genes as internal controls in qPCR analyses. In this study, expression of seven genes, including β-actin (ACTB), β-2-microglobulin (B2M), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), β-glucuronidase (GUSB), TATA box binding protein (TBP), α-tubulin (TUBAT) and 28S ribosomal RNA (r28S), was determined in cells isolated from chicken lymphoid tissues and stimulated with three different mitogens. The stability of the genes was measured using geNorm, NormFinder and BestKeeper software. The results from both geNorm and NormFinder were that the three most stably expressed genes in this panel were TBP, GAPDH and r28S. BestKeeper did not generate clear answers because of the highly heterogeneous sample set. Based on these data we will include TBP in future qPCR normalisation. The study shows the importance of appropriate reference gene normalisation in other tissues before qPCR analysis. Copyright © 2016 Elsevier B.V. All rights reserved.
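The notion of reference gene stability used in this abstract can be illustrated with a short Python sketch of a pairwise-variation score. It is written in the spirit of the stability measure reported by geNorm but is not that software's implementation: the function name and input layout are hypothetical, and it assumes Ct values with perfect doubling per cycle, so that the log2 expression ratio of two genes in a sample is the difference of their Ct values.

```python
import statistics


def stability_scores(ct_by_gene):
    """Map each candidate gene to its mean pairwise variation.

    ct_by_gene: dict of gene name -> list of Ct values, one per sample,
    with samples in the same order for every gene (at least two samples).
    A lower score means the gene varies less relative to the other candidates.
    """
    genes = list(ct_by_gene)
    scores = {}
    for gene in genes:
        variations = []
        for other in genes:
            if other == gene:
                continue
            # Ct differences stand in for log2 expression ratios (assumption).
            diffs = [a - b for a, b in zip(ct_by_gene[gene], ct_by_gene[other])]
            variations.append(statistics.stdev(diffs))
        scores[gene] = statistics.mean(variations)
    return scores
```

Ranking candidates by ascending score gives one view of which genes are most stably expressed across a sample set. As the abstract notes, different tools can disagree on heterogeneous samples, so a single score of this kind should be read with caution.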
Generating Ground Reference Data for a Global Impervious Surface Survey
NASA Technical Reports Server (NTRS)
Tilton, James C.; De Colstoun, Eric Brown; Wolfe, Robert E.; Tan, Bin; Huang, Chengquan
2012-01-01
We are developing an approach for generating ground reference data in support of a project to produce a 30m impervious cover data set of the entire Earth for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. Since sufficient ground reference data for training and validation is not available from ground surveys, we are developing an interactive tool, called HSegLearn, to facilitate the photo-interpretation of 1 to 2 m spatial resolution imagery data, which we will use to generate the needed ground reference data at 30m. Through the submission of selected region objects and positive or negative examples of impervious surfaces, HSegLearn enables an analyst to automatically select groups of spectrally similar objects from a hierarchical set of image segmentations produced by the HSeg image segmentation program at an appropriate level of segmentation detail, and label these region objects as either impervious or nonimpervious.
NASA Astrophysics Data System (ADS)
Paszkiewicz, Zbigniew; Picard, Willy
Performance management (PM) is a key function of virtual organization (VO) management. A large set of PM indicators has been proposed and evaluated within the context of virtual breeding environments (VBEs). However, it is currently difficult to describe and select suitable PM indicators because of the lack of a common vocabulary and taxonomies of PM indicators. Therefore, there is a need for a framework unifying concepts in the domain of VO PM. In this paper, a reference model for VO PM is presented in the context of service-oriented VBEs. In the proposed reference model, both a set of terms that could be used to describe key performance indicators, and a set of taxonomies reflecting various aspects of PM are proposed. The proposed reference model is a first attempt and a work in progress that should not be supposed exhaustive.
Evolution of an Implementation-Ready Interprofessional Pain Assessment Reference Model
Collins, Sarah A; Bavuso, Karen; Swenson, Mary; Suchecki, Christine; Mar, Perry; Rocha, Roberto A.
2017-01-01
Standards to increase consistency of comprehensive pain assessments are important for safety, quality, and analytics activities, including meeting Joint Commission requirements and learning the best management strategies and interventions for the current prescription opioid epidemic. In this study we describe the development and validation of a Pain Assessment Reference Model ready for implementation on EHR forms and flowsheets. Our process resulted in 5 successive revisions of the reference model, which more than doubled the number of data elements to 47. The organization of the model evolved during validation sessions with panels totaling 48 subject matter experts (SMEs) to include 9 sets of data elements, with one set recommended as a minimal data set. The reference model also evolved when implemented into EHR forms and flowsheets, indicating specifications such as cascading logic that are important to inform secondary use of data. PMID:29854125
Yugandhar, K; Gromiha, M Michael
2014-09-01
Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. On the other hand, we have developed a set of 610 features from the sequences of protein complexes and utilized Ranker search method, which is the combination of Attribute evaluator and Ranker method for selecting specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.
A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets.
Savitski, Mikhail M; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus
2015-09-01
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
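The "picked" strategy described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each protein has one target score and one decoy score, a higher score is better, ties are resolved in favour of the target, and the FDR at a given rank is estimated as the number of picked decoys divided by the number of picked targets at or above that rank. The function name and the example scores are hypothetical.

```python
def picked_protein_fdr(target_scores, decoy_scores):
    """Estimate protein FDR with a picked target-decoy scheme.

    Both arguments map a protein identifier to a score (higher is better).
    Returns a list of (protein, score, is_decoy, fdr) tuples sorted by
    decreasing score, where fdr is the running decoy/target ratio.
    """
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        target = target_scores.get(protein, float("-inf"))
        decoy = decoy_scores.get(protein, float("-inf"))
        # Treat target and decoy as a pair and keep only the better one.
        # Assumption: a tie is resolved in favour of the target.
        if target >= decoy:
            picked.append((protein, target, False))
        else:
            picked.append((protein, decoy, True))

    # Best score first; the identifier breaks ties deterministically.
    picked.sort(key=lambda entry: (-entry[1], entry[0]))

    results = []
    targets = decoys = 0
    for protein, score, is_decoy in picked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        results.append((protein, score, is_decoy, fdr))
    return results


# Hypothetical scores for four proteins.
targets = {"P1": 9.0, "P2": 7.0, "P3": 2.0, "P4": 1.0}
decoys = {"P1": 1.0, "P2": 0.5, "P3": 3.0, "P4": 0.2}
rows = picked_protein_fdr(targets, decoys)
```

A classic target-decoy list would contain eight entries for this input, one per target and one per decoy. The picked list contains four, because each protein contributes only the higher-scoring member of its pair.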
A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets
Savitski, Mikhail M.; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus
2015-01-01
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. PMID:25987413
Port, Sarah A; Mendes, Adélia; Valkova, Christina; Spillner, Christiane; Fahrenkrog, Birthe; Kaether, Christoph; Kehlenbach, Ralph H
2016-10-28
Genetic rearrangements are a hallmark of several forms of leukemia and can lead to oncogenic fusion proteins. One example of an affected chromosomal region is the gene coding for Nup214, a nucleoporin that localizes to the cytoplasmic side of the nuclear pore complex (NPC). We investigated two such fusion proteins, SET-Nup214 and SQSTM1 (sequestosome)-Nup214, both containing C-terminal portions of Nup214. SET-Nup214 nuclear bodies containing the nuclear export receptor CRM1 were observed in the leukemia cell lines LOUCY and MEGAL. Overexpression of SET-Nup214 in HeLa cells leads to the formation of similar nuclear bodies that recruit CRM1, export cargo proteins, and certain nucleoporins and concomitantly affect nuclear protein and poly(A)+ RNA export. SQSTM1-Nup214, although mostly cytoplasmic, also forms nuclear bodies and inhibits nuclear protein but not poly(A)+ RNA export. The interaction of the fusion proteins with CRM1 is RanGTP-dependent, as shown in co-immunoprecipitation experiments and binding assays. Further analysis revealed that the Nup214 parts mediate the inhibition of nuclear export, whereas the SET or SQSTM1 part determines the localization of the fusion protein and therefore the extent of the effect. SET-Nup214 nuclear bodies are highly mobile structures, which are in equilibrium with the nucleoplasm in interphase and disassemble during mitosis or upon treatment of cells with the CRM1-inhibitor leptomycin B. Strikingly, we found that nucleoporins can be released from nuclear bodies and reintegrated into existing NPC. Our results point to nuclear bodies as a means of preventing the formation of potentially insoluble and harmful protein aggregates that also may serve as storage compartments for nuclear transport factors. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Port, Sarah A.; Mendes, Adélia; Valkova, Christina; Spillner, Christiane; Fahrenkrog, Birthe; Kaether, Christoph; Kehlenbach, Ralph H.
2016-01-01
Genetic rearrangements are a hallmark of several forms of leukemia and can lead to oncogenic fusion proteins. One example of an affected chromosomal region is the gene coding for Nup214, a nucleoporin that localizes to the cytoplasmic side of the nuclear pore complex (NPC). We investigated two such fusion proteins, SET-Nup214 and SQSTM1 (sequestosome)-Nup214, both containing C-terminal portions of Nup214. SET-Nup214 nuclear bodies containing the nuclear export receptor CRM1 were observed in the leukemia cell lines LOUCY and MEGAL. Overexpression of SET-Nup214 in HeLa cells leads to the formation of similar nuclear bodies that recruit CRM1, export cargo proteins, and certain nucleoporins and concomitantly affect nuclear protein and poly(A)+ RNA export. SQSTM1-Nup214, although mostly cytoplasmic, also forms nuclear bodies and inhibits nuclear protein but not poly(A)+ RNA export. The interaction of the fusion proteins with CRM1 is RanGTP-dependent, as shown in co-immunoprecipitation experiments and binding assays. Further analysis revealed that the Nup214 parts mediate the inhibition of nuclear export, whereas the SET or SQSTM1 part determines the localization of the fusion protein and therefore the extent of the effect. SET-Nup214 nuclear bodies are highly mobile structures, which are in equilibrium with the nucleoplasm in interphase and disassemble during mitosis or upon treatment of cells with the CRM1-inhibitor leptomycin B. Strikingly, we found that nucleoporins can be released from nuclear bodies and reintegrated into existing NPC. Our results point to nuclear bodies as a means of preventing the formation of potentially insoluble and harmful protein aggregates that also may serve as storage compartments for nuclear transport factors. PMID:27613868
Liu, Juntai; Friebe, Vincent M.; Swainsbury, David J. K.; Crouch, Lucy I.; Szabo, David A.; Frese, Raoul N.
2018-01-01
Reaction centre/light harvesting proteins such as the RCLH1X complex from Rhodobacter sphaeroides carry out highly quantum-efficient conversion of solar energy through ultrafast energy transfer and charge separation, and these pigment-proteins have been incorporated into biohybrid photoelectrochemical cells for a variety of applications. In this work we demonstrate that, despite not being able to support normal photosynthetic growth of Rhodobacter sphaeroides, an engineered variant of this RCLH1X complex lacking the PufX protein and with an enlarged light harvesting antenna is unimpaired in its capacity for photocurrent generation in two types of bio-photoelectrochemical cells. Removal of PufX also did not impair the ability of the RCLH1 complex to act as an acceptor of energy from synthetic light harvesting quantum dots. Unexpectedly, the removal of PufX led to a marked improvement in the overall stability of the RCLH1 complex under heat stress. We conclude that PufX-deficient RCLH1 complexes are fully functional in solar energy conversion in a device setting and that their enhanced structural stability could make them a preferred choice over their native PufX-containing counterpart. Our findings on the competence of RCLH1 complexes for light energy conversion in vitro are discussed with reference to the reason why these PufX-deficient proteins are not capable of light energy conversion in vivo. PMID:29364305
Analysis of Gene Regulatory Networks of Maize in Response to Nitrogen.
Jiang, Lu; Ball, Graham; Hodgman, Charlie; Coules, Anne; Zhao, Han; Lu, Chungui
2018-03-08
Nitrogen (N) fertilizer has a major influence on the yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer usage is of central importance in enhancing food security and agricultural sustainability. In this study, the analysis of gene regulatory networks reveals multiple genes and biological processes in response to N. Two microarray studies have been used to infer components of the nitrogen-response network. Since they used different array technologies, a map linking the two probe sets to the maize B73 reference genome has been generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which yielded the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a Calcium-dependent protein kinase. An Artificial Neural Network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for Asparagine Synthetase, a dual-specific protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression-network analysis of TFs with closely related profiles to known Nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low Nitrogen conditions, while bZIP108 might play a role in gene activation.
Device and Method for Gathering Ensemble Data Sets
NASA Technical Reports Server (NTRS)
Racette, Paul E. (Inventor)
2014-01-01
An ensemble detector uses calibrated noise references to produce ensemble sets of data from which properties of non-stationary processes may be extracted. The ensemble detector comprising: a receiver; a switching device coupled to the receiver, the switching device configured to selectively connect each of a plurality of reference noise signals to the receiver; and a gain modulation circuit coupled to the receiver and configured to vary a gain of the receiver based on a forcing signal; whereby the switching device selectively connects each of the plurality of reference noise signals to the receiver to produce an output signal derived from the plurality of reference noise signals and the forcing signal.
Phillips, Melissa M; Bedner, Mary; Reitz, Manuela; Burdette, Carolyn Q; Nelson, Michael A; Yen, James H; Sander, Lane C; Rimmer, Catherine A
2017-02-01
Two independent analytical approaches, based on liquid chromatography with absorbance detection and liquid chromatography with mass spectrometric detection, have been developed for determination of isoflavones in soy materials. These two methods yield comparable results for a variety of soy-based foods and dietary supplements. Four Standard Reference Materials (SRMs) have been produced by the National Institute of Standards and Technology to assist the food and dietary supplement community in method validation and have been assigned values for isoflavone content using both methods. These SRMs include SRM 3234 Soy Flour, SRM 3236 Soy Protein Isolate, SRM 3237 Soy Protein Concentrate, and SRM 3238 Soy-Containing Solid Oral Dosage Form. A fifth material, SRM 3235 Soy Milk, was evaluated using the methods and found to be inhomogeneous for isoflavones and unsuitable for value assignment. Graphical Abstract Separation of six isoflavone aglycones and glycosides found in Standard Reference Material (SRM) 3236 Soy Protein Isolate.
Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models
NASA Astrophysics Data System (ADS)
Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny
2016-09-01
Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.
Selection of antifungal protein-producing molds from dry-cured meat products.
Acosta, Raquel; Rodríguez-Martín, Andrea; Martín, Alberto; Núñez, Félix; Asensio, Miguel A
2009-09-30
To control unwanted molds in dry-cured meats it is necessary to allow the fungal development essential for the desired characteristics of the final product. Molds producing antifungal proteins could be useful to prevent hazards due to the growth of mycotoxigenic molds. The objective has been to select Penicillium spp. that produce antifungal proteins against toxigenic molds. To obtain strains adapted to these products, molds were isolated from dry-cured ham. A first screening with 281 isolates by the radial inhibition assay revealed that 166 were active against some of the toxigenic P. echinulatum, P. commune, and Aspergillus niger used as reference molds. The activity of different extracts from cultured medium was evaluated by a microspectroscopic assay. Molds producing active chloroform extracts were eliminated from further consideration. A total of 16 Penicillium isolates were screened for antifungal activity from both cell-free media and the aqueous residues obtained after chloroform extraction. The cell-free media of 10 isolates that produced a strong inhibition of the three reference molds were fractionated by FPLC on a cationic column. For protein purification, the fractions of the three molds that showed high inhibitory activity were further chromatographed on a gel filtration column, and the subfractions containing the highest absorbance peaks were assayed against the most sensitive reference molds. One subfraction each from strains AS51D and RP42C from Penicillium chrysogenum confirmed the inhibitory activity against the reference molds. SDS-PAGE revealed a single band from each subfraction, with estimated molecular masses of 37 kDa for AS51D and 9 kDa for RP42C. Although further characterisation is required, both these proteins and the producing strains can be of interest to control unwanted molds on foods.
Twomey, Michèle; Wallis, Lee A; Myers, Jonathan E
2014-07-01
To evaluate the construct of triage acuity as measured by the South African Triage Scale (SATS) against a set of reference vignettes. A modified Delphi method was used to develop a set of reference vignettes. Delphi participants completed a 2-round consensus-building process, and independently assigned triage acuity ratings to 100 written vignettes unaware of the ratings given by others. Triage acuity ratings were summarised for all vignettes, and only those that reached 80% consensus during round 2 were included in the reference set. Triage ratings for the reference vignettes given by two independent experts using the SATS were compared with the ratings given by the international Delphi panel. Measures of sensitivity, specificity, associated percentages for over-triage/under-triage were used to evaluate the construct of triage acuity (as measured by the SATS) by examining the association between the ratings by the two experts and the international panel. On completion of the Delphi process, 42 of the 100 vignettes reached 80% consensus on their acuity rating and made up the reference set. On average, over all acuity levels, sensitivity was 74% (CI 64% to 82%), specificity 92% (CI 87% to 94%), under-triage occurred 14% (CI 8% to 23%) and over-triage 12% (CI 8% to 23%) of the time. The results of this study provide an alternative to evaluating triage scales against the construct of acuity as measured with the SATS. This method of using 80% consensus vignettes may, however, systematically bias the validity estimate towards better performance. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
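The agreement measures reported in the abstract above can be illustrated with a short sketch. It assumes acuity is coded as an integer where 1 is the most urgent level, computes sensitivity and specificity one level at a time (that level versus all others), and counts a rating as over-triage when it is more urgent than the reference and as under-triage when it is less urgent. The function names, the coding and the example ratings are hypothetical and are not taken from the study.

```python
def level_sensitivity_specificity(reference, assigned, level):
    """One-vs-rest sensitivity and specificity for a single acuity level."""
    pairs = list(zip(reference, assigned))
    tp = sum(1 for r, a in pairs if r == level and a == level)
    fn = sum(1 for r, a in pairs if r == level and a != level)
    tn = sum(1 for r, a in pairs if r != level and a != level)
    fp = sum(1 for r, a in pairs if r != level and a == level)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def over_under_triage(reference, assigned):
    """Fractions of vignettes rated more or less urgent than the reference.

    Assumption: a smaller number means a more urgent acuity level.
    """
    n = len(reference)
    over = sum(1 for r, a in zip(reference, assigned) if a < r) / n
    under = sum(1 for r, a in zip(reference, assigned) if a > r) / n
    return over, under


# Hypothetical reference and rater-assigned acuity levels for 8 vignettes.
reference = [1, 1, 2, 2, 3, 3, 4, 4]
assigned = [1, 2, 2, 2, 3, 4, 4, 3]

sens_2, spec_2 = level_sensitivity_specificity(reference, assigned, level=2)
over, under = over_under_triage(reference, assigned)
```

Averaging the per-level values over all acuity levels gives summary figures of the kind quoted in the abstract.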
Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory
2014-01-01
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time than would be required of "classic" open reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. 
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
Assessment of the Accuracy of the Bethe-Salpeter (BSE/GW) Oscillator Strengths.
Jacquemin, Denis; Duchemin, Ivan; Blondel, Aymeric; Blase, Xavier
2016-08-09
Aiming to assess the accuracy of the oscillator strengths determined at the BSE/GW level, we performed benchmark calculations using three complementary sets of molecules. In the first, we considered ∼80 states in Thiel's set of compounds and compared the BSE/GW oscillator strengths to recently determined ADC(3/2) and CC3 reference values. The second set includes the oscillator strengths of the low-lying states of 80 medium to large dyes for which we have determined CC2/aug-cc-pVTZ values. The third set contains 30 anthraquinones for which experimental oscillator strengths are available. We find that BSE/GW accurately reproduces the trends for all series with excellent correlation coefficients to the benchmark data and generally very small errors. Indeed, for Thiel's sets, the BSE/GW values are more accurate (using CC3 references) than both CC2 and ADC(3/2) values on both absolute and relative scales. For all three sets, BSE/GW errors also tend to be nicely spread with almost equal numbers of positive and negative deviations as compared to reference values.
Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J
2012-02-09
The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree based BLUP methods, one with a shallow one generation pedigree and the other with a deep ten generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree based BLUP methods were also accurate for closely related animals, however when the pedigree based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values, for animals that had no pedigree relationship with animals in the reference data set, allowed substantial accuracy. An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However a baseline accuracy that is driven by the reference data set size and the overall population effective population size enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree based BLUP methods.
A novel method for purification of the endogenously expressed fission yeast Set2 complex.
Suzuki, Shota; Nagao, Koji; Obuse, Chikashi; Murakami, Yota; Takahata, Shinya
2014-05-01
Chromatin-associated proteins are heterogeneously and dynamically composed. To gain a complete understanding of DNA packaging and basic nuclear functions, it is important to generate a comprehensive inventory of these proteins. However, biochemical purification of chromatin-associated proteins is difficult and is accompanied by concerns over complex stability, protein solubility and yield. Here, we describe a new method for optimized purification of the endogenously expressed fission yeast Set2 complex, histone H3K36 methyltransferase. Using the standard centrifugation procedure for purification, approximately half of the Set2 protein separated into the insoluble chromatin pellet fraction, making it impossible to recover the large amounts of soluble Set2. To overcome this poor recovery, we developed a novel protein purification technique termed the filtration/immunoaffinity purification/mass spectrometry (FIM) method, which eliminates the need for centrifugation. Using the FIM method, in which whole cell lysates were filtered consecutively through eight different pore sizes (53-0.8μm), a high yield of soluble FLAG-tagged Set2 was obtained from fission yeast. The technique was suitable for affinity purification and produced a low background. A mass spectrometry analysis of anti-FLAG immunoprecipitated proteins revealed that Rpb1, Rpb2 and Rpb3, which have all been reported previously as components of the budding yeast Set2 complex, were isolated from fission yeast using the FIM method. In addition, other subunits of RNA polymerase II and its phosphatase were also identified. In conclusion, the FIM method is valid for the efficient purification of protein complexes that separate into the insoluble chromatin pellet fraction during centrifugation. Copyright © 2014 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xu; Zhou, Jianying; Chin, Mark H
2010-02-15
Parkinson’s disease (PD) is characterized by dopaminergic neurodegeneration in the nigrostriatal region of the brain; however, the neurodegeneration extends well beyond dopaminergic neurons. To gain a better understanding of the molecular changes relevant to PD, we applied two-dimensional LC-MS/MS to comparatively analyze the proteome changes in four brain regions (striatum, cerebellum, cortex, and the rest of brain) using an MPTP-induced PD mouse model with the objective to identify nigrostriatal-specific and other region-specific protein abundance changes. The combined analyses resulted in the identification of 4,895 non-redundant proteins with at least two unique peptides per protein. The relative abundance changes in each analyzed brain region were estimated based on the spectral count information. A total of 518 proteins were observed with significant MPTP-induced changes across different brain regions. 270 of these proteins were observed with specific changes occurring either only in the striatum and/or in the rest of the brain region that contains substantia nigra, suggesting that these proteins are associated with the underlying nigrostriatal pathways. Many of the proteins that exhibit significant abundance changes were associated with dopamine signaling, mitochondrial dysfunction, the ubiquitin system, calcium signaling, the oxidative stress response, and apoptosis. A set of proteins with either consistent change across all brain regions or with changes specific to the cortex and cerebellum regions were also detected. One of the interesting proteins is ubiquitin specific protease (USP9X), a deubiquitination enzyme involved in the protection of proteins from degradation and promotion of the TGF-β pathway, which exhibited altered abundances in all brain regions. Western blot validation showed similar spatial changes, suggesting that USP9X is potentially associated with neurodegeneration. 
Together, this study for the first time presents an overall picture of proteome changes underlying both nigrostriatal pathways and other brain regions potentially involved in MPTP-induced neurodegeneration. The observed molecular changes provide a valuable reference resource for future hypothesis-driven functional studies of PD.« less
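The abstract above states that relative abundance changes were estimated from spectral counts but does not give the formula. As a minimal sketch only, one common way to express such a change is a log2 ratio of summed spectral counts with a pseudocount to avoid division by zero. The function name, the pseudocount, and the example counts are assumptions for illustration and are not taken from the study.

```python
import math


def log2_spectral_count_ratio(treated_counts, control_counts, pseudocount=1.0):
    """Log2 ratio of summed spectral counts (treated vs. control).

    Assumption: a pseudocount is added to each sum so that proteins with
    zero observed spectra in one condition still yield a finite ratio.
    """
    treated_total = sum(treated_counts) + pseudocount
    control_total = sum(control_counts) + pseudocount
    return math.log2(treated_total / control_total)


# Hypothetical spectral counts for one protein in one brain region,
# two replicates per condition (placeholder numbers, not study data).
ratio = log2_spectral_count_ratio([7, 8], [3, 4])
```

A positive value indicates more spectra in the treated samples and a negative value fewer; a real analysis would also need normalization across runs and a significance test, neither of which is sketched here.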
Mardassi, H; Athanassious, R; Mounir, S; Dea, S
1994-01-01
Cytolytic and noncytolytic strains of the porcine reproductive and respiratory syndrome virus (PRRSV) were isolated in primary cultures of porcine alveolar macrophages (PAM) from lung homogenates of stillborn fetuses or blood samples of dyspneic piglets collected from Quebec pig farms having experienced acute or chronic outbreaks of PRRS. Serological identification of the virus was confirmed by indirect immunofluorescence and indirect protein A-gold immunoelectron microscopy using reference antiserum prepared from experimentally-infected specific pathogen free (SPF) piglets and monoclonal antibodies (MoAbs) directed against the p15 nucleocapsid (N) protein of the reference ATCC-VR2332 isolate. Intracytoplasmic enveloped viral particles that tended to accumulate into cytoplasmic vesicles were observed in the infected PAM; no budding was demonstrated at the level of the cytoplasmic membrane. The extracellular virions appeared as pleomorphic but mostly spherical enveloped particles, 50-72 nm in diameter (averaged diameter of 50 particles was 58.3 nm), with an isometric core about 25-30 nm. Buoyant density of the virus in CsCl density gradients was estimated at 1.18-1.20 g/mL. No hemagglutinating activity was demonstrated. Analysis of semipurified virions of isolate IAF-exp91 by radioimmunoprecipitation (RIPA) and Western immunoblotting experiments, using reference rabbit and porcine hyperimmune sera, revealed four major viral proteins, a predominant 15 kD N protein and three other proteins with predicted M(r) of 19, 26 and 42 kD. Progeny viral particles produced in PRRSV-infected PAM in the presence of tunicamycin lacked the 42 kD protein, thus confirming its N-glycosylated nature. Immunoprecipitation experiments using the anti-ATCC-VR2332 MoAbs confirmed the close antigenic relationships between Quebec and American reference isolates of PRRSV. PMID:8143254
Genetics Home Reference: benign recurrent intrahepatic cholestasis
... for making a protein called the bile salt export pump (BSEP). This protein is found in the ... progressive and benign mutations of the bile salt export pump (Bsep/Abcb11) correlate with severity of cholestatic ...
Genetics Home Reference: CLN6 disease
... is a severe reduction in the amount of functional CLN6 protein in cells. While it is not ... suggests that these CLN6 gene mutations allow enough functional protein to be produced so that signs and ...
Genetics Home Reference: DOLK-congenital disorder of glycosylation
... called glycosylation, which attaches groups of sugar molecules (oligosaccharides) to proteins. Glycosylation changes proteins in ways that ... to dolichol phosphate in order to build the oligosaccharide chain. Once the chain is formed, dolichol phosphate ...
Genetics Home Reference: familial cylindromatosis
... instructions for making a protein that helps regulate nuclear factor-kappa-B. Nuclear factor-kappa-B is a group of related ... to certain signals. In regulating the action of nuclear factor-kappa-B, the CYLD protein allows cells ...
Genetics Home Reference: lateral meningocele syndrome
... meningocele syndrome is caused by mutations in the NOTCH3 gene. This gene provides instructions for making a ... from the outer surface of the cell. The NOTCH3 protein is called a receptor protein because certain ...
Genetics Home Reference: DICER1 syndrome
... called microRNA (miRNA). MicroRNA is a type of RNA, a chemical cousin of DNA, that attaches to a protein's blueprint (a molecule called messenger RNA) and blocks the production of proteins from it. ...
Genetics Home Reference: 5q31.3 microdeletion syndrome
... of the characteristic features of the condition. The protein produced from the PURA gene, called Pur-alpha ( ... aiding in the copying (replication) of DNA. This protein is especially important for normal brain development; it ...
Elshehawi, Waleed; Alsaffar, Hani; Roberts, Graham; Lucas, Victoria; McDonald, Fraser; Camilleri, Simon
2016-04-01
The purpose of this study was to develop and validate a Reference Data Set for Dental Age Assessment of the Maltese population and compare the mean Age of Attainment to a UK Caucasian Reference Data Set. The Maltese Reference Data Set was developed from 1593 Dental Panoramic Tomograms of patients aged between 4 and 26 years, taken from the radiographic archives of the Dental Department, Mater Dei Hospital, Malta. Tooth Development Stages were recorded for all 16 maxillary and mandibular permanent teeth on the left side and both permanent third molars on the right, according to Demirjian's staging method. Summary and percentile data were calculated for each Tooth Development Stage, including the mean Age of Attainment. These means were used to estimate the Dental Age of each subject in the study sample using the simple unweighted average method. The estimated Dental Age was compared to the gold standard of the Chronological Age. The Maltese and UK Caucasian Reference Data Sets were compared by a series of t-tests, carried out for each paired Tooth Development Stage by gender. The mean Age of Attainment was slightly higher for the Maltese than the UK Caucasians in both males and females. However, there was no statistically significant difference between the Chronological Age and Dental Age for either sex. Copyright © 2016 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
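The simple unweighted average method mentioned in the abstract above can be illustrated with a short sketch: each observed Tooth Development Stage is looked up in the reference data to get its mean Age of Attainment, and the Dental Age is the plain mean of those values. The tooth labels, stage letters, and ages below are hypothetical placeholders, not figures from the Maltese or UK Reference Data Sets, and the function name is an assumption.

```python
from statistics import mean

# Hypothetical mean Age of Attainment (years), keyed by (tooth, Demirjian stage).
# Placeholder values for illustration only; not taken from the study.
MEAN_AGE_OF_ATTAINMENT = {
    ("LL6", "G"): 8.0,
    ("LL7", "F"): 10.0,
    ("LL8", "C"): 12.0,
}


def estimate_dental_age(observed_stages, reference=MEAN_AGE_OF_ATTAINMENT):
    """Unweighted mean of the reference ages for each observed tooth stage."""
    ages = [
        reference[(tooth, stage)]
        for tooth, stage in observed_stages.items()
        if (tooth, stage) in reference  # skip stages with no reference mean
    ]
    if not ages:
        raise ValueError("no observed stage has a reference mean")
    return mean(ages)


# One hypothetical subject: the stage recorded for each tooth.
subject = {"LL6": "G", "LL7": "F", "LL8": "C"}
dental_age = estimate_dental_age(subject)
```

Every tooth contributes equally regardless of how variable its stage is, which is what distinguishes this from weighted approaches; in a validation such as the one described, the resulting estimate would then be compared against the subject's Chronological Age.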
Hart, Andrew; Cortés, María Paz; Latorre, Mauricio; Martinez, Servet
2018-01-01
The analysis of codon usage bias has been widely used to characterize different communities of microorganisms. In this context, the aim of this work was to study the codon usage bias in a natural consortium of five acidophilic bacteria used for biomining. The codon usage bias of the consortium was contrasted with genes from an alternative collection of acidophilic reference strains and metagenome samples. Results indicate that acidophilic bacteria preferentially have low codon usage bias, consistent with both their capacity to live in a wide range of habitats and their slow growth rate, a characteristic probably acquired independently from their phylogenetic relationships. In addition, the analysis showed significant differences in the unique sets of genes from the autotrophic species of the consortium in relation to other acidophilic organisms, principally in genes which code for proteins involved in metal and oxidative stress resistance. The lower values of codon usage bias obtained in this unique set of genes suggest higher transcriptional adaptation to living in extreme conditions, which was probably acquired as a measure for resisting the elevated metal conditions present in the mine.
Ivens, Katherine O; Baumert, Joseph L; Taylor, Steve L
2016-07-01
Numerous commercial enzyme-linked immunosorbent assay (ELISA) kits exist to quantitatively detect bovine milk residues in foods. Milk contains many proteins that can serve as ELISA targets including caseins (α-, β-, or κ-casein) and whey proteins (α-lactalbumin or β-lactoglobulin). Nine commercially-available milk ELISA kits were selected to compare the specificity and sensitivity with 5 purified milk proteins and 3 milk-derived ingredients. All of the milk kits were capable of quantifying nonfat dry milk (NFDM), but did not necessarily detect all individual protein fractions. While milk-derived ingredients were detected by the kits, their quantitation may be inaccurate due to the use of different calibrators, reference materials, and antibodies in kit development. The establishment of a standard reference material for the calibration of milk ELISA kits is increasingly important. The appropriate selection and understanding of milk ELISA kits for food analysis is critical to accurate quantification of milk residues and informed risk management decisions. © 2016 Institute of Food Technologists®