Modeling Protein Expression and Protein Signaling Pathways
Telesca, Donatello; Müller, Peter; Kornblau, Steven M.; Suchard, Marc A.; Ji, Yuan
2015-01-01
High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case, inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse-phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant subpathways in relation to the unfolding of the biological process under study. PMID:26246646
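An informative prior that anchors the model space to a reference pathway can be illustrated with a minimal sketch. It assumes, purely for illustration, an unnormalized log prior that subtracts a penalty `lam` for every edge added to or removed from an expert-curated reference graph, together with a local-search move set that toggles one edge at a time. The names `log_prior`, `local_moves` and `lam` are hypothetical. This is not the prior used by the authors, whose model also places binary latent activation states on the proteins, which are not shown here.

```python
from itertools import combinations

def as_edge_set(edges):
    """Represent undirected edges as frozensets so ("A", "B") equals ("B", "A")."""
    return {frozenset(e) for e in edges}

def log_prior(edges, reference, lam=1.0):
    """Hypothetical unnormalized log prior: -lam per edge that differs from the reference graph."""
    return -lam * len(as_edge_set(edges) ^ as_edge_set(reference))

def local_moves(edges, nodes):
    """Yield every graph that differs from `edges` by toggling exactly one edge (local search)."""
    current = as_edge_set(edges)
    for u, v in combinations(nodes, 2):
        yield current ^ {frozenset((u, v))}

# Usage: starting from a partial graph, pick the neighbour the prior favours most.
reference = [("A", "B"), ("B", "C")]
best = max(local_moves([("A", "B")], "ABC"), key=lambda g: log_prior(g, reference))
```

Because the penalty grows with distance from the reference, single-edge moves that stay close to the curated pathway are favoured, which is one simple way to bias a search toward "interactions of interest".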
A Computational Framework for Analyzing Stochasticity in Gene Expression
Sherman, Marc S.; Cohen, Barak A.
2014-01-01
Stochastic fluctuations in gene expression give rise to distributions of protein levels across cell populations. Despite a mounting number of theoretical models explaining stochasticity in protein expression, we lack a robust, efficient, assumption-free approach for inferring the molecular mechanisms that underlie the shape of protein distributions. Here we propose a method for inferring sets of biochemical rate constants that govern chromatin modification, transcription, translation, and RNA and protein degradation from stochasticity in protein expression. We asked whether the rates of these underlying processes can be estimated accurately from protein expression distributions, in the absence of any limiting assumptions. To do this, we (1) derived analytical solutions for the first four moments of the protein distribution, (2) found that these four moments completely capture the shape of protein distributions, and (3) developed an efficient algorithm for inferring gene expression rate constants from the moments of protein distributions. Using this algorithm, we find that most protein distributions are consistent with a large number of different biochemical rate constant sets. Despite this degeneracy, the solution space of rate constants is almost always informative about the underlying mechanism. For example, we distinguish regimes where transcriptional bursting occurs from regimes reflecting constitutive transcript production. Our method agrees with the current standard approach, and in the restrictive regime where the standard method operates, also identifies rate constants not previously obtainable. Even without making any assumptions, we obtain estimates of individual biochemical rate constants, or meaningful ratios of rate constants, in 91% of tested cases. In some cases our method identified all of the underlying rate constants. The framework developed here will be a powerful tool for deducing the contributions of particular molecular mechanisms to specific patterns of gene expression. PMID:24811315
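The moment-based idea can be made concrete with a short sketch that computes the first four moments of an observed protein distribution, using population-form estimators (an assumption). In the method described above, such empirical moments would be compared with analytical moments derived from candidate rate constants. That derivation and the inference algorithm are not reproduced here, and the function name is hypothetical.

```python
def first_four_moments(samples):
    """Mean, variance, skewness and excess kurtosis of observed per-cell protein levels."""
    n = len(samples)
    mean = sum(samples) / n

    def central(k):
        # k-th central moment, population form (divide by n, not n - 1)
        return sum((x - mean) ** k for x in samples) / n

    var = central(2)
    skewness = central(3) / var ** 1.5
    excess_kurtosis = central(4) / var ** 2 - 3.0
    return mean, var, skewness, excess_kurtosis
```

For example, `first_four_moments([1, 2, 3, 4, 5])` gives a mean of 3.0, a variance of 2.0, zero skewness and an excess kurtosis of about -1.3.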
A prior-based integrative framework for functional transcriptional regulatory network inference
Siahpirani, Alireza F.
2017-01-01
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived networks (e.g. from ChIP-chip and transcription factor (TF) knockouts). Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our method and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference and identifies core TFs whose targets are predictable from expression. Multiple factors make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA levels. Finally, we demonstrate the utility of our inference algorithm for inferring stress-specific regulatory networks and for regulator prioritization. PMID:27794550
A linear programming model for protein inference problem in shotgun proteomics.
Huang, Ting; He, Zengyou
2012-11-15
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and the protein probability can be expressed as linear combinations of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities subject to the constraint that the difference between the calculated peptide probability and the peptide probability generated by peptide identification algorithms is less than some threshold. This model addresses the peptide degeneracy issue by rigorously forcing some joint probability variables involving degenerate peptides to be zero. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. Supplementary data are available at Bioinformatics Online.
A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.
The, Matthew; Edfors, Fredrik; Perez-Riverol, Yasset; Payne, Samuel H; Hoopmann, Michael R; Palmblad, Magnus; Forsström, Björn; Käll, Lukas
2018-05-04
A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides, and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, based on human protein fragments expressed in E. coli. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
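The distinction between unique and shared peptides that drives this benchmark can be sketched as follows. The sketch assumes a plain mapping from each detected peptide to the set of proteins it could come from, and implements only the simplest "exclude shared peptides" rule: a protein is reported only if it has at least one unique peptide. The function names are hypothetical, and none of the benchmarked inference procedures is reproduced here.

```python
from collections import defaultdict

def split_peptides(peptide_to_proteins):
    """Separate peptides mapping to exactly one protein from those shared by several."""
    unique, shared = {}, {}
    for peptide, proteins in peptide_to_proteins.items():
        (unique if len(proteins) == 1 else shared)[peptide] = set(proteins)
    return unique, shared

def proteins_with_unique_evidence(peptide_to_proteins):
    """Report each protein supported by at least one unique peptide, ignoring shared peptides."""
    unique, _ = split_peptides(peptide_to_proteins)
    evidence = defaultdict(set)
    for peptide, proteins in unique.items():
        (protein,) = proteins  # exactly one protein by construction
        evidence[protein].add(peptide)
    return dict(evidence)
```

With `{"PEPA": {"P1"}, "PEPB": {"P1", "P2"}, "PEPC": {"P3"}}`, protein P2 is supported only by the shared peptide PEPB and is therefore not reported.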
Timescales and bottlenecks in miRNA-dependent gene regulation.
Hausser, Jean; Syed, Afzal Pasha; Selevsek, Nathalie; van Nimwegen, Erik; Jaskiewicz, Lukasz; Aebersold, Ruedi; Zavolan, Mihaela
2013-12-03
MiRNAs are post-transcriptional regulators that contribute to the establishment and maintenance of gene expression patterns. Although their biogenesis and decay appear to be under complex control, the implications of miRNA expression dynamics for the processes that they regulate are not well understood. We derived a mathematical model of miRNA-mediated gene regulation, inferred its parameters from experimental data sets, and found that the model accurately describes the time-dependent changes in mRNA, protein and ribosome density levels measured upon miRNA transfection and induction. The inferred parameters indicate that the timescale of miRNA-dependent regulation is slower than initially thought. Delays in miRNA loading into Argonaute proteins and the slow decay of proteins relative to mRNAs can explain the typically small changes in protein levels observed upon miRNA transfection. For miRNAs to regulate protein expression on the timescale of a day, as miRNAs involved in cell-cycle regulation do, accelerated miRNA turnover is necessary.
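The argument that slow protein decay masks miRNA effects can be made concrete with a minimal two-variable sketch. It is not the authors' model, which also includes miRNA loading into Argonaute and ribosome density. It assumes constant transcription `k_m`, first-order mRNA decay `d_m` that the miRNA raises by a hypothetical `extra_decay` from time zero, translation `k_p * m`, and first-order protein decay `d_p`. All parameter values (per hour) are illustrative assumptions, integrated with the forward Euler method.

```python
def fold_changes(t_obs=24.0, k_m=1.0, d_m=0.5, extra_decay=0.5,
                 k_p=1.0, d_p=0.05, dt=0.01):
    """Return (mRNA, protein) levels at t_obs relative to their pre-miRNA steady states."""
    m0 = k_m / d_m          # mRNA steady state before the miRNA acts
    p0 = k_p * m0 / d_p     # protein steady state before the miRNA acts
    m, p = m0, p0
    for _ in range(int(round(t_obs / dt))):
        dm_dt = k_m - (d_m + extra_decay) * m  # miRNA accelerates mRNA decay
        dp_dt = k_p * m - d_p * p              # protein follows mRNA, but decays slowly
        m += dt * dm_dt
        p += dt * dp_dt
    return m / m0, p / p0
```

With these illustrative rates the mRNA falls to about half its initial level within a day, while the protein has not yet fallen that far, because it relaxes at rate `d_p` rather than `d_m + extra_decay`. Only over much longer times does the protein also approach half its initial level.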
Arenas, Ailan F; Salcedo, Gladys E; Gomez-Marin, Jorge E
2017-01-01
Pathogen-host protein-protein interaction systems examine the interactions between the protein repertoires of two distinct organisms. Some pathogen proteins interact with the host protein system and may manipulate it to their own advantage. In this work, we designed an R script by concatenating two functions, called rowDM and rowCVmed, to infer pathogen-host interactions using previously reported microarray data, including host gene enrichment analysis and the crossing of interspecific domain-domain interactions. We applied this script to the Toxoplasma-host system to describe pathogen survival mechanisms, using human, mouse, and Toxoplasma Gene Expression Omnibus series. Our results were similar to those of previously reported microarray analyses, but we found other important proteins that could contribute to Toxoplasma pathogenesis. We observed that Toxoplasma ROP38 is the most differentially expressed protein among Toxoplasma strains. Enrichment analysis and KEGG mapping indicated that the human retinal genes most affected by Toxoplasma infection are those related to antiapoptotic mechanisms. We suggest that the proteins PIK3R1, PRKCA, PRKCG, PRKCB, HRAS, and c-JUN could be substrates of the differentially expressed Toxoplasma kinase ROP38. Likewise, we propose that Toxoplasma causes overexpression of human genes that suppress apoptosis. PMID:29317802
A combinatorial perspective of the protein inference problem.
Yang, Chao; He, Zengyou; Yu, Weichuan
2013-01-01
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to identify proteins accurately and efficiently. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we take a combinatorial perspective on the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimate of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves results comparable to those of ProteinProphet, more efficiently, in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.
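To see why degenerate peptides complicate protein probabilities, consider a minimal noisy-OR sketch. It assumes that peptide identifications are independent, so that a protein's probability is one minus the product of the probabilities that each of its peptides is wrong. A degenerate peptide contributes only a hypothetical fraction `weights` of its evidence to a given protein. This is a generic illustration and not ProteinInfer's combinatorial lower bound, upper bound or empirical estimate.

```python
from math import prod

def protein_probability(peptide_probs, weights=None):
    """Noisy-OR: probability that at least one of the protein's peptide IDs is correct.

    `weights` (hypothetical) down-weights degenerate peptides shared with other proteins.
    """
    if weights is None:
        weights = [1.0] * len(peptide_probs)  # all peptides treated as unique
    return 1.0 - prod(1.0 - w * p for p, w in zip(peptide_probs, weights))
```

Two unique peptides with probability 0.5 each give 0.75. A single degenerate peptide with probability 0.8 whose evidence is split evenly between two proteins gives each protein only 0.4.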
Will, Thorsten; Helms, Volkhard
2017-04-04
Differential analysis of cellular conditions is a key approach towards understanding the consequences and driving causes behind biological processes such as developmental transitions or diseases. Progress in whole-genome expression profiling has made it possible to conveniently capture the state of a cell's transcriptome and to detect the characteristic features that distinguish cells in specific conditions. In contrast, mapping the physical protein interactome for many samples is currently experimentally infeasible. For understanding the whole system, however, it is equally important to know how protein interactions are rewired between cellular states. To overcome this deficiency, we recently showed how condition-specific protein interaction networks that even consider alternative splicing can be inferred from transcript expression data. Here, we present the differential network analysis tool PPICompare, which was specifically designed for isoform-sensitive protein interaction networks. Besides detecting significant rewiring events between the interactomes of grouped samples, PPICompare infers which alterations to the transcriptome caused each rewiring event and determines the minimal set of alterations necessary to explain all between-group changes. When applying the tool to the development of blood cells, we verified that it reported a reasonable number of rewiring events and found that differential gene expression was the major determinant of cellular adjustments to the interactome. Alternative splicing events were consistently necessary in each developmental step to explain all significant alterations and were especially important for rewiring in the context of transcriptional control. Applying PPICompare enabled us to investigate the dynamics of the human protein interactome during developmental transitions. A platform-independent implementation of the tool PPICompare is available at https://sourceforge.net/projects/ppicompare/ .
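The notion of a rewiring event between grouped samples can be illustrated with a deliberately simplified sketch. It assumes that each sample is given as a list of undirected interactions, computes how often each interaction is present in each group, and flags interactions whose presence frequency differs by at least a hypothetical `min_diff`. Unlike PPICompare, it performs no significance testing and does not attribute changes to transcriptomic alterations; the function names are hypothetical.

```python
from collections import Counter

def edge_frequencies(networks):
    """Fraction of networks (samples) in which each undirected interaction is present."""
    counts = Counter()
    for net in networks:
        counts.update({frozenset(e) for e in net})  # count each edge once per sample
    return {e: c / len(networks) for e, c in counts.items()}

def rewired_edges(group_a, group_b, min_diff=0.5):
    """Label interactions as 'gained' or 'lost' in group_b relative to group_a."""
    fa, fb = edge_frequencies(group_a), edge_frequencies(group_b)
    result = {}
    for e in fa.keys() | fb.keys():
        diff = fb.get(e, 0.0) - fa.get(e, 0.0)
        if abs(diff) >= min_diff:
            result[e] = "gained" if diff > 0 else "lost"
    return result
```

A frequency threshold is used here only to keep the sketch short; a real comparison between groups would replace it with a statistical test and multiple-testing correction.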
Pey, Jon; Valgepea, Kaspar; Rubio, Angel; Beasley, John E; Planes, Francisco J
2013-12-08
The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue this progress, it is essential to integrate experimental data into metabolic modeling efficiently. We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path-finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data, so that these data influence the paths obtained. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influence the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and a real scenario. Finally, we apply this approach to find novel pathways relevant to the regulation of acetate overflow metabolism in Escherichia coli. As a result, we propose several targets that could be relevant for better understanding the phenomenon leading to impaired acetate overflow. A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis
2012-01-01
Gaining insights into gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
A Prize-Collecting Steiner Tree Approach for Transduction Network Inference
NASA Astrophysics Data System (ADS)
Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo
Within the cell, information from the environment is mainly propagated via signaling pathways, which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.
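The optimization task named in the title can be illustrated by the generic prize-collecting Steiner tree objective. The objective is the sum of the costs of the edges in the chosen tree plus `lam` times the prizes of the nodes left out. In this setting, prizes could plausibly come from expression evidence and edge costs from interaction confidence, but that mapping is an assumption here. The sketch only scores and validates candidate trees. It does not implement the authors' message-passing solver, and the function names are hypothetical.

```python
def is_tree(edges):
    """True if the undirected edge list forms a single connected tree."""
    nodes = {v for e in edges for v in e}
    if not edges or len(edges) != len(nodes) - 1:
        return False
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search for connectivity
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes

def pcst_cost(tree_edges, edge_cost, prizes, lam=1.0):
    """Edge costs of the tree plus lam times the prizes of nodes left out of it."""
    in_tree = {v for e in tree_edges for v in e}
    edge_term = sum(edge_cost[frozenset(e)] for e in tree_edges)
    missed = sum(p for v, p in prizes.items() if v not in in_tree)
    return edge_term + lam * missed
```

The trade-off is visible on a toy graph: adding an expensive edge to reach a low-prize node can raise the total cost, so the optimal tree may leave weakly supported proteins out.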
Handfield, Louis-François; Chong, Yolanda T.; Simmons, Jibril; Andrews, Brenda J.; Moses, Alan M.
2013-01-01
Protein subcellular localization has been systematically characterized in budding yeast using fluorescently tagged proteins. Based on fluorescence microscopy images, the subcellular localization of many proteins can be classified automatically using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. Here, we present an unsupervised analysis of protein expression patterns in a set of high-resolution, high-throughput microscope images. Our analysis is based on 7 biologically interpretable features, which are evaluated on automatically identified cells and whose cell-stage dependency is captured by a continuous model for cell growth. We show that it is possible to identify most previously identified localization patterns in a cluster analysis based on these features, and that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Furthermore, the inferred cell stage associated with each fluorescence measurement allows us to visualize large groups of proteins entering the bud at specific stages of bud growth. These correspond to proteins localized to organelles, revealing that the organelles must enter the bud in a stereotypical order. We also identify and organize a smaller group of proteins that show subtle differences in the way they move around the bud during growth. Our results suggest that biologically interpretable features based on explicit models of cell morphology will yield unprecedented power for pattern discovery in high-resolution, high-throughput microscopy images. PMID:23785265
In silico prediction of protein-protein interactions in human macrophages
2014-01-01
Background: Protein-protein interaction (PPI) network analyses are highly valuable in deciphering and understanding the intricate organisation of cellular functions. Nevertheless, most available PPI networks are context-less, i.e. without any reference to the spatial, temporal or physiological conditions in which the interactions may occur. In this work, we propose a protocol to infer the most likely PPI network in human macrophages. Results: We integrated the PPI dataset from the Agile Protein Interaction DataAnalyzer (APID) with different meta-data to infer a contextualized macrophage-specific interactome using a combination of statistical methods. The obtained interactome is enriched in experimentally verified interactions and in proteins involved in macrophage-related biological processes (e.g. immune response activation, regulation of apoptosis). As a case study, we used the contextualized interactome to highlight the cellular processes induced upon Mycobacterium tuberculosis infection. Conclusion: Our work confirms that contextualizing interactomes improves the biological significance of bioinformatic analyses. More specifically, studying such an inferred network, rather than focusing on the gene expression level alone, is informative about the processes involved in the host response. Indeed, important immune features such as apoptosis are highlighted only when the spotlight is on the protein interaction level. PMID:24636261
A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers.
Şenbabaoğlu, Yasin; Sümer, Selçuk Onur; Sánchez-Vega, Francisco; Bemis, Debra; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris
2016-02-01
Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody-based reverse phase protein array (RPPA) technology. The resulting proteomic data can be used to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method has the best performance in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently ranks highly among the tested methods. We use the high-performing methods to obtain a consensus network and identify four robust and densely connected modules that reveal biological processes as well as suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair, and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
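Building a consensus network from several inference methods can be sketched in a few lines. This sketch assumes only two correlation-based scorers (Pearson, and Spearman computed as Pearson on average ranks), an absolute-score threshold, and a simple vote. The study itself combines six high-performing methods out of 13, and its consensus procedure is not described above. The threshold, vote count and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def consensus_network(profiles, scorers, threshold=0.8, min_votes=2):
    """Keep protein pairs whose |score| passes the threshold for at least min_votes scorers."""
    votes = Counter()
    for score in scorers:
        votes.update(frozenset((a, b)) for a, b in combinations(profiles, 2)
                     if abs(score(profiles[a], profiles[b])) >= threshold)
    return {edge for edge, n in votes.items() if n >= min_votes}
```

Requiring agreement between methods trades recall for robustness: an edge supported by only one estimator is dropped.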
Estimation of the proteomic cancer co-expression sub networks by using association estimators.
Erdoğan, Cihat; Kurt, Zeyneb; Diri, Banu
2017-01-01
In this study, association estimators, which strongly influence gene network inference methods and are used to determine molecular interactions, were examined within the concept of co-expression network inference. Using proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networks were identified. Proteomic data from the various cancer types were collected from The Cancer Proteome Atlas (TCPA). Nine correlation- and mutual information (MI)-based association estimators that are commonly used in the literature were compared. As the gold standard for measuring the association estimators' performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) were used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. The MI-based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used to estimate biological networks in the weighted correlation network analysis (WGCNA) package. Among correlation-based methods, the best average success rate for the five cancer types was 60%, while among MI-based methods the average success rate was 71% for the James-Stein Shrinkage (Shrink) estimator and 64% for the Schurmann-Grassberger (SG) estimator. Moreover, the hub genes and the inferred sub networks are presented for the consideration of researchers and experimentalists. PMID:29145449
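The shrinkage MI estimator compared in this study can be sketched as follows. The sketch assumes equal-width discretization of each protein profile into a small number of bins, a choice not stated in the abstract, and James-Stein shrinkage of the joint cell frequencies toward a uniform target, with MI computed from the shrunk joint table. The helper names are hypothetical, and the Schurmann-Grassberger estimator is not implemented.

```python
from collections import Counter
from math import log

def discretize(values, bins=3):
    """Equal-width binning of a continuous expression profile into integer bin labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: everything falls in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def shrink_freqs(counts):
    """James-Stein shrinkage of cell frequencies toward the uniform distribution."""
    n, k = sum(counts), len(counts)
    ml = [c / n for c in counts]  # maximum-likelihood (plug-in) frequencies
    target = 1.0 / k
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    lam = 1.0 if denom == 0 else (1.0 - sum(f * f for f in ml)) / denom
    lam = min(1.0, max(0.0, lam))  # clip the shrinkage intensity to [0, 1]
    return [lam * target + (1.0 - lam) * f for f in ml]

def mutual_information(x, y, shrink=True):
    """MI (in nats) between two discretized profiles, from the (optionally shrunk) joint table."""
    xs, ys = sorted(set(x)), sorted(set(y))
    cells = [(a, b) for a in xs for b in ys]
    joint = Counter(zip(x, y))
    counts = [joint[c] for c in cells]
    freqs = shrink_freqs(counts) if shrink else [c / len(x) for c in counts]
    p = dict(zip(cells, freqs))
    px = {a: sum(p[(a, b)] for b in ys) for a in xs}
    py = {b: sum(p[(a, b)] for a in xs) for b in ys}
    return sum(v * log(v / (px[a] * py[b])) for (a, b), v in p.items() if v > 0)
```

On very small samples the plug-in estimate tends to overstate dependence; shrinking sparse joint tables toward uniform pulls the MI estimate down, which is the motivation for using a shrinkage estimator here.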
Western Blotting of the Endocannabinoid System.
Wager-Miller, Jim; Mackie, Ken
2016-01-01
Measuring expression levels of G protein-coupled receptors (GPCRs) is an important step for understanding the distribution, function, and regulation of these receptors. A common approach for detecting proteins from complex biological systems is Western blotting. In this chapter, we describe a general approach to Western blotting protein components of the endocannabinoid system using sodium dodecyl sulfate-polyacrylamide gel electrophoresis and nitrocellulose membranes, with a focus on detecting type 1 cannabinoid (CB1) receptors. When this technique is carefully used, specifically with validation of the primary antibodies, it can provide quantitative information on protein expression levels. Additional information can also be inferred from Western blotting such as potential posttranslational modifications that can be further evaluated by specific analytical techniques.
Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie
2016-04-05
High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) screening have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpreting these PPIs remains a challenging task. Advances in cancer genomics research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for assessing proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high-confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluating gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.
Identifying cooperative transcriptional regulations using protein–protein interactions
Nagamine, Nobuyoshi; Kawada, Yuji; Sakakibara, Yasubumi
2005-01-01
Cooperative transcriptional activation by multiple transcription factors (TFs) is important for understanding the mechanisms of complex transcriptional regulation in eukaryotes. Previous studies have attempted to find cooperative TFs from gene expression data, using gene expression profiles as a measure of the similarity of gene regulation. In this paper, we use protein–protein interaction data to infer synergistic binding of cooperative TFs. Our fundamental idea rests on the assumption that genes contributing to a similar biological process are regulated by the same control mechanism. First, the protein–protein interaction networks are used to calculate the similarity of biological processes among genes. Second, we integrate this similarity with chromatin immunoprecipitation data to identify cooperative TFs. Our computational experiments in yeast show that our method successfully identified eight pairs of cooperative TFs that have literature evidence but could not be identified by the previous method. Further, 12 new possible pairs were inferred, and we examined their biological relevance. However, because protein–protein interaction data typically contain many false positives, we propose a method that combines various biological data to increase the prediction accuracy. PMID:16126847
Liu, Xuewu; Huang, Yuxiao; Liang, Jiao; Zhang, Shuai; Li, Yinghui; Wang, Jun; Shen, Yan; Xu, Zhikai; Zhao, Ya
2014-11-30
The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism of invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites because the parasite proteins are difficult to express. Here, we performed computational prediction of the PPIs involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs. In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. Prediction performance was better when information from six species was used than when it was based on D. melanogaster information alone. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage using a fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted.
Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites. We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations.
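The step from DDI probabilities to PPI probabilities is commonly formulated by assuming that two proteins interact if at least one of their domain pairs interacts, and that domain pairs interact independently. A minimal sketch of that formulation follows; the EM estimation of the DDI probabilities themselves is not shown, and the domain identifiers are hypothetical.

```python
from itertools import product


def ppi_probability(domains_i, domains_j, ddi_prob):
    """P(proteins i and j interact) = 1 - prod over domain pairs (m, n) of
    (1 - lambda_mn), assuming independent domain-pair interactions.
    ddi_prob maps a sorted (domain, domain) tuple to its probability;
    unlisted pairs are treated as non-interacting."""
    p_no_contact = 1.0
    for d1, d2 in product(domains_i, domains_j):
        p_no_contact *= 1.0 - ddi_prob.get(tuple(sorted((d1, d2))), 0.0)
    return 1.0 - p_no_contact
```

For example, a protein with domain D1 and a partner with domains D2 and D3, given DDI probabilities 0.5 (D1-D2) and 0.2 (D1-D3), gets a PPI probability of 1 - 0.5 × 0.8 = 0.6.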
Expression of c-Fes protein isoforms correlates with differentiation in myeloid leukemias.
Carlson, Anne; Berkowitz, Jeanne McAdara; Browning, Damaris; Slamon, Dennis J; Gasson, Judith C; Yates, Karen E
2005-05-01
The cellular fes gene encodes a 93-kilodalton protein-tyrosine kinase (p93) that is expressed in both normal and neoplastic myeloid cells. Increased c-Fes expression is associated with differentiation in normal myeloid cells and cell lines. Our hypothesis was that primary leukemia cells would show a similar pattern of increased expression in more differentiated cells. Therefore, we compared c-Fes expression in cells with an undifferentiated, blast phenotype (acute myelogenous leukemia--AML) to cells with a differentiated phenotype (chronic myelogenous leukemia--CML). Instead of differences in p93 expression levels, we found complex patterns of c-Fes immunoreactive proteins that corresponded with differentiation in normal and leukemic myeloid cells. The "blast" pattern consisted of c-Fes immunoreactive proteins p93, p74, and p70; the "differentiated" pattern showed two additional c-Fes immunoreactive proteins, p67 and p62. Using mRNA from mouse and human cell lines, we found deletion of one or more exons in the c-fes mRNA. Those deletions predicted truncation of conserved domains (CDC15/FCH and SH2) involved in protein-protein interactions. No deletions were found, however, within the kinase domain. We infer that alternative splicing generates a family of c-Fes proteins. This may be a mechanism to direct the c-Fes kinase domain to different subcellular locations and/or substrates at specific stages of myeloid cell differentiation.
Djordjevic, Michael A; Chen, Han Cai; Natera, Siria; Van Noorden, Giel; Menzel, Christian; Taylor, Scott; Renard, Clotilde; Geiger, Otto; Weiller, Georg F
2003-06-01
A proteomic examination of Sinorhizobium meliloti strain 1021 was undertaken using a combination of 2-D gel electrophoresis, peptide mass fingerprinting, and bioinformatics. Our goal was to identify (i) putative symbiosis- or nutrient-stress-specific proteins, (ii) the biochemical pathways active under different conditions, (iii) potential new genes, and (iv) the extent of posttranslational modifications of S. meliloti proteins. In total, we identified the protein products of 810 genes (13.1% of the genome's coding capacity). The 810 genes generated 1,180 gene products, with chromosomal genes accounting for 78% of the gene products identified (18.8% of the chromosome's coding capacity). The activity of 53 metabolic pathways was inferred from bioinformatic analysis of proteins with assigned Enzyme Commission numbers. Of the remaining proteins, which lacked enzyme assignments, ABC-type transporters composed 12.7% and regulatory proteins 3.4% of the total. Proteins with up to seven transmembrane domains were identified in membrane preparations. A total of 27 putative nodule-specific proteins and 35 nutrient-stress-specific proteins were identified and used as a basis to define genes and describe processes occurring in S. meliloti cells in nodules and under stress. Several nodule proteins from the plant host were present in the nodule bacteria preparations. We also identified seven potentially novel proteins not predicted from the DNA sequence. Posttranslational modifications such as N-terminal processing could be inferred from the data. The posttranslational addition of UMP to the key regulator of nitrogen metabolism, PII, was demonstrated. This work demonstrates the utility of combining mass spectrometry with protein arraying or separation techniques to identify candidate genes involved in important biological processes and niche occupations that may be inaccessible to other methods of gene expression profiling.
Inference of quantitative models of bacterial promoters from time-series reporter gene data.
Stefan, Diana; Pinel, Corinne; Pinhal, Stéphane; Cinquemani, Eugenio; Geiselmann, Johannes; de Jong, Hidde
2015-01-01
The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions about the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which proteins have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks.
In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.
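The kind of kinetic reporter model referred to above can be sketched as dp/dt = k·a(t) - (γ + μ)·p, where a(t) is promoter activity, γ a protein degradation rate and μ the growth (dilution) rate. This parameterization and the forward-Euler integration below are illustrative assumptions, not the authors' exact model; the sketch only shows how a protein concentration profile follows from promoter activity.

```python
def protein_from_activity(times, activity, k, gamma, mu, p0=0.0):
    """Forward-Euler integration of dp/dt = k*a(t) - (gamma + mu)*p.

    times: increasing sample times; activity: promoter activity a(t) at
    those times; gamma: protein degradation rate; mu: growth (dilution)
    rate. Returns the protein concentration at each sample time."""
    p = [p0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        prev = p[-1]
        p.append(prev + dt * (k * activity[i - 1] - (gamma + mu) * prev))
    return p
```

Under constant activity the profile approaches k·a/(γ + μ); with a smaller γ + μ (a longer effective protein half-life), p(t) lags further behind changes in a(t), which is why treating mRNA or promoter activity as a proxy for protein concentration becomes less reliable.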
Chasman, Deborah; Walters, Kevin B.; Lopes, Tiago J. S.; Eisfeld, Amie J.; Kawaoka, Yoshihiro; Roy, Sushmita
2016-01-01
Mammalian host response to pathogenic infections is controlled by a complex regulatory network connecting regulatory proteins such as transcription factors and signaling proteins to target genes. An important challenge in infectious disease research is to understand molecular similarities and differences in mammalian host response to diverse sets of pathogens. Recently, systems biology studies have produced rich collections of omic profiles measuring host response to infectious agents such as influenza viruses at multiple levels. To gain a comprehensive understanding of the regulatory network driving host response to multiple infectious agents, we integrated host transcriptomes and proteomes using a network-based approach. Our approach combines expression-based regulatory network inference, structured-sparsity based regression, and network information flow to infer putative physical regulatory programs for expression modules. We applied our approach to identify regulatory networks, modules and subnetworks that drive host response to multiple influenza infections. The inferred regulatory network and modules are significantly enriched for known pathways of immune response and implicate apoptosis, splicing, and interferon signaling processes in the differential response of viral infections of different pathogenicities. We used the learned network to prioritize regulators and study virus and time-point specific networks. RNAi-based knockdown of predicted regulators had a significant impact on viral replication, and these regulators include several previously unknown ones. Taken together, our integrated analysis identified novel module-level patterns that capture strain- and pathogenicity-specific patterns of expression and helped identify important regulators of host response to influenza infection. PMID:27403523
Rajjou, Loïc; Belghazi, Maya; Huguet, Romain; Robin, Caroline; Moreau, Adrien; Job, Claudette; Job, Dominique
2006-07-01
The influence of salicylic acid (SA) on elicitation of defense mechanisms in Arabidopsis (Arabidopsis thaliana) seeds and seedlings was assessed by physiological measurements combined with global expression profiling (proteomics). Parallel experiments were carried out using NahG transgenic plants, which express the bacterial gene encoding SA hydroxylase and therefore cannot accumulate the active form of this plant defense elicitor. SA markedly improved germination under salt stress. Proteomic analyses disclosed a specific accumulation of protein spots regulated by SA, as inferred from silver-nitrate staining of two-dimensional gels, detection of carbonylated (oxidized) proteins, and detection of neosynthesized proteins labeled with [35S]-methionine. The combined results revealed several processes potentially affected by SA. This molecule enhanced the reinduction of the late maturation program during early stages of germination, thereby allowing the germinating seeds to reinforce their capacity to mount adaptive responses to environmental water stress. Other processes affected by SA concerned the quality of protein translation, the priming of seed metabolism, the synthesis of antioxidant enzymes, and the mobilization of seed storage proteins. All the observed effects are likely to improve seed vigor. Another aspect revealed by this study concerned the oxidative stress entailed by SA in germinating seeds, as inferred from a characterization of the carbonylated (oxidized) proteome. Finally, the proteomic data revealed a close interplay between abscisic acid signaling and SA elicitation of seed vigor.
Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger
2017-01-01
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity, which makes large-scale inference infeasible. This is especially true if a Bayesian framework is applied to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse-engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time-dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large-scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Mayfield, Anderson B; Wang, Yu-Bin; Chen, Chii-Shiarng; Chen, Shu-Hwa; Lin, Chung-Yen
2016-12-01
As significant anthropogenic pressures are putting undue stress on the world's oceans, there has been a concerted effort to understand how marine organisms respond to environmental change. Transcriptomic approaches, in particular, have been readily employed to document the mRNA-level response of a plethora of marine invertebrates exposed to an array of simulated stress scenarios, with the tacit and untested assumption being that the respective proteins show a corresponding trend. To better understand the degree of congruency between mRNA and protein expression in an endosymbiotic marine invertebrate, mRNAs and proteins were sequenced from the same samples of the common, Indo-Pacific coral Seriatopora hystrix exposed to stable or upwelling-simulating conditions for 1 week. Of the 167 proteins downregulated at variable temperature, only two were associated with mRNAs that were also differentially expressed between treatments. Of the 378 differentially expressed genes, none were associated with a differentially expressed protein. Collectively, these results highlight the inherent risk of inferring cellular behaviour based on mRNA expression data alone and challenge the current, mRNA-focused approach taken by most marine and many molecular biologists. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Minas, Giorgos; Momiji, Hiroshi; Jenkins, Dafyd J; Costa, Maria J; Rand, David A; Finkenstädt, Bärbel
2017-06-26
Given the development of high-throughput experimental techniques, an increasing number of whole genome transcription profiling time series data sets, with good temporal resolution, are becoming available to researchers. The ReTrOS toolbox (Reconstructing Transcription Open Software) provides MATLAB-based implementations of two related methods, namely ReTrOS-Smooth and ReTrOS-Switch, for reconstructing the temporal transcriptional activity profile of a gene from given mRNA expression time series or protein reporter time series. The methods are based on fitting a differential equation model incorporating the processes of transcription, translation and degradation. The toolbox provides a framework for model fitting along with statistical analyses of the model with a graphical interface and model visualisation. We highlight several applications of the toolbox, including the reconstruction of the temporal cascade of transcriptional activity inferred from mRNA expression data and protein reporter data in the core circadian clock in Arabidopsis thaliana, and how such reconstructed transcription profiles can be used to study the effects of different cell lines and conditions. The ReTrOS toolbox allows users to analyse gene and/or protein expression time series where, with appropriate formulation of prior information about a minimum of kinetic parameters, in particular rates of degradation, users are able to infer timings of changes in transcriptional activity. Data from any organism and obtained from a range of technologies can be used as input due to the flexible and generic nature of the model and implementation. The output from this software provides a useful analysis of time series data and can be incorporated into further modelling approaches or in hypothesis generation.
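The core idea of reconstructing transcription from an mRNA time series can be illustrated with the model dm/dt = τ(t) - δ_m·m, rearranged as τ(t) = dm/dt + δ_m·m(t). The naive finite-difference estimate below is only a sketch under that assumption: ReTrOS-Smooth and ReTrOS-Switch fit the model statistically rather than differencing raw data, and `delta_m` (a hypothetical parameter name) must be supplied as prior knowledge, as the abstract notes for degradation rates.

```python
def transcription_profile(times, mrna, delta_m):
    """Estimate tau(t) = dm/dt + delta_m * m(t) at each sample time.

    Uses central differences inside the series and one-sided differences
    at the ends; requires at least two time points. No smoothing is
    applied, so measurement noise is amplified."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time points")
    tau = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dm_dt = (mrna[hi] - mrna[lo]) / (times[hi] - times[lo])
        tau.append(dm_dt + delta_m * mrna[i])
    return tau
```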
Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu
2017-09-07
The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production, especially given the challenge of global climate change. Such algorithms help to predict and analyze functional and nutritional quality and, consequently, to control food allergies in hyperallergic patients. Measuring these concentrations with laboratory experiments is expensive and time-consuming, especially in large-scale projects. Therefore, this paper introduces a new intelligent algorithm that combines an adaptive neuro-fuzzy inference system with the whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters of the adaptive neuro-fuzzy inference system. The results of the proposed algorithm are compared with those of other algorithms and indicate its higher performance.
Generic comparison of protein inference engines.
Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi
2012-04-01
Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. To this end, we defined a family of protein inference engines by extending a simple inference engine with thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high-confidence spectral evidence, without posterior exclusion of any type of protein identification. Although the diverse data sets studied consistently supported this rule, other data sets might behave differently. To ensure maximal reliable proteome coverage for data sets arising in other studies, we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead considering several protein inference approaches and assessing them with respect to the presented performance measure in the specific application context.
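A simple inference engine and one pruning variant, the exclusion of single-hit wonders, can be sketched as follows. The input format (confidence-filtered peptide-protein pairs) and the rule of counting distinct peptides per protein are assumptions for illustration; shared peptides are naively credited to every protein they map to.

```python
from collections import defaultdict


def infer_proteins(psms, min_distinct_peptides=1):
    """Report proteins supported by at least `min_distinct_peptides`
    distinct peptides.

    psms: iterable of (peptide_sequence, protein_id) pairs that already
    passed a spectrum-level confidence filter. min_distinct_peptides=1
    keeps every protein with evidence; 2 prunes single-hit wonders."""
    support = defaultdict(set)
    for peptide, protein in psms:
        support[protein].add(peptide)
    return {prot for prot, peps in support.items()
            if len(peps) >= min_distinct_peptides}
```

The study's finding corresponds to preferring the unpruned setting (`min_distinct_peptides=1`) on the data sets it examined, while recommending that each new data set be assessed with the proposed performance measure.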
The Evolution of Human Cells in Terms of Protein Innovation
Sardar, Adam J.; Oates, Matt E.; Fang, Hai; Forrest, Alistair R.R.; Kawaji, Hideya; Gough, Julian; Rackham, Owen J.L.
2014-01-01
Humans are composed of hundreds of cell types. As the genomic DNA of each somatic cell is identical, cell type is determined by what is expressed and when. Until recently, little has been reported about the determinants of human cell identity, particularly from the joint perspective of gene evolution and expression. Here, we chart the evolutionary past of all documented human cell types via the collective histories of proteins, the principal product of gene expression. FANTOM5 data provide cell-type–specific digital expression of human protein-coding genes and the SUPERFAMILY resource is used to provide protein domain annotation. The evolutionary epoch in which each protein was created is inferred by comparison with domain annotation of all other completely sequenced genomes. Studying the distribution across epochs of genes expressed in each cell type reveals insights into human cellular evolution in terms of protein innovation. For each cell type, its history of protein innovation is charted based on the genes it expresses. Combining the histories of all cell types enables us to create a timeline of cell evolution. This timeline identifies the possibility that our common ancestor Coelomata (cavity-forming animals) provided the innovation required for the innate immune system, whereas cells that now form the human brain have followed a trajectory of continually accumulating novel proteins since Opisthokonta (boundary of animals and fungi). We conclude that exaptation of existing domain architectures into new contexts is the dominant source of cell-type–specific domain architectures. PMID:24692656
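The epoch assignment described above can be sketched as follows: a protein's domain architecture is dated to the oldest epoch in which it is seen in some genome that diverged from the human lineage at that epoch, and each cell type's profile is the distribution of these epochs over its expressed genes. The epoch ladder, the input dictionaries and the function names below are hypothetical simplifications, not the SUPERFAMILY or FANTOM5 data structures.

```python
from collections import Counter

# Hypothetical, simplified ladder of epochs on the human lineage, oldest first.
EPOCHS = ["cellular_organisms", "Eukaryota", "Opisthokonta", "Metazoa",
          "Coelomata", "Vertebrata", "Mammalia", "Homo_sapiens"]


def creation_epoch(architecture, seen_in):
    """Oldest epoch in which the domain architecture occurs in a genome that
    diverged from humans at that epoch; human-specific otherwise.
    seen_in maps epoch -> set of architectures observed in such genomes."""
    for epoch in EPOCHS:
        if architecture in seen_in.get(epoch, set()):
            return epoch
    return EPOCHS[-1]


def epoch_distribution(expressed_genes, gene_architecture, seen_in):
    """Fraction of a cell type's expressed genes created in each epoch."""
    counts = Counter(creation_epoch(gene_architecture[g], seen_in)
                     for g in expressed_genes)
    total = sum(counts.values())
    return {e: counts[e] / total for e in EPOCHS if counts[e]}
```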
Erdem, Cemal; Nagle, Alison M.; Casa, Angelo J.; Litzenburger, Beate C.; Wang, Yu-fen; Taylor, D. Lansing; Lee, Adrian V.; Lezon, Timothy R.
2016-01-01
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1- and insulin-induced signaling. Using reverse phase protein arrays, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, named here time translation models, were analyzed, and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin-induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased the phospho-Akt protein levels. Both knock-down perturbations elicited stronger phosphorylation responses in IGF1-stimulated cells than in insulin-stimulated cells. Overall, time translation modeling coupled to wet-lab experiments proved powerful for inferring differential interactions downstream of IGF1 and insulin signaling in vitro. PMID:27364358
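The "conventional matrix inversion" variant of a time translation model can be sketched as estimating a matrix A with X(t+1) = A·X(t), where column s of each matrix holds the protein levels of sample s. This column convention and the square, invertible case are assumptions for illustration; with more samples than proteins a least-squares (pseudo-inverse) fit would replace the exact inverse, and the lasso variant would add an L1 penalty.

```python
def invert(m):
    """Gauss-Jordan inverse of a square matrix with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def time_translation_matrix(x_t, x_next):
    """Estimate A in x_next = A @ x_t; entry A[i][j] is read as the directed
    influence of protein j at time t on protein i at time t+1."""
    return matmul(x_next, invert(x_t))
```

Comparing the entries of A fitted separately to IGF1- and insulin-stimulated data is one way to rank edges by differential magnitude, as described above.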
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Datta, Susmita
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which offers a systems view of protein expression. However, a major challenge is that, with increased throughput, estimating protein quantities from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternative splicing of the RNA transcript, post-translational modifications, or other possible proteoforms, which affects a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that fall outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that uses the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach assumes that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB® and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.
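The notion of a peptide signature can be illustrated by coding each peptide's result in every comparison as +1 (significantly up), -1 (significantly down) or 0 (not significant), then flagging peptides whose signature differs from the protein's most common one. This is a simplified, non-Bayesian sketch under an assumed significance cutoff (`alpha`), not the BP-Quant model.

```python
from collections import Counter


def signature(p_values, log_fold_changes, alpha=0.05):
    """Code a peptide's results across comparisons as +1 / -1 / 0."""
    return tuple(0 if p >= alpha else (1 if lfc > 0 else -1)
                 for p, lfc in zip(p_values, log_fold_changes))


def dominant_signature(peptide_signatures):
    """Return the most common signature among a protein's peptides and the
    sorted list of peptides that deviate from it."""
    counts = Counter(peptide_signatures.values())
    dominant, _ = counts.most_common(1)[0]
    outliers = sorted(p for p, s in peptide_signatures.items() if s != dominant)
    return dominant, outliers
```

Peptides returned as outliers are candidates for a distinct proteoform; excluding them before rolling up abundance is the intuition behind using only the dominant pattern.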
Cytoscape: a software environment for integrated models of biomolecular interaction networks.
Shannon, Paul; Markiel, Andrew; Ozier, Owen; Baliga, Nitin S; Wang, Jonathan T; Ramage, Daniel; Amin, Nada; Schwikowski, Benno; Ideker, Trey
2003-11-01
Cytoscape is an open-source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to lay out and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery from DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
An integrative approach to inferring biologically meaningful gene modules.
Cho, Ji-Hoon; Wang, Kai; Galas, David J
2011-07-26
The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Although recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. To obtain better functional association, we have devised a method, the Semantic Similarity-Integrated approach for Modularization (SSIM), that directly incorporates gene ontology (GO) annotation into module construction: it integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, and constructs modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level.
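Integrating several gene-gene similarity matrices into one and grouping genes into modules can be sketched as below. The weighted average is an assumed integration rule, and, to keep the example short, connected components of a thresholded similarity graph are swapped in for the affinity propagation clustering that SSIM actually uses; a GO semantic similarity matrix would be supplied as one of the inputs.

```python
def integrate(similarities, weights):
    """Weighted average of equally sized gene-by-gene similarity matrices."""
    n = len(similarities[0])
    total = sum(weights)
    return [[sum(w * m[i][j] for w, m in zip(weights, similarities)) / total
             for j in range(n)] for i in range(n)]


def modules(sim, threshold):
    """Connected components of the graph linking genes with sim >= threshold
    (a simple stand-in for affinity propagation)."""
    n = len(sim)
    seen, found = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if v not in seen and sim[u][v] >= threshold:
                    seen.add(v)
                    stack.append(v)
        found.append(sorted(component))
    return found
```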
Annotation of gene function in citrus using gene expression information and co-expression networks
2014-01-01
Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, such tools have not yet been developed for citrus. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to improve prediction of gene function, expression and regulation patterns. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit.
Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provides opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for gene co-expression analysis in citrus. PMID:25023870
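The guilt-by-association idea behind a GCN can be sketched with Pearson correlation: genes are linked when their expression profiles across conditions are strongly correlated, and the neighbors of a guide gene become candidates for related functions. The correlation cutoff `r_min` and the toy profiles are assumptions; NICCE's actual network construction and clustering are not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_network(profiles, r_min=0.8):
    """Adjacency sets linking genes whose correlation is at least r_min."""
    edges = {g: set() for g in profiles}
    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) >= r_min:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges


def guide_gene_neighbors(network, guide):
    """Co-expressed partners of a guide gene, sorted by name."""
    return sorted(network.get(guide, set()))
```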
Prediction of virus-host protein-protein interactions mediated by short linear motifs.
Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A
2017-03-09
Short linear motifs in host-organism proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Because viral instances of host motif regular expressions can be found by chance, methods are needed to filter for functional linear motifs. We conducted a systematic comparison of linear-motif filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between humans and human immunodeficiency virus 1 (HIV-1). We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C, D and R are combined by union and intersection. The resulting sets are compared by the number of protein-protein interactions correctly inferred with them, as judged by experimental validation. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions allows the interactions to be ranked by the sets used to deduce them: D∪R and C. The ordering of the sets is descending in the probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R and C∪D∪R infer all known interactions between HIV-1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method uses only protein sequences as inputs. The software can be extended to any other eukaryotic virus and its host to find and rank candidate interactions. In future work we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
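The three filters can be sketched for a single viral sequence and a single motif regular expression. The assumptions are loud ones: disordered regions and conserved positions are supplied as inputs rather than predicted, the rarity filter (R) compares the mean number of hits in shuffled copies of the sequence with a chosen cutoff, and Python's `re` counts only non-overlapping matches. The function names are hypothetical.

```python
import random
import re


def motif_hits(seq, pattern):
    """Start positions of non-overlapping regex matches in seq."""
    return [m.start() for m in re.finditer(pattern, seq)]


def in_disordered(pos, disordered):
    """True if pos lies in any half-open [start, end) disordered region."""
    return any(start <= pos < end for start, end in disordered)


def shuffled_hit_rate(seq, pattern, n_shuffles=200, seed=0):
    """Mean number of matches per randomly shuffled copy of seq."""
    rng = random.Random(seed)
    residues = list(seq)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        total += sum(1 for _ in re.finditer(pattern, "".join(residues)))
    return total / n_shuffles


def filter_sets(seq, pattern, disordered, conserved, rare_cutoff=0.5):
    """Return the motif instances passing the C, D and R filters."""
    hits = set(motif_hits(seq, pattern))
    c = hits & set(conserved)
    d = {p for p in hits if in_disordered(p, disordered)}
    r = hits if shuffled_hit_rate(seq, pattern) <= rare_cutoff else set()
    return c, d, r
```

Combined sets such as D∪R are then ordinary set unions, for example `d | r`.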
Smoot, L M; Smoot, J C; Graham, M R; Somerville, G A; Sturdevant, D E; Migliaccio, C A; Sylva, G L; Musser, J M
2001-08-28
Pathogens are exposed to different temperatures during an infection cycle and must regulate gene expression accordingly. However, the extent to which virulent bacteria alter gene expression in response to temperatures encountered in the host is unknown. Group A Streptococcus (GAS) is a human-specific pathogen that is responsible for illnesses ranging from superficial skin infections and pharyngitis to severe invasive infections such as necrotizing fasciitis and streptococcal toxic shock syndrome. GAS survives and multiplies at different temperatures during human infection. DNA microarray analysis was used to investigate the influence of temperature on global gene expression in a serotype M1 strain grown to exponential phase at 29 degrees C and 37 degrees C. Approximately 9% of genes were differentially expressed by at least 1.5-fold at 29 degrees C relative to 37 degrees C, including genes encoding transporter proteins, proteins involved in iron homeostasis, transcriptional regulators, phage-associated proteins, and proteins with no known homologue. Relatively few known virulence genes were differentially expressed at this threshold. However, transcription of 28 genes encoding proteins with predicted secretion signal sequences was altered, indicating that growth temperature substantially influences the extracellular proteome. TaqMan real-time reverse transcription-PCR assays confirmed the microarray data. We also discovered that transcription of genes encoding hemolysins, and proteins with inferred roles in iron regulation, transport, and homeostasis, was influenced by growth at 40 degrees C. Thus, GAS profoundly alters gene expression in response to temperature. The data delineate the spectrum of temperature-regulated gene expression in an important human pathogen and provide many unforeseen lines of pathogenesis investigation.
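The 1.5-fold criterion is a symmetric threshold on the log ratio of expression between the two growth temperatures. A minimal sketch, assuming per-gene mean normalized expression values are already available; the gene names and values are hypothetical, and the study's normalization and statistical filtering are not reproduced.

```python
import math


def fold_change_hits(expr_29: dict, expr_37: dict, fold: float = 1.5) -> set:
    """Genes whose expression at 29 degrees C differs from 37 degrees C by >= `fold`.

    Both dicts map gene -> mean normalized expression (values must be > 0).
    Using |log2(ratio)| makes 1.5-fold up- and down-regulation count equally.
    """
    threshold = math.log2(fold)
    return {
        gene
        for gene, value in expr_29.items()
        if gene in expr_37 and abs(math.log2(value / expr_37[gene])) >= threshold
    }


# Hypothetical values: geneA up 2-fold, geneB nearly unchanged, geneC down 2-fold.
expr_29 = {"geneA": 30.0, "geneB": 10.0, "geneC": 5.0}
expr_37 = {"geneA": 15.0, "geneB": 11.0, "geneC": 10.0}
hits = fold_change_hits(expr_29, expr_37)
fraction = len(hits) / len(expr_29)  # share of genes passing the threshold
```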
2014-01-01
Background Non-small cell lung cancer (NSCLC) remains lethal despite the development of numerous drug therapy technologies. About 85% to 90% of lung cancers are NSCLC, and the 5-year survival rate is still below 50% at best. Thus, it is important to find druggable target genes for NSCLC in order to develop an effective therapy. Results Integrated analysis of publicly available gene expression and promoter methylation patterns of two highly aggressive NSCLC cell lines generated by in vivo selection was performed. We selected eleven critical genes that may mediate metastasis using recently proposed principal component analysis based unsupervised feature extraction. The eleven selected genes were significantly related to cancer diagnosis. The tertiary structure of the proteins encoded by the selected genes was inferred by Full Automatic Modeling System, a profile-based protein structure inference software, to determine protein functions and to specify genes that could be potential drug targets. Conclusions We identified eleven potentially critical genes that may mediate NSCLC metastasis using bioinformatic analysis of publicly available data sets. These genes are potential target genes for the therapy of NSCLC. Among the eleven genes, TINAGL1 and B3GALNT1 are possible targets for drug compounds that inhibit their expression. PMID:25521548
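PCA-based unsupervised feature extraction treats genes, rather than samples, as the observations, so each gene receives a score on every principal component and genes with extreme scores are selected. A minimal NumPy sketch; the choice of component and the top-|score| selection rule are assumptions for illustration (published variants may instead apply a statistical outlier test to the scores), and the toy matrix is synthetic.

```python
import numpy as np


def pca_select_genes(X: np.ndarray, component: int = 0, n_select: int = 10) -> np.ndarray:
    """Return indices of the genes with the most extreme scores on one PC.

    X: genes x samples matrix (e.g. expression or methylation profiles).
    component: principal component to use (0 = first PC).
    """
    Xc = X - X.mean(axis=0, keepdims=True)   # center each sample (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, component] * S[component]  # gene scores on the chosen PC
    return np.argsort(-np.abs(scores))[:n_select]


# Synthetic data: 100 genes x 6 samples; genes 0-4 separate the two sample groups.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(100, 6))
X[:5] += 5.0 * np.array([1, 1, 1, -1, -1, -1])
selected = pca_select_genes(X, component=0, n_select=5)
```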
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis, as well as data standardization and data publication. All methods in the workflow that address these tasks are state-of-the-art or cutting-edge. As shown in previous publications, each of these methods is adequate for its specific task and gives competitive results. In addition, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Because BioInfra.Prot offers several fast communication channels for accessing all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de), users can easily benefit from this service and obtain support from experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow on four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics today. Currently, a vast number of protein inference algorithms and implementations are available to the proteomics community. Protein assembly affects the final results of a study, including the quantitation values and the final claims of the research manuscript.
Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published several benchmarks of other bioinformatics algorithms used in proteomics (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. The popular database search engines Mascot, X!Tandem and MS-GF+, and combinations thereof, were evaluated for every protein inference tool. In total, more than 186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that: 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The use of databases containing not only the canonical proteins but also their known isoforms has a small impact on the number of reported proteins.
Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.
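Parsimonious protein reports follow the Occam's razor principle: report the smallest set of proteins that explains all identified peptides. A minimal Python sketch using the common greedy set-cover approximation; it is not the algorithm of PIA, ProteinProphet, Fido, ProteinLP or MSBayesPro, and the peptide and protein identifiers are hypothetical.

```python
def parsimonious_proteins(peptide_to_proteins: dict) -> list:
    """Greedy approximation of the minimal protein set explaining all peptides.

    peptide_to_proteins maps each identified peptide to the set of database
    proteins that contain it (shared peptides map to several proteins).
    """
    # Invert the mapping: protein -> peptides it could have produced.
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    # Peptides with no candidate protein can never be explained, so skip them.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    selected = []
    while unexplained:
        # Protein covering the most unexplained peptides; max() keeps the first
        # of equally good candidates, so iterating in sorted order breaks ties
        # alphabetically and makes the result deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda prot: len(protein_to_peptides[prot] & unexplained))
        selected.append(best)
        unexplained -= protein_to_peptides[best]
    return selected


# Hypothetical identifications: pep2 and pep3 are shared between proteins.
peptides = {"pep1": {"A"}, "pep2": {"A", "B"}, "pep3": {"B", "C"}, "pep4": {"C"}}
report = parsimonious_proteins(peptides)
```

Here protein B is left out because A and C already explain all of its peptides; real tools additionally group indistinguishable proteins and score them probabilistically.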
Erdem, Cemal; Nagle, Alison M; Casa, Angelo J; Litzenburger, Beate C; Wang, Yu-Fen; Taylor, D Lansing; Lee, Adrian V; Lezon, Timothy R
2016-09-01
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1- and insulin-induced signaling. Using reverse phase protein arrays, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, termed here time-translation models, were analyzed, and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin-induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased phospho-Akt protein levels. Both knock-down perturbations produced stronger phosphorylation responses in IGF1-stimulated cells than in insulin-stimulated cells. Overall, time-translation modeling coupled to wet-lab experiments proved powerful for inferring differential interactions downstream of IGF1 and insulin signaling in vitro. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
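One way to read the "conventional matrix inversion" variant is as a linear model that maps protein levels at one time point to the next, x(t+1) ≈ A x(t), with A estimated by least squares. A minimal NumPy sketch under that assumption; the exact model form and normalization of the study, as well as its lasso and entropy-maximization variants, are not reproduced, and the 3-protein system below is synthetic.

```python
import numpy as np


def fit_time_translation(X: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x(t+1) ~= A x(t).

    X: proteins x time-points matrix. A[i, j] is read as the directed influence
    of protein j at time t on protein i at time t+1.
    """
    X_now, X_next = X[:, :-1], X[:, 1:]
    # Solve X_now.T @ A.T ~= X_next.T (equivalent to using the pseudo-inverse).
    A_T, *_ = np.linalg.lstsq(X_now.T, X_next.T, rcond=None)
    return A_T.T


# Synthetic, noise-free trajectory generated from a known interaction matrix.
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.95]])
X = np.empty((3, 20))
X[:, 0] = [1.0, -0.5, 0.3]
for t in range(19):
    X[:, t + 1] = A_true @ X[:, t]
A_hat = fit_time_translation(X)
```

With one matrix fitted per stimulus, the entries of A_IGF1 - A_insulin could then be ranked by magnitude to prioritize differential interactions.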
Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks
Roy, Sushmita; Lagree, Stephen; Hou, Zhonggang; Thomson, James A.; Stewart, Ron; Gasch, Audrey P.
2013-01-01
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well as or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors, thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development. PMID:24146602
Chim, Nicholas; Riley, Robert; The, Juliana; Im, Soyeon; Segelke, Brent; Lekin, Tim; Yu, Minmin; Hung, Li Wei; Terwilliger, Tom; Whitelegge, Julian P.; Goulding, Celia W.
2010-01-01
Disulfide bond forming (Dsb) proteins ensure correct folding and disulfide bond formation of secreted proteins. Previously, we showed that Mycobacterium tuberculosis DsbE (Mtb DsbE, Rv2878c) aids in vitro oxidative folding of proteins. Here we present structural, biochemical and gene expression analyses of another putative Mtb secreted disulfide bond isomerase protein homologous to Mtb DsbE, Mtb DsbF (Rv1677). The X-ray crystal structure of Mtb DsbF reveals a conserved thioredoxin fold, although the active-site cysteines may be modeled in both oxidized and reduced forms, in contrast to the solely reduced form in Mtb DsbE. Furthermore, the shorter loop region in Mtb DsbF results in a more solvent-exposed active site. Biochemical analyses show that, similar to Mtb DsbE, Mtb DsbF can oxidatively refold reduced, unfolded hirudin and has a comparable pKa for the active-site solvent-exposed cysteine. However, contrary to Mtb DsbE, the Mtb DsbF redox potential is more oxidizing and its reduced state is more stable. From computational genomics analysis of the M. tuberculosis genome, we identified a potential Mtb DsbF interaction partner, Rv1676, a predicted peroxiredoxin. Complex formation is supported by protein co-expression studies and inferred from gene expression profiles, whereby Mtb DsbF and Rv1676 are upregulated under similar environments. Additionally, comparison of Mtb DsbF and Mtb DsbE gene expression data indicates anticorrelated gene expression patterns, suggesting that these two proteins and their functionally linked partners constitute analogous pathways that may function under different conditions. PMID:20060836
Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G
2014-12-05
Applying machine learning methods to microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches in which the expression of each gene is treated independently suffer from low prediction accuracy and are difficult to interpret biologically. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model, in such a way that the resulting pathway activity has optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated on a variety of published gene expression profiles that cover different types of disease. We show that it not only improves classification accuracy but also performs well on multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user.
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
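The pathway activity feature is a weighted linear sum of member-gene expression. A minimal NumPy sketch; in the paper the weights come from a mathematical programming model that maximizes discriminative power, whereas here they are supplied by hand, and the `max_genes` truncation is a simple post-hoc stand-in (an assumption) for the model's built-in limit on the number of participating genes.

```python
from typing import Optional

import numpy as np


def pathway_activity(X: np.ndarray, weights: np.ndarray,
                     max_genes: Optional[int] = None) -> np.ndarray:
    """Composite pathway feature for each sample.

    X: samples x genes matrix restricted to the pathway's member genes.
    weights: one weight per member gene.
    max_genes: if given, keep only the max_genes weights of largest magnitude.
    """
    w = np.asarray(weights, dtype=float).copy()
    if max_genes is not None and max_genes < w.size:
        drop = np.argsort(np.abs(w))[: w.size - max_genes]  # smallest |weights|
        w[drop] = 0.0
    return X @ w


X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
weights = np.array([0.5, -0.1, 2.0])
activity_all = pathway_activity(X, weights)                # uses all three genes
activity_top2 = pathway_activity(X, weights, max_genes=2)  # drops the -0.1 gene
```

Classification is then run on the resulting samples x pathways matrix instead of the much wider samples x genes matrix.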
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.
2008-02-01
WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify at least 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features, the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 to 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group-specific expression patterns. Having inferred putative orthologs of the HvWRKY transcription factors from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlated expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but also exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities with their Arabidopsis orthologs, but also relatedness in their expression patterns. This correlated expression is indicative of a putative conserved function of related WRKY proteins in mono- and dicot species.
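In its simplest form, comparing the expression profiles of putative orthologs reduces to correlating their expression across matched conditions. A minimal Python sketch using the Pearson coefficient; the choice of Pearson correlation and the toy profiles are assumptions, and matching Arabidopsis and barley conditions, often the hard part in practice, is assumed to be done already.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Undefined (raises ZeroDivisionError) if either profile is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log-expression of an Arabidopsis WRKY gene and a putative barley
# ortholog across the same four matched treatments.
at_profile = [1.0, 2.5, 4.0, 3.0]
hv_profile = [0.8, 2.2, 4.1, 2.7]
r = pearson(at_profile, hv_profile)
```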
Reconstructing Dynamic Promoter Activity Profiles from Reporter Gene Data.
Kannan, Soumya; Sams, Thomas; Maury, Jérôme; Workman, Christopher T
2018-03-16
Accurate characterization of promoter activity is important when designing expression systems for systems biology and metabolic engineering applications. Promoters that respond to changes in the environment enable, for example, the dynamic control of gene expression without the need for inducer compounds. However, the dynamic nature of these processes poses challenges for estimating promoter activity. Most experimental approaches use reporter gene expression to estimate promoter activity. Typically, the reporter gene encodes a fluorescent protein that is used to infer a constant promoter activity, even though the observed output may be dynamic and lies several steps downstream of transcription. In fact, some promoters that are often thought of as constitutive can show changes in activity when growth conditions change. For these reasons, we have developed a system of ordinary differential equations, robust to noise and to changes in growth rate, for estimating the dynamic activity of promoters that change their activity in response to the environment. Our approach, inference of dynamic promoter activity (PromAct), improves on existing methods by more accurately inferring known promoter activity profiles. This method is also capable of estimating the correct scale of promoter activity and can be applied to quantitative data sets to estimate quantitative rates.
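The core idea of inferring promoter activity from a reporter can be illustrated with a one-step model in which the reporter per cell, G, is produced at rate P(t) and lost through degradation (rate gamma) and dilution by growth (rate mu): dG/dt = P(t) - (gamma + mu) G, so P(t) = dG/dt + (gamma + mu) G. A minimal NumPy sketch under that simplifying assumption; PromAct's actual equations and its noise handling are not reproduced, and the finite-difference derivative used here amplifies measurement noise.

```python
import numpy as np


def promoter_activity(t, reporter_per_cell, growth_rate, degradation_rate):
    """Estimate P(t) = dG/dt + (gamma + mu) * G from a reporter time course.

    t: sample times; reporter_per_cell: G(t), e.g. fluorescence / OD;
    growth_rate (mu) and degradation_rate (gamma): scalars or arrays like t.
    """
    G = np.asarray(reporter_per_cell, dtype=float)
    dG_dt = np.gradient(G, t)  # central differences inside, one-sided at the ends
    return dG_dt + (degradation_rate + growth_rate) * G


# Synthetic check: constant activity P = 2 with gamma + mu = 0.5 gives
# G(t) = (P / 0.5) * (1 - exp(-0.5 t)).
t = np.linspace(0.0, 10.0, 1001)
G = 4.0 * (1.0 - np.exp(-0.5 * t))
P_hat = promoter_activity(t, G, growth_rate=0.3, degradation_rate=0.2)
```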
An integrative approach to inferring biologically meaningful gene modules
2011-01-01
Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051
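SSIM's two ingredients, an integrated gene-gene similarity and affinity propagation clustering, can be sketched in NumPy. The weighted-sum integration, the weights and the synthetic 1-D data are assumptions for illustration; SSIM's actual similarity measures (including GO semantic similarity) are not reproduced. The clustering follows the standard responsibility/availability message updates of affinity propagation.

```python
import numpy as np


def integrate_similarities(matrices, weights):
    """Weighted sum of gene-gene similarity matrices (expression, PPI, GO, ...)."""
    return sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, matrices))


def affinity_propagation(S, preference=None, damping=0.5, max_iter=200):
    """Basic affinity propagation; returns the exemplar index of each item."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    if preference is None:
        preference = np.median(S[~np.eye(n, dtype=bool)])  # median off-diagonal value
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()  # a(k,k) = sum_{i' != k} max(0, r(i',k))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        raise RuntimeError("no exemplars found; try more iterations or another preference")
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars  # each exemplar labels itself
    return labels


# Synthetic genes on a line: two well-separated groups of three.
x = np.array([0.0, 0.1, 0.25, 5.0, 5.15, 5.2])
S_expr = -(x[:, None] - x[None, :]) ** 2
S = integrate_similarities([S_expr, S_expr], [0.5, 0.5])
labels = affinity_propagation(S)
```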
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Berger, Stephanie; Procko, Erik; Margineantu, Daciana; Lee, Erinna F; Shen, Betty W; Zelter, Alex; Silva, Daniel-Adriano; Chawla, Kusum; Herold, Marco J; Garnier, Jean-Marc; Johnson, Richard; MacCoss, Michael J; Lessene, Guillaume; Davis, Trisha N; Stayton, Patrick S; Stoddard, Barry L; Fairlie, W Douglas; Hockenbery, David M; Baker, David
2016-11-02
Many cancers overexpress one or more of the six human pro-survival BCL2 family proteins to evade apoptosis. To determine which BCL2 protein or proteins block apoptosis in different cancers, we computationally designed three-helix bundle protein inhibitors specific for each BCL2 pro-survival protein. Following in vitro optimization, each inhibitor binds its target with high picomolar to low nanomolar affinity and at least 300-fold specificity. Expression of the designed inhibitors in human cancer cell lines revealed unique dependencies on BCL2 proteins for survival which could not be inferred from other BCL2 profiling methods. Our results show that designed inhibitors can be generated for each member of a closely-knit protein family to probe the importance of specific protein-protein interactions in complex biological processes.
Chang, Chiou Ling; Geib, Scott; Cho, Il Kyu; Li, Qing X; Stanley, David
2014-08-01
Lufenuron (LFN), a chitin synthase inhibitor, impacts the fertility of Ceratitis capitata, Bactrocera dorsalis, B. cucurbitae, and B. latifrons. We hypothesized that LFN curtails egg hatch in the solanaceous fruit fly, B. latifrons. In this study, newly emerged virgin adults were sexed and fed for 12 days with varying concentrations of LFN-laced agar diets until sexual maturation. Eggs were collected from 12-d-old adults and egg hatch was assessed. Egg hatch decreased in adults reared on LFN-treated diets. LFN-treated media did not influence fertility when one sex was reared on experimental medium and the other on control medium before mating. Exposure to LFN-treated medium after mating led to reduced egg hatch. We infer that LFN is not a permanent sterilant, and that reduced egg hatch depends on continuous exposure to dietary LFN after mating. Proteomic analysis identified two proteins, a pheromone binding protein and a chitin binding protein, that were differentially expressed between adults maintained on LFN-treated and control diets. Expression of two genes, encoding chitin synthase 2 and chitin binding protein, was altered in adults exposed to dietary LFN. LFN treatments also led to increased expression of two odorant binding proteins, one in females and one in males. We surmise that these data support our hypothesis and provide insight into LFN actions. © 2014 Wiley Periodicals, Inc.
Huang, Jinyu; Jiao, Jinzhen; Tan, Zhi-Liang; He, Zhixiong; Beauchemin, Karen A; Forster, Robert; Han, Xue-Feng; Tang, Shao-Xun; Kang, Jinghe; Zhou, Chuanshe
2016-09-14
Thirty-six Xiangdong black goats were used to investigate age-related mRNA and protein expression levels of genes related to skeletal muscle structural proteins, the myogenic regulatory factors (MRFs) and the MEF2 family, and skeletal muscle fiber type and composition during skeletal muscle growth under grazing (G) and barn-fed (BF) feeding systems. Goats were slaughtered at six time points selected to reflect developmental changes of skeletal muscle during the nonrumination (days 0, 7, and 14), transition (day 42), and rumination (days 56 and 70) phases. The number of type IIx fibers in the longissimus dorsi increased rapidly, while the numbers of type IIa and IIb fibers decreased slightly, indicating that these genes were coordinated during the rapid growth and development stages of skeletal muscle. No gene expression was affected (P > 0.05) by feeding system except that of Myf5 and Myf6. Protein expression of MYOZ3 and MEF2C was affected (P < 0.05) by age, whereas PGC-1α decreased linearly in the G group, and only MYOZ3 protein was affected (P < 0.001) by feeding system. Moreover, the PGC-1α and MEF2C proteins may interact with each other in promoting muscle growth. The current results indicate that (1) skeletal muscle growth during days 0-70 after birth is mainly myofiber hypertrophy and differentiation, (2) weaning affects the expression of relevant genes of skeletal muscle structural proteins, skeletal muscle growth, and skeletal muscle fiber type and composition, and (3) nutrition or feeding regimen mainly influences the expression of skeletal muscle growth genes.
Anderson, Richard J; Guru, Siradanahalli; Weeratna, Risini; Makinen, Shawn; Falconer, Derek J; Sheppard, Neil C; Lang, Susanne; Chang, Bingsheng; Goenaga, Anne-Laure; Green, Bruce A; Merson, James R; Gracheck, Stephen J; Eyles, Jim E
2016-12-07
We evaluated 52 different E. coli-expressed pneumococcal proteins as immunogens in a BALB/c mouse model of S. pneumoniae lung infection. Proteins were selected based on genetic conservation across disease-causing serotypes and bioinformatic prediction of antibody binding to the target antigen. Seven proteins induced protective responses, in terms of reduced lung burdens of serotype 3 pneumococci. Three of the protective proteins were histidine triad protein family members (PhtB, PhtD and PhtE). Four other proteins, all bearing LPXTG linkage domains, also had activity in this model (PrtA, NanA, PavB and Eng). PrtA, NanA and Eng were also protective in a CBA/N mouse model of lethal pneumococcal infection. Despite data indicating widespread genomic conservation, flow-cytometry-based antiserum binding studies confirmed variable levels of antigen expression across a panel of pneumococcal serotypes. Finally, BALB/c mice were immunized and intranasally challenged with a virulent serotype 8 strain to help understand the breadth of protection. Those mouse studies reaffirmed the effectiveness of the histidine triad protein grouping and a single LPXTG protein, PrtA. Copyright © 2016 Elsevier Ltd. All rights reserved.
BagReg: Protein inference through machine learning.
Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou
2015-08-01
Protein inference from the identified peptides is of primary importance in shotgun proteomics. The goal of protein inference is to determine whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, no method yet fully exploits the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method first extracts five manually designed features from the input data, and then uses each feature in turn as the class feature to build a separate model that predicts the presence probabilities of proteins. Finally, the weak results from the five prediction models are aggregated to obtain the final result. We test our method on six publicly available data sets. The experimental results show that our method is superior to state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.
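The final aggregation step of an ensemble like BagReg can be illustrated by averaging the presence probabilities that several models assign to each candidate protein. A minimal Python sketch; simple averaging is an assumption (the abstract does not state the aggregation rule), and the protein names and scores are hypothetical.

```python
from statistics import mean


def aggregate_presence(model_outputs: list) -> dict:
    """Average per-protein presence probabilities over several models.

    model_outputs: one dict per model, mapping protein -> predicted probability.
    A protein missing from a model's output is treated as probability 0.
    """
    proteins = set().union(*(scores.keys() for scores in model_outputs))
    return {prot: mean(scores.get(prot, 0.0) for scores in model_outputs)
            for prot in proteins}


# Hypothetical outputs of five models (one per feature used as the class feature).
outputs = [
    {"P1": 0.9, "P2": 0.2},
    {"P1": 0.8, "P2": 0.4},
    {"P1": 0.7, "P2": 0.1},
    {"P1": 0.9},
    {"P1": 0.6, "P2": 0.3},
]
final = aggregate_presence(outputs)
ranking = sorted(final, key=final.get, reverse=True)  # most likely present first
```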
Senthil Kumar, S; Muthuselvam, P; Pugalenthi, V; Subramanian, N; Ramkumar, K M; Suresh, T; Suzuki, T; Rajaguru, P
2018-08-01
Toxicoproteomic analysis of ambient particulate matter (PM) from the steel industry, which contains high concentrations of PAHs and metals, was performed by treating the human lung cancer cell line A549 and analysing the cell lysates with quantitative label-free nano LC-MS/MS. A total of 18,562 peptides representing 1576 proteins were identified and quantified, of which 196 proteins had significantly altered expression in the treated cells. Enrichment analyses revealed that proteins associated with redox homeostasis, metabolism, and cellular energy generation were inhibited, while proteins related to DNA damage and repair and other stresses were overexpressed. Altered activities of several tumor-associated proteins were observed. Protein-protein interaction network and biological pathway analyses of these differentially expressed proteins were carried out to obtain a systems-level view of proteome changes. Together, these results suggest that PM exposure induced oxidative stress, which could have led to DNA damage and tumor-related changes. However, lowered cellular metabolism and energy production could reduce the cells' ability to overcome this stress. Such a disequilibrium between DNA damage and the cells' ability to repair it may lead to genomic instability capable of acting as the driving force of PM-induced carcinogenesis. Copyright © 2018 Elsevier Ltd. All rights reserved.
Berger, Stephanie; Procko, Erik; Margineantu, Daciana; Lee, Erinna F; Shen, Betty W; Zelter, Alex; Silva, Daniel-Adriano; Chawla, Kusum; Herold, Marco J; Garnier, Jean-Marc; Johnson, Richard; MacCoss, Michael J; Lessene, Guillaume; Davis, Trisha N; Stayton, Patrick S; Stoddard, Barry L; Fairlie, W Douglas; Hockenbery, David M; Baker, David
2016-01-01
Many cancers overexpress one or more of the six human pro-survival BCL2 family proteins to evade apoptosis. To determine which BCL2 protein or proteins block apoptosis in different cancers, we computationally designed three-helix bundle protein inhibitors specific for each BCL2 pro-survival protein. Following in vitro optimization, each inhibitor binds its target with high picomolar to low nanomolar affinity and at least 300-fold specificity. Expression of the designed inhibitors in human cancer cell lines revealed unique dependencies on BCL2 proteins for survival which could not be inferred from other BCL2 profiling methods. Our results show that designed inhibitors can be generated for each member of a closely-knit protein family to probe the importance of specific protein-protein interactions in complex biological processes. DOI: http://dx.doi.org/10.7554/eLife.20352.001 PMID:27805565
Backward-stochastic-differential-equation approach to modeling of gene expression
NASA Astrophysics Data System (ADS)
Shamarova, Evelina; Chertovskih, Roman; Ramos, Alexandre F.; Aguiar, Paulo
2017-03-01
In this article, we introduce a backward method to model stochastic gene expression and protein-level dynamics. The protein amount is regarded as a diffusion process and is described by a backward stochastic differential equation (BSDE). Unlike many other SDE techniques proposed in the literature, the BSDE method is backward in time; that is, instead of initial conditions it requires the specification of end-point ("final") conditions, in addition to the model parametrization. To validate our approach we employ Gillespie's stochastic simulation algorithm (SSA) to generate (forward) benchmark data, according to predefined gene network models. Numerical simulations show that the BSDE method is able to correctly infer the protein-level distributions that preceded a known final condition, obtained originally from the forward SSA. This makes the BSDE method a powerful systems biology tool for time-reversed simulations, allowing, for example, the assessment of the biological conditions (e.g., protein concentrations) that preceded an experimentally measured event of interest (e.g., mitosis, apoptosis, etc.).
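The forward benchmark data in this validation come from Gillespie's SSA. As an illustration only, the sketch below runs the SSA for a hypothetical one-species model in which protein is produced at a constant rate k and degraded at rate γ per molecule; this is not one of the gene network models used in the study, and the BSDE step itself is not shown. For this model the stationary copy number is Poisson with mean k/γ, which gives a simple sanity check.

```python
import random


def gillespie_birth_death(k, gamma, x0, t_end, seed=0):
    """Gillespie SSA for 0 -> X (rate k) and X -> 0 (rate gamma * X).

    Returns a list of (time, copy number) pairs, one per reaction event,
    starting with the initial state (0.0, x0).
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_prod = k
        a_deg = gamma * x
        a_total = a_prod + a_deg
        if a_total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


def time_average(trajectory, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    # Each state is held from its event time until the next event (or t_end).
    for (t0, x), (t1, _) in zip(trajectory, trajectory[1:] + [(t_end, None)]):
        total += x * (t1 - t0)
    return total / t_end
```

With k = 10 and γ = 1 over a long run, the time-averaged copy number should settle near k/γ = 10.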
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The Python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
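Because inference cost concentrates in a few highly connected subgraphs, a natural preprocessing step is to split the peptide-protein graph into connected components, which can then be processed independently and ranked by size. The sketch below does this with a union-find structure; the input format (a list of peptide-protein pairs) is an assumption for illustration, and this is not the Fido code.

```python
from collections import defaultdict


def connected_components(edges):
    """Split a bipartite peptide-protein graph into connected components.

    `edges` is an iterable of (peptide, protein) pairs. Returns a list of
    (peptides, proteins) set pairs, largest component first.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for peptide, protein in edges:
        # Tag nodes so a peptide and a protein sharing a label stay distinct.
        union(("pep", peptide), ("prot", protein))

    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        kind, name = node
        peptides, proteins = groups[find(node)]
        (peptides if kind == "pep" else proteins).add(name)
    return sorted(groups.values(), key=lambda g: len(g[0]) + len(g[1]), reverse=True)
```

The size of the largest component gives a quick indication of how expensive exact inference on the whole graph is likely to be.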
Extraction of intracellular protein from Glaciozyma antarctica for proteomics analysis
NASA Astrophysics Data System (ADS)
Faizura, S. Nor; Farahayu, K.; Faizal, A. B. Mohd; Asmahani, A. A. S.; Amir, R.; Nazalan, N.; Diba, A. B. Farah; Muhammad, M. Nor; Munir, A. M. Abdul
2013-11-01
Two methods for preparing crude extracts of the psychrophilic yeast Glaciozyma antarctica were compared in order to obtain good recovery of intracellular proteins. Mechanical extraction by sonication was more effective at obtaining a good yield than the alkaline treatment method; the procedure is simple and rapid, and produces a better yield. A total of 52 proteins were identified by combining both extraction methods. Most of the proteins identified in this study are involved in metabolic processes, including the glycolysis pathway, the pentose phosphate pathway, pyruvate decarboxylation, and the urea cycle. Several chaperones were identified, including a probable cpr1-cyclophilin (peptidylprolyl isomerase), the macrolide-binding protein fkbp12, and heat shock proteins, which are postulated to accelerate proper protein folding. The characteristics of the fundamental cellular processes inferred from the expressed proteome highlight the evolutionary and functional complexity existing in this domain of life.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkins, Michael J.; Wrighton, Kelly C.; Nicora, Carrie D.
2013-03-05
While microbial activities in environmental systems play a key role in the utilization and cycling of essential elements and compounds, microbial activity and growth frequently fluctuate in response to environmental stimuli and perturbations. To investigate these fluctuations within a saturated aquifer system, we monitored a carbon-stimulated in situ Geobacter population while iron reduction was occurring, using 16S rRNA abundances and high-resolution tandem mass spectrometry proteome measurements. Following carbon amendment, 16S rRNA analysis of temporally separated samples revealed the rapid enrichment of Geobacter-like environmental strains with strong similarity to G. bemidjiensis. Tandem mass spectrometry proteomics measurements suggest high carbon flux through Geobacter respiratory pathways, and the synthesis of anaplerotic four-carbon compounds from acetyl-CoA via pyruvate ferredoxin oxidoreductase activity. Across a 40-day period in which Fe(III) reduction was occurring, fluctuations in protein expression reflected changes in anabolic versus catabolic reactions, with increased levels of biosynthesis occurring soon after acetate arrival in the aquifer. In addition, localized shifts in nutrient limitation were inferred from the expression of nitrogenase enzymes and phosphate uptake proteins. These temporal data offer the first example of differing microbial protein expression associated with changing geochemical conditions in a subsurface environment.
Boulila, Moncef; Ben Tiba, Sawssen; Jilani, Saoussen
2013-04-01
The sequence alignments of five Tunisian isolates of Prunus necrotic ringspot virus (PNRSV) were searched for evidence of recombination and diversifying selection. Because failing to account for recombination can elevate the false-positive error rate in positive selection inference, a genetic algorithm (GARD) was used first and led to the detection of potential recombination events in the coat protein-encoding gene of the virus. The Recco algorithm confirmed these results and additionally identified the potential recombinants. For neutrality testing and evaluation of nucleotide polymorphism in the PNRSV CP gene, Tajima's D and Fu and Li's D and F statistical tests were used. For selection inference, eight algorithms (SLAC, FEL, IFEL, REL, FUBAR, MEME, PARRIS, and GA branch) incorporated in the HyPhy package were used to assess the selection pressure exerted on the PNRSV capsid protein gene. In addition to the three classical groups (PE-5, PV-32, and PV-96), the inferred phylogenies revealed a fourth cluster, for which the new designation SW6 is proposed, and a fifth clade comprising four Tunisian PNRSV isolates that underwent recombination and selective pressure, which was named the Tunisian outgroup.
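Tajima's D, one of the neutrality tests mentioned above, compares two estimates of the population mutation parameter: the mean number of pairwise differences π and Watterson's estimate S/a1, where S is the number of segregating sites. The statistic is D = (π − S/a1) / sqrt(e1·S + e2·S·(S − 1)), with a1, e1, e2 depending only on the sample size n. The sketch below computes it from aligned sequences of equal length; it treats every column with more than one character as segregating and ignores gaps and missing data, which dedicated population-genetics tools handle more carefully.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (requires n >= 4)."""
    n = len(seqs)
    if n < 4:
        # For n = 2 or 3 the variance term is zero, so D is undefined.
        raise ValueError("Tajima's D needs at least four sequences")
    # Segregating sites: alignment columns showing more than one character.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        return 0.0  # no polymorphism: D is undefined; this sketch returns 0.0
    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example a single sequence carrying a unique mutation) pushes D below zero, whereas variants at intermediate frequency push it above zero.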
Predicting the binding preference of transcription factors to individual DNA k-mers.
Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R
2009-04-15
Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. A nearest-neighbour approach based on functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
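The nearest-neighbour idea can be sketched as follows: compare a query homeodomain with each training homeodomain only at a chosen set of residue positions, pick the one with the fewest mismatches, and transfer its 8-mer preference profile to the query. The residue positions, sequences, and data layout below are hypothetical; the abstract does not specify the residues or distance used in the study.

```python
def hamming_at(a, b, positions):
    """Count mismatches between two aligned protein sequences at given positions."""
    return sum(a[p] != b[p] for p in positions)


def predict_profile(query, training, positions):
    """Return (name, profile) of the training protein nearest to `query`.

    `training` maps a protein name to (aligned_sequence, kmer_profile), where
    kmer_profile is e.g. a dict of 8-mer -> preference score. Ties are broken
    by name so the result is deterministic.
    """
    best_name = min(
        training,
        key=lambda name: (hamming_at(query, training[name][0], positions), name),
    )
    return best_name, training[best_name][1]
```

Restricting the distance to a few positions is what makes the choice of "functionally important residues" matter: with all positions included, distant but irrelevant substitutions would dominate the comparison.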
VisANT 3.0: new modules for pathway visualization, editing, prediction and construction.
Hu, Zhenjun; Ng, David M; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M; DeLisi, Charles
2007-07-01
With the integration of the KEGG and Predictome databases, as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and the Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG-compatible visual notation, and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved in a computer-readable XML format (VisML), as graphic images, or as high-resolution Scalable Vector Graphics (SVG). Pathways in VisML format can be securely shared within a group of interested users or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.
Computational Prediction and Validation of BAHD1 as a Novel Molecule for Ulcerative Colitis
NASA Astrophysics Data System (ADS)
Zhu, Huatuo; Wan, Xingyong; Li, Jing; Han, Lu; Bo, Xiaochen; Chen, Wenguo; Lu, Chao; Shen, Zhe; Xu, Chenfu; Chen, Lihua; Yu, Chaohui; Xu, Guoqiang
2015-07-01
Ulcerative colitis (UC) is a common inflammatory bowel disease (IBD) that produces intestinal inflammation and tissue damage. The precise aetiology of UC remains unknown. In this study, we applied a rank-based expression profile comparison algorithm, gene set enrichment analysis (GSEA), to the publicly available expression profiles of UC patients and small interfering RNA (siRNA)-perturbed cells to predict proteins that might be essential in UC. We used quantitative PCR (qPCR) to characterize the expression levels of the genes predicted to be most important for UC in dextran sodium sulphate (DSS)-induced colitic mice. We found that bromo-adjacent homology domain-containing 1 (BAHD1), a novel heterochromatinization factor in vertebrates, was the most downregulated gene. We further validated a potential role of BAHD1 as a regulatory factor for inflammation through the TNF signalling pathway in vitro. Our findings indicate that computational approaches leveraging public gene expression data can be used to infer potential disease-related genes or proteins, and that BAHD1 might act as an indispensable factor in regulating the cellular inflammatory response in UC.
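The core statistic of GSEA is a running-sum enrichment score: walk down the genes ranked by a differential-expression metric, step up when a gene belongs to the set and down otherwise, and report the largest signed deviation from zero. The sketch below implements the weighted form (hits weighted by |metric|^p, with p = 1 by default) for a single gene set; permutation-based significance testing and leading-edge analysis are omitted, and how the study compared UC and siRNA profiles is not reproduced here.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score for one gene set.

    `ranked` lists (gene, metric) pairs sorted from the most positive to the
    most negative metric; `gene_set` is a collection of gene names.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Hits are weighted by |metric|**p; misses all carry the same weight.
    norm = sum(abs(m) ** p for (_, m), hit in zip(ranked, hits) if hit)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must contain weighted hits and leave some misses")
    running = best = 0.0
    for (_, m), hit in zip(ranked, hits):
        running += abs(m) ** p / norm if hit else -1.0 / n_miss
        # Keep the signed value with the largest deviation from zero.
        if abs(running) > abs(best):
            best = running
    return best
```

Setting p = 0 gives an unweighted, Kolmogorov-Smirnov-like statistic; a positive score indicates a set concentrated at the top of the ranking, a negative score one concentrated at the bottom.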
Protein Inference from the Integration of Tandem MS Data and Interactome Networks.
Zhong, Jiancheng; Wang, Jianxing; Ding, Xiaojun; Zhang, Zhen; Li, Min; Wu, Fang-Xiang; Pan, Yi
2017-01-01
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, information beyond tandem MS data and peptide identifications has been exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. Because two interacting proteins should co-exist, it is reasonable to assume that if one of them is confidently inferred in a sample, its interacting partners are also likely to be present in that sample. Therefore, the neighborhood information of a protein in an interactome network can be used to adjust the probability that a shared peptide belongs to that protein. In TMSIN, a multi-weighted graph is constructed by combining the bipartite graph, built from the peptide identification information, with interactome network information. Based on this multi-weighted graph, TMSIN adopts an iterative workflow to infer proteins. At each iteration, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that TMSIN yields AUC scores of 0.742 and 0.874 on the yeast and human datasets, respectively, and yields the maximum number of true positives at q-values less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
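The Bayes'-law step for a shared peptide can be pictured as follows. If every candidate protein is assumed equally likely to produce the peptide, the posterior share for each candidate is simply its prior support, normalized. In this simplified sketch the support is a hypothetical neighbourhood score, the mean current presence probability of the protein's interaction partners; the actual TMSIN support scores and multi-weighted graph are not reproduced.

```python
def neighbor_support(protein, prob, interactome, default=0.5):
    """Mean current presence probability of a protein's interaction partners.

    Proteins with no scored partners fall back to a neutral `default`
    (an arbitrary choice made for this sketch).
    """
    scores = [prob[p] for p in interactome.get(protein, ()) if p in prob]
    return sum(scores) / len(scores) if scores else default


def assign_shared_peptide(candidates, prob, interactome):
    """Posterior share of a shared peptide among its candidate proteins.

    With equal likelihoods P(peptide | protein), Bayes' law gives a posterior
    proportional to the prior, here the neighbour support score.
    """
    support = {c: neighbor_support(c, prob, interactome) for c in candidates}
    total = sum(support.values())
    if total == 0:
        return {c: 1 / len(candidates) for c in candidates}
    return {c: s / total for c, s in support.items()}
```

In an iterative scheme, the resulting shares would update protein probabilities, which in turn change the neighbour supports on the next pass.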
Wang, Lin; Meng, Jie; Cao, Weipeng; Li, Qizhai; Qiu, Yuqing; Sun, Baoyun; Li, Lei M
2014-06-01
The nanoparticle gadolinium endohedral metallofullerenol [Gd@C82(OH)22]n is a new candidate for cancer treatment with low toxicity. However, its anti-cancer mechanisms remain mostly unknown. In this study, we took a systems biology view of the gene expression profiles of human breast cancer cells (MCF-7) and human umbilical vein endothelial cells (ECV304) treated with and without [Gd@C82(OH)22]n, measured with the Agilent Gene Chip G4112F. To properly analyze these data, we modified a suite of statistical methods we had developed. For the first time, we applied sub-sub normalization to Agilent two-color microarrays. Instead of a simple linear regression, we proposed using a one-knot spline model in the sub-sub normalization to account for nonlinear spatial effects. Parameters estimated by least trimmed squares and by S-estimators gave similar normalization results. We made several kinds of inferences by integrating the expression profiles with bioinformatic knowledge from KEGG pathways, Gene Ontology, JASPAR, and TRANSFAC. For transcriptional inference, we proposed the BASE2.0 method to infer a transcription factor's up-regulation and down-regulation activities separately. Overall, [Gd@C82(OH)22]n induces more differentiation in MCF-7 cells than in ECV304 cells, particularly in the reduction of protein processing such as protein glucosylation, folding, targeting, exporting, and transporting. Among the KEGG pathways, the ErbB signaling pathway is up-regulated, whereas protein processing in the endoplasmic reticulum (ER) is down-regulated. CHOP, a key pro-apoptotic gene downstream of the ER stress pathway, increases ninefold in MCF-7 cells after treatment. These findings indicate that ER stress may be an important factor inducing apoptosis in MCF-7 cells after [Gd@C82(OH)22]n treatment.
The expression profiles of genes associated with ER stress and apoptosis are statistically consistent with other profiles reported in the literature, such as those of HEK293T and MCF-7 cells induced by the miR-23a∼27a∼24-2 cluster. Furthermore, one of the inferred regulatory mechanisms comprises the apoptosis network centered around TP53, whose effective regulation of apoptosis is somehow reestablished after [Gd@C82(OH)22]n treatment. These results shed light on the application and development of [Gd@C82(OH)22]n and other fullerene derivatives. Copyright © 2014 Elsevier Inc. All rights reserved.
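The one-knot spline idea in the normalization step can be illustrated with a regression basis in which a hinge term (x − κ)+ lets the slope change at a knot κ. The sketch below assumes a piecewise-linear (degree-1) spline with a fixed, user-supplied knot and fits it by ordinary least squares with NumPy; the study instead used robust least trimmed squares and S-estimators, and its spline degree and knot choice are not stated in the abstract.

```python
import numpy as np


def fit_one_knot_spline(x, y, knot):
    """Least-squares fit of y = b0 + b1*x + b2*max(x - knot, 0).

    Returns the coefficient vector (b0, b1, b2) and the fitted values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, linear term, and hinge term at the knot.
    design = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef
```

In a normalization setting, x might be a covariate such as array position or intensity and y a log-ratio; the residuals y − fitted would then play the role of normalized values. Swapping `lstsq` for a robust estimator is what would make the fit resistant to differentially expressed outliers.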
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.
Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
2017-01-01
The PubMed database offers an extensive set of publication data that can be useful, yet is inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining of PubMed publication abstracts, we verify that, regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
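A consensus network of the kind described can be sketched by counting how often each directed edge appears across many Bayesian networks learned under different randomized topological orderings, then keeping the edges whose frequency reaches a threshold. Treating "network resolution" as that frequency threshold is an assumption made for illustration; the structure-learning step that produces each network is not shown, and the gene pairs below are toy data.

```python
from collections import Counter


def consensus_edges(networks, threshold):
    """Edges present in at least a `threshold` fraction of the networks.

    Each network is an iterable of (parent, child) edges; duplicate edges
    within a single network are counted once.
    """
    networks = list(networks)
    if not networks:
        return set()
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c / len(networks) >= threshold}
```

Raising the threshold yields a sparser, higher-confidence network; lowering it admits edges supported by fewer orderings.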
Koda, Satoru; Onda, Yoshihiko; Matsui, Hidetoshi; Takahagi, Kotaro; Yamaguchi-Uehara, Yukiko; Shimizu, Minami; Inoue, Komaki; Yoshida, Takuhiro; Sakurai, Tetsuya; Honda, Hiroshi; Eguchi, Shinto; Nishii, Ryuei; Mochida, Keiichi
2017-01-01
We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method, using a time-series transcriptome dataset from a model grass, Brachypodium distachyon. To reveal diurnal changes in the transcriptome of B. distachyon, we performed RNA-seq analysis of its leaves sampled at 4 h intervals over a diurnal cycle of more than 48 h, using three biological replicates, and identified 3,621 periodic genes through our wavelet analysis. The expression data are suitable for inferring network sparsity with ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, post-transcriptional modification, and photosynthesis are significantly enriched among the periodic genes, suggesting that these processes might be regulated by the circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factor-encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that they exhibit a typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome of B. distachyon leaves have a sparse network structure, revealing the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.
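The group SCAD regularization mentioned above penalizes each candidate regulator's block of ARX coefficients as a unit. The sketch below evaluates the SCAD penalty of Fan and Li (2001) with their commonly used default a = 3.7 and applies it to the Euclidean norm of each coefficient group; the grouping, the absence of any group-size scaling of λ, and the values of λ are assumptions, and the optimization that fits the ARX model is not shown.

```python
import math


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) evaluated at |t|."""
    t = abs(t)
    if t <= lam:
        return lam * t  # lasso-like region near zero
    if t <= a * lam:
        # Quadratic transition that tapers off the penalty's growth.
        return (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    return lam**2 * (a + 1) / 2  # constant: large coefficients are not shrunk further


def group_scad(coef_groups, lam, a=3.7):
    """Sum of SCAD penalties applied to the Euclidean norm of each group."""
    return sum(scad_penalty(math.sqrt(sum(c * c for c in g)), lam, a) for g in coef_groups)
```

Because the penalty flattens out for large norms, strongly supported regulators are left nearly unbiased while weak groups are driven to zero, which is how the network becomes sparse.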
Fink, Annette; Büttner, Julia K; Thomas, Doris; Holtappels, Rafaela; Reddehase, Matthias J; Lemmermann, Niels A W
2014-02-14
Viral CD8 T-cell epitopes, represented by viral peptides bound to major histocompatibility complex class-I (MHC-I) glycoproteins, are often identified by "reverse immunology", a strategy not requiring biochemical and structural knowledge of the actual viral protein from which they are derived by antigen processing. Instead, bioinformatic algorithms predicting the probability of C-terminal cleavage in the proteasome, as well as binding affinity to the presenting MHC-I molecules, are applied to amino acid sequences deduced from predicted open reading frames (ORFs) based on the genomic sequence. If the protein corresponding to an antigenic ORF is known, it is usually inferred that the kinetic class of the protein also defines the phase in the viral replicative cycle during which the respective antigenic peptide is presented for recognition by CD8 T cells. We have previously identified a nonapeptide from the predicted ORFm164 of murine cytomegalovirus that is presented by the MHC-I allomorph H-2 Dd and that is immunodominant in BALB/c (H-2d haplotype) mice. Surprisingly, although the ORFm164 protein gp36.5 is expressed as an Early (E) phase protein, the m164 epitope is presented already during the Immediate Early (IE) phase, based on the expression of an upstream mRNA starting within ORFm167 and encompassing ORFm164.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims to extract such constraints from rapidly accumulating sequence data, and thereby to infer protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully incorporated into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variables (amino acids), exact inference requires time exponential in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
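The Gaussian variant of direct-coupling analysis can be sketched in a few NumPy steps: one-hot encode the alignment (dropping one state per column so the covariance is not trivially singular), estimate the covariance matrix, shrink it towards a scaled identity so it can be inverted, and score each pair of columns by the Frobenius norm of the corresponding block of the inverse (the precision matrix). The shrinkage parameter and the plain Frobenius scoring are assumptions; the published method includes sequence reweighting and other refinements not shown here.

```python
import numpy as np


def gaussian_dca_scores(msa, alphabet, shrink=0.1):
    """Pairwise coupling scores from a multiple sequence alignment.

    msa: list of equal-length strings; alphabet: string of allowed symbols.
    Returns an L x L symmetric matrix of Frobenius-norm scores (zero diagonal).
    """
    q = len(alphabet)
    index = {a: k for k, a in enumerate(alphabet)}
    n_seq, length = len(msa), len(msa[0])
    # One-hot encoding with q - 1 indicators per column (last symbol is the reference).
    x = np.zeros((n_seq, length * (q - 1)))
    for s, seq in enumerate(msa):
        for i, a in enumerate(seq):
            k = index[a]
            if k < q - 1:
                x[s, i * (q - 1) + k] = 1.0
    cov = np.cov(x, rowvar=False, bias=True)
    # Shrink towards a scaled identity so the matrix is invertible.
    cov = (1 - shrink) * cov + shrink * np.mean(np.diag(cov)) * np.eye(cov.shape[0])
    precision = np.linalg.inv(cov)
    scores = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            block = precision[i * (q - 1):(i + 1) * (q - 1), j * (q - 1):(j + 1) * (q - 1)]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)  # Frobenius norm
    return scores
```

Working with the precision matrix rather than raw correlations is what separates direct couplings from indirect ones transmitted through intermediate columns.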
Wenger, Yvan; Galliot, Brigitte
2013-01-01
Phenotypic traits derive from the selective recruitment of genetic materials over macroevolutionary times, and protein-coding genes constitute an essential component of these materials. We took advantage of the recent production of genome-scale data from sponges and cnidarians, sister groups of eumetazoans and bilaterians, respectively, to date the emergence of human proteins and to infer the timing of acquisition of novel traits through metazoan evolution. Comparing the proteomes of 23 eukaryotes, we find that 33% of human proteins have an ortholog in nonmetazoan species. This premetazoan proteome is associated with 43% of all annotated human biological processes. Subsequently, four major waves of innovation can be inferred in the last common ancestors of eumetazoans, bilaterians, euteleostomi (bony vertebrates), and hominidae, largely specific to each epoch, whereas early-branching deuterostome and chordate phyla show very few innovations. Interestingly, groups of proteins that act together in their modern human functions often originated concomitantly, although the corresponding human phenotypes frequently emerged later. For example, the three cnidarians Acropora, Nematostella, and Hydra express a highly similar protein inventory, and their protein innovations can be affiliated either to traits shared by all eumetazoans (gut differentiation, neurogenesis), to bilaterian traits present in only some cnidarians (eyes, striated muscle), or to traits not yet identified in this phylum (mesodermal layer, endocrine glands). The variable correspondence between phenotypes predicted from protein enrichments and observed phenotypes suggests that a parallel mechanism repeatedly produces similar phenotypes, thanks to novel regulatory events that independently tie together preexisting conserved genetic modules. PMID:24065732
Salicylic acid interferes with GFP fluorescence in vivo
de Jonge, Jennifer; Hofius, Daniel
2017-01-01
Fluorescent proteins have become essential tools for cell biologists. They are routinely used by plant biologists in protein and promoter fusions to infer protein localization, tissue-specific expression and protein abundance. When studying the effects of biotic stress on chromatin, we unexpectedly observed a decrease in GFP signal intensity upon salicylic acid (SA) treatment in Arabidopsis lines expressing histone H1-GFP fusions. This decrease in GFP signal depended on SA concentration. The effect was not specific to the linker histone H1-GFP fusion but was also observed for the nucleosomal histone H2A-GFP fusion. This result prompted us to investigate a collection of fusion proteins with different promoters, subcellular localizations and fluorophores. In all cases, fluorescence signals declined strongly or disappeared after SA application. No changes in GFP-fusion protein abundance were detected when fluorescence signals were lost, indicating that SA interferes not with protein stability but with GFP fluorescence. In vitro experiments showed that SA reduced GFP fluorescence only in vivo and not in vitro, suggesting that SA requires cellular components to cause fluorescence reduction. Together, we conclude that SA can interfere with the fluorescence of various GFP-derived reporter constructs in vivo. Assays that measure relocation or turnover of GFP-tagged proteins upon SA treatment should therefore be evaluated with caution. PMID:28369601
Vella, Danila; Zoppis, Italo; Mauri, Giancarlo; Mauri, Pierluigi; Di Silvestre, Dario
2017-12-01
The reductionist approach of dissecting biological systems into their constituents was successful in the early stages of molecular biology in elucidating the chemical basis of several biological processes. This knowledge helped biologists to understand the complexity of biological systems, showing that most biological functions do not arise from individual molecules and, thus, that the emergent properties of biological systems cannot be explained or predicted by investigating individual molecules without taking their relations into consideration. Thanks to improvements in current -omics technologies and a growing understanding of molecular relationships, more and more studies are evaluating biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks, whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. Co-expression networks, on the other hand, represent a complementary procedure that enables system-level evaluation, including in organisms that lack information on PPIs. Based on these premises, we introduce the reader to PPI and co-expression networks, including aspects of their reconstruction and analysis. In particular, the new idea of evaluating large-scale proteomic data by means of co-expression networks will be discussed, with some examples of application. Their use to infer biological knowledge will be shown, and special attention will be devoted to topological and module analysis.
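A basic co-expression network of the kind discussed can be built by computing pairwise Pearson correlations between protein abundance profiles across samples, linking pairs whose absolute correlation reaches a cutoff, and ranking nodes by degree to flag candidate hubs. The 0.8 hard cutoff and the toy profiles are assumptions for illustration; practical pipelines such as WGCNA use soft thresholding and more careful module detection.

```python
import numpy as np


def coexpression_network(profiles, cutoff=0.8):
    """Build an undirected co-expression network with a hard correlation cutoff.

    profiles: dict mapping protein name -> abundances over the same samples.
    Returns (edges, degree): edges is a set of name pairs (sorted order),
    degree maps each protein to its number of neighbours.
    """
    names = sorted(profiles)
    data = np.array([profiles[n] for n in names], dtype=float)
    corr = np.corrcoef(data)  # rows are variables (proteins)
    edges = set()
    degree = {n: 0 for n in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= cutoff:
                edges.add((names[i], names[j]))
                degree[names[i]] += 1
                degree[names[j]] += 1
    return edges, degree
```

Using the absolute correlation links both positively and negatively co-varying proteins; a signed network would keep only positive correlations.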
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness property are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness property of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
Xue, Mingming; Qiqige, Chaolumen; Zhang, Qi; Zhao, Haixia; Su, Liping; Sun, Peng; Zhao, Pengwei
2018-06-27
BACKGROUND The aim of this study was to investigate the effects of TNF-α and IL-10 on the expression of ICAM-1 and CD31 in human coronary artery endothelial cells (HCAEC). MATERIAL AND METHODS HCAECs were treated with TNF-α at 0, 2.5, 5, and 10 μg/l for 2, 6, and 10 h, and with IL-10 at 0, 10, 100, and 200 μg/l for 5, 10, and 15 h. RNA interference of TNF-αR was performed with siRNA. Real-time PCR, Western blot analysis, and ELISA were used to detect the mRNA and protein levels of ICAM-1 and CD31. RESULTS TNF-α significantly increased the mRNA and protein expression of ICAM-1 (P<0.05), and 2.5 μg/l TNF-α had the most obvious effect. RNAi of TNF-αR reduced the TNF-α-induced increase in ICAM-1 mRNA and protein expression (P<0.05). TNF-α significantly decreased CD31 in the supernatant (P<0.05), and 2.5 μg/l TNF-α had the most obvious effect. IL-10 significantly decreased the ICAM-1 protein level. IL-10 decreased both the mRNA and protein expression of CD31; the effect on mRNA was not significant (P>0.05), while the effect on protein expression was significant (P<0.05). CONCLUSIONS TNF-α and IL-10 treatment can affect the expression of ICAM-1 and CD31 in HCAEC.
Pan, Yu; Bradley, Glyn; Pyke, Kevin; Ball, Graham; Lu, Chungui; Fray, Rupert; Marshall, Alexandra; Jayasuta, Subhalai; Baxter, Charles; van Wijk, Rik; Boyden, Laurie; Cade, Rebecca; Chapman, Natalie H; Fraser, Paul D; Hodgman, Charlie; Seymour, Graham B
2013-03-01
Carotenoids represent some of the most important secondary metabolites in the human diet, and tomato (Solanum lycopersicum) is a rich source of these health-promoting compounds. In this work, a novel fruit-related regulator of pigment accumulation in tomato was identified by artificial neural network inference analysis, and its function was validated in transgenic plants. A tomato fruit gene regulatory network was generated using artificial neural network inference analysis and transcription factor gene expression profiles derived from fruits sampled at various points during development and ripening. One transcription factor gene, with a sequence related to the Arabidopsis (Arabidopsis thaliana) ARABIDOPSIS PSEUDO RESPONSE REGULATOR2-LIKE gene (APRR2-Like), was up-regulated at the breaker stage in wild-type tomato fruits and, when overexpressed in transgenic lines, increased plastid number, area, and pigment content, enhancing the levels of chlorophyll in immature unripe fruits and carotenoids in red ripe fruits. Analysis of the transcriptome of transgenic lines overexpressing the tomato APRR2-Like gene revealed up-regulation of several ripening-related genes, providing a link between the expression of this tomato gene and the ripening process. A putative ortholog of the tomato APRR2-Like gene in sweet pepper (Capsicum annuum) was associated with pigment accumulation in fruit tissues. We conclude that the function of this gene is conserved across taxa and that it encodes a protein with an important role in ripening.
Accurate and sensitive quantification of protein-DNA binding affinity.
Rastogi, Chaitanya; Rube, H Tomas; Kribelbauer, Judith F; Crocker, Justin; Loker, Ryan E; Martini, Gabriella D; Laptenko, Oleg; Freed-Pastor, William A; Prives, Carol; Stern, David L; Mann, Richard S; Bussemaker, Harmen J
2018-04-17
Transcription factors (TFs) control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in TF binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here, we developed a versatile maximum likelihood framework named No Read Left Behind (NRLB) that infers a biophysical model of protein-DNA recognition across the full affinity range from a library of in vitro selected DNA binding sites. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. It can capture the specificity of the p53 tetramer and distinguish multiple binding modes within a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes. Copyright © 2018 the Author(s). Published by PNAS. PMID:29610332
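NRLB's exact likelihood and energy model are not spelled out in this abstract. As a hedged illustration of what a biophysical protein-DNA recognition model computes, the sketch assumes an additive energy model: each base contributes a free-energy penalty ΔΔG (in kT), a site's relative affinity is exp(-ΣΔΔG), and a sequence's total affinity sums over all windows on both strands. The mismatch-only energy table and its values are invented for illustration; only the E-box consensus CACGTG (the Max binding site) is a known fact.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_energy_model(consensus, penalty):
    """Hypothetical ddG table (kT units): 0 for the consensus base, `penalty` otherwise."""
    return [{b: (0.0 if b == c else penalty) for b in "ACGT"} for c in consensus]


def site_affinity(site, ddg):
    """Relative affinity of one site: exp(-sum of per-position ddG)."""
    return math.exp(-sum(ddg[i][base] for i, base in enumerate(site)))


def sequence_affinity(seq, ddg):
    """Sum of relative affinities over every window on both strands."""
    k = len(ddg)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            total += site_affinity(strand[i:i + k], ddg)
    return total


ddg = mismatch_energy_model("CACGTG", penalty=2.0)  # E-box; energies are made up
print(site_affinity("CACGTG", ddg))  # 1.0
print(site_affinity("CACGTA", ddg))  # about 0.135, i.e. exp(-2)
print(sequence_affinity("TTCACGTGTT", ddg))
```

Summing rather than taking the maximum over windows is what lets such a model credit many weak, low-affinity sites, which is the regime the abstract emphasizes.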
Dietary TiO2 particles modulate expression of hormone-related genes in Bombyx mori.
Shi, Guofang; Zhan, Pengfei; Jin, Weiming; Fei, JianMing; Zhao, Lihua
2017-08-01
Silkworm (Bombyx mori) is an economically important insect whose growth and development are regulated by endogenous hormones. In the present study, we found that feeding titanium dioxide nanoparticles (TiO2 NP) caused a significant increase in body size. TiO2 NP stimulated the transcription of several genes, including the insulin-related hormone bombyxin and components of the PI3K/Akt/TOR (where PI3K is phosphatidylinositol 3-kinase and TOR is target of rapamycin) and adenosine 5'-monophosphate-activated protein kinase (AMPK)/TOR pathways. Differentially expressed gene (DEG) analysis documented 26 developmental hormone signaling-related genes that were differentially expressed following dietary TiO2 NP treatment. qPCR analysis confirmed the upregulation of insulin/ecdysteroid signaling genes, such as bombyxin B-1, bombyxin B-4, bombyxin B-7, MAPK, P70S6K, PI3K, eIF4E, E75, ecdysteroid receptor (EcR), and insulin-related peptide binding protein precursor 2 (IBP2). From the upregulated expression of bombyxins and of this signaling network, we infer that they act in bombyxin-stimulated ecdysteroidogenesis. © 2017 Wiley Periodicals, Inc.
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks
Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
2017-01-01
The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295
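A minimal sketch of the consensus step, assuming that "network resolution" acts as an edge-frequency cutoff: an edge is kept when it appears in at least that fraction of the Bayesian networks learned under different random topological orderings. This interpretation, the function name, and the toy edge sets are assumptions for illustration, not results or code from the paper.

```python
from collections import Counter


def consensus_network(networks, resolution):
    """Keep directed edges found in at least `resolution` of the networks.

    networks: list of edge sets, e.g. Bayesian networks learned from the same
    data under different random topological orderings.
    Treating 'network resolution' as a frequency cutoff is an assumption.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    n = len(networks)
    return {edge for edge, c in counts.items() if c / n >= resolution}


# Toy edge sets standing in for four learned networks.
runs = [
    {("RAS", "AKT"), ("JAK", "STAT")},
    {("RAS", "AKT")},
    {("RAS", "AKT"), ("STAT", "PI3K")},
    {("JAK", "STAT")},
]
print(consensus_network(runs, 0.5))   # {('RAS', 'AKT'), ('JAK', 'STAT')}, in some order
print(consensus_network(runs, 0.75))  # {('RAS', 'AKT')}
```

Raising the cutoff trades recall for confidence: at 0.75 only the edge recovered in three of four orderings survives.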
Protein-DNA binding dynamics predict transcriptional response to nutrients in archaea.
Todor, Horia; Sharma, Kriti; Pittman, Adrianne M C; Schmid, Amy K
2013-10-01
Organisms across all three domains of life use gene regulatory networks (GRNs) to integrate varied stimuli into coherent transcriptional responses to environmental pressures. However, inferring GRN topology and regulatory causality remains a central challenge in systems biology. Previous work characterized TrmB as a global metabolic transcription factor in archaeal extremophiles. However, it remains unclear how TrmB dynamically regulates its ∼100 metabolic enzyme-coding gene targets. Using a dynamic perturbation approach, we elucidate the topology of the TrmB metabolic GRN in the model archaeon Halobacterium salinarum. Clustering of dynamic gene expression patterns reveals that TrmB functions alone to regulate central metabolic enzyme-coding genes but cooperates with various regulators to control peripheral metabolic pathways. Using a dynamical model, we predict gene expression patterns for some TrmB-dependent promoters and infer secondary regulators for others. Our data suggest feed-forward gene regulatory topology for cobalamin biosynthesis. In contrast, purine biosynthesis appears to require TrmB-independent regulators. We conclude that TrmB is an important component for mediating metabolic modularity, integrating nutrient status and regulating gene expression dynamics alone and in concert with secondary regulators.
Lubbock, Alexander L. R.; Katz, Elad; Harrison, David J.; Overton, Ian M.
2013-01-01
Tissue microarrays (TMAs) allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumour biopsies. TMA Navigator (www.tmanavigator.org) is an open access web application for analysis of TMA data and related information, accommodating categorical, semi-continuous and continuous expression scores. Non-biological variation, or batch effects, can hinder data analysis and may be mitigated using the ComBat algorithm, which is incorporated with enhancements for automated application to TMA data. Unsupervised grouping of samples (patients) is provided according to Gaussian mixture modelling of marker scores, with cardinality selected by Bayesian information criterion regularization. Kaplan–Meier survival analysis is available, including comparison of groups identified by mixture modelling using the Mantel-Cox log-rank test. TMA Navigator also supports network inference approaches useful for TMA datasets, which often comprise comparatively few markers. Tissue and cell-type specific networks derived from TMA expression data offer insights into the molecular logic underlying pathophenotypes, towards more effective and personalized medicine. Output is interactive, and results may be exported for use with external programs. Private anonymous access is available, and user accounts may be generated for easier data management. PMID:23761446
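The Kaplan–Meier estimator mentioned above multiplies, at each distinct event time, the fraction of at-risk patients who survive that time. The sketch below is a generic textbook implementation, not TMA Navigator's code; it follows the usual convention that patients censored at an event time still count as at risk for that time. The log-rank comparison is not sketched.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the event (e.g. death)
    was observed, 0 if the patient was censored at that time.
    Returns (time, S(time)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censorings leave the risk set
    return curve


# Six toy patients; two are censored (event flag 0).
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
# approximately [(1, 0.833), (2, 0.667), (3, 0.444), (5, 0.0)]
```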
Wang, Lun; Deng, Xiuxin
2015-01-01
Globular and crystalloid chromoplasts were observed to form region-specifically in sweet orange (Citrus sinensis) flesh, converting from amyloplasts during fruit maturation, a process associated with the composition of specific carotenoids and the expression of carotenogenic genes. Subsequent isobaric tag for relative and absolute quantitation (iTRAQ)-based quantitative proteomic analyses of purified plastids from the flesh during chromoplast differentiation and senescence identified 1,386 putative plastid-localized proteins, 1,016 of which were quantified by spectral counting. The iTRAQ values reflecting the expression abundance of three identified proteins were validated by immunoblotting. Based on iTRAQ data, chromoplastogenesis appeared to be associated with three major protein expression patterns: (1) marked decrease in abundance of the proteins participating in the translation machinery through ribosome assembly; (2) increase in abundance of the proteins involved in terpenoid biosynthesis (including carotenoids), stress responses (redox, ascorbate, and glutathione), and development; and (3) maintenance of the proteins for signaling and DNA and RNA. Interestingly, a strong increase in abundance of several plastoglobule-localized proteins coincided with the formation of plastoglobules in the chromoplast. The proteomic data also showed that stable functioning of protein import, suppression of ribosome assembly, and accumulation of chromoplast proteases are correlated with the amyloplast-to-chromoplast transition; thus, these processes may play a collective role in chromoplast biogenesis and differentiation. By contrast, the chromoplast senescence process was inferred to be associated with significant increases in stress response and energy supply. 
In conclusion, this comprehensive proteomic study identified many potentially new plastid-localized proteins and provides insights into the potential developmental and molecular mechanisms underlying chromoplast biogenesis, differentiation, and senescence in sweet orange flesh. PMID:26056088
Computational synchronization of microarray data with application to Plasmodium falciparum.
Zhao, Wei; Dauwels, Justin; Niles, Jacquin C; Cao, Jianshu
2012-06-21
Microarrays are widely used to investigate the blood stage of Plasmodium falciparum infection. Starting with synchronized cells, gene expression levels are continually measured over the 48-hour intra-erythrocytic cycle (IDC). However, the cell population gradually loses synchrony during the experiment. As a result, the microarray measurements are blurred. In this paper, we propose a generalized deconvolution approach to reconstruct the intrinsic expression pattern, and apply it to P. falciparum IDC microarray data. We develop a statistical model for the decay of synchrony among cells, and reconstruct the expression pattern through statistical inference. The proposed method can handle microarray measurements with noise and missing data. The original gene expression patterns become more apparent in the reconstructed profiles, making it easier to analyze and interpret the data. We hypothesize that reconstructed gene expression patterns represent better temporally resolved expression profiles that can be probabilistically modeled to match changes in expression level to IDC transitions. In particular, we identify transcriptionally regulated protein kinases putatively involved in regulating the P. falciparum IDC. By analyzing publicly available microarray data sets for the P. falciparum IDC, protein kinases are ranked in terms of their likelihood to be involved in regulating transitions between the ring, trophozoite and schizont developmental stages of the P. falciparum IDC. In our theoretical framework, a few protein kinases have high probability rankings, and could potentially be involved in regulating these developmental transitions. This study proposes a new methodology for extracting intrinsic expression patterns from microarray data. By applying this method to P. falciparum microarray data, several protein kinases are predicted to play a significant role in the P. falciparum IDC. 
Earlier experiments have indeed confirmed that several of these kinases are involved in this process. Overall, these results indicate that further functional analysis of these additional putative protein kinases may reveal new insights into how the P. falciparum IDC is regulated.
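The idea of undoing synchrony loss can be illustrated with a small linear model. This is a hedged sketch, not the authors' statistical model: it assumes the phases of cells sampled at hour t follow a Gaussian spread on the circular 48-hour cycle whose width grows linearly with t, so each observation is a weighted average of the intrinsic pattern. The intrinsic pattern is then recovered with Landweber iteration, a generic iterative least-squares deconvolution swapped in for illustration. The widths, step rule, and synthetic profile are assumptions, and noise and missing data (which the paper's method handles) are ignored.

```python
import math

PERIOD = 48  # one sample per hour over the 48-hour IDC (assumption)


def synchrony_loss_matrix(period=PERIOD, sigma0=0.5, growth=0.1):
    """Row t holds the fraction of cells at each intrinsic phase s when sampled at time t.

    Assumes a Gaussian spread of phases on the circular cycle whose width
    grows linearly with time; each row is normalized to sum to 1.
    """
    rows = []
    for t in range(period):
        sigma = sigma0 + growth * t
        row = []
        for s in range(period):
            d = min(abs(t - s), period - abs(t - s))  # circular phase distance
            row.append(math.exp(-d * d / (2 * sigma * sigma)))
        total = sum(row)
        rows.append([w / total for w in row])
    return rows


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Computes A^T r."""
    out = [0.0] * len(a[0])
    for row, ri in zip(a, r):
        for j, aij in enumerate(row):
            out[j] += aij * ri
    return out


def landweber(a, observed, iters=200):
    """Iterative least-squares deconvolution: x <- x + tau * A^T (y - A x)."""
    max_row = max(sum(abs(v) for v in row) for row in a)
    max_col = max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))
    tau = 1.0 / (max_row * max_col)  # ensures tau * ||A||_2^2 <= 1 (stable)
    x = list(observed)  # start from the blurred measurements
    for _ in range(iters):
        resid = [yi - fi for yi, fi in zip(observed, matvec(a, x))]
        x = [xi + tau * gi for xi, gi in zip(x, rmatvec(a, resid))]
    return x


def rms(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))


# Synthetic intrinsic pattern: a smooth cycle plus a sharp 6-hour burst.
intrinsic = [math.sin(2 * math.pi * s / PERIOD) + (1.0 if 20 <= s < 26 else 0.0)
             for s in range(PERIOD)]
A = synchrony_loss_matrix()
observed = matvec(A, intrinsic)
recovered = landweber(A, observed)
print(rms(observed, intrinsic), rms(recovered, intrinsic))
```

With noise-free data and this step size, each iteration cannot increase the reconstruction error, so the recovered profile is closer to the intrinsic one than the blurred measurements are; late time points, where the spread is widest, are recovered least well.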
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Weiwen; Culley, David E.; Gritsenko, Marina A.
2006-11-03
In a previous study, the whole-genome gene expression profiles of D. vulgaris in response to oxidative stress and heat shock were determined. The results showed that 24-28% of the responsive genes encoded hypothetical proteins that have not been experimentally characterized or whose function cannot be deduced by simple sequence comparison. To further explore the protective mechanisms employed by D. vulgaris against oxidative stress and heat shock, this study attempted to infer the functions of these hypothetical proteins by phylogenomic profiling along with detailed sequence comparison against various publicly available databases. By this approach we were able to assign possible functions to 25 responsive hypothetical proteins. For example, DVU0725, induced by oxidative stress, may be involved in lipopolysaccharide biosynthesis, implying that alteration of lipopolysaccharide on the cell surface might serve as a mechanism against oxidative stress in D. vulgaris. In addition, two responsive proteins, DVU0024, encoding a putative transcriptional regulator, and DVU1670, encoding a predicted redox protein, shared co-evolution patterns with rubrerythrin in Archaeoglobus fulgidus and Clostridium perfringens, respectively, implying that they might be part of the stress response and protective systems in D. vulgaris. The study demonstrated that phylogenomic profiling is a useful tool for interpreting experimental genomics data, and also provided further insight into the cellular response to oxidative stress and heat shock in D. vulgaris.
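Phylogenomic profiling, in its simplest form, records for each protein whether a homolog is present (1) or absent (0) in each of a set of reference genomes, and flags proteins with matching profiles as likely to co-evolve. The sketch below compares profiles by Hamming distance; the protein names and profiles are hypothetical, and the study's actual similarity measure is not stated in the abstract.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes where presence/absence differs."""
    return sum(a != b for a, b in zip(p, q))


def coevolving_pairs(profiles, max_mismatch=0):
    """Pairs of proteins whose phylogenetic profiles differ in at most max_mismatch genomes.

    profiles maps a protein to a 0/1 tuple: 1 if a homolog is found in that genome.
    """
    return [(p, q) for p, q in combinations(sorted(profiles), 2)
            if hamming(profiles[p], profiles[q]) <= max_mismatch]


# Hypothetical presence/absence profiles across six reference genomes.
profiles = {
    "hypothetical_X": (1, 1, 0, 1, 0, 1),
    "rubrerythrin": (1, 1, 0, 1, 0, 1),
    "unrelated_Y": (0, 1, 1, 0, 1, 0),
}
print(coevolving_pairs(profiles))  # [('hypothetical_X', 'rubrerythrin')]
```

Allowing a small `max_mismatch` tolerates missed homolog calls at the cost of more false pairings.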
Hill, Rachel A; Klug, Maren; Kiss Von Soly, Szerenke; Binder, Michele D; Hannan, Anthony J; van den Buuse, Maarten
2014-10-01
Post-mortem studies have demonstrated reduced expression of brain-derived neurotrophic factor (BDNF) in the hippocampus of schizophrenia and major depression patients. The "two hit" hypothesis proposes that two or more major disruptions at specific time points during development are involved in the pathophysiology of these mental illnesses. However, the role of BDNF in these "two hit" effects is unclear. Our aim was to behaviorally characterize a "two hit" rat model of developmental stress accompanied by an in-depth assessment of BDNF expression and signaling. Wistar rats were exposed to neonatal maternal separation (MS) stress and/or adolescent/young-adult corticosterone (CORT) treatment. In adulthood, models of cognitive and negative symptoms of mental illness were analyzed. The hippocampus was then dissected into dorsal (DHP) and ventral (VHP) regions and analyzed by qPCR for exon-specific BDNF gene expression or by Western blot for BDNF protein expression and downstream signaling. Male "two hit" rats showed marked disruptions in short-term spatial memory (Y-maze), which were absent in females. However, female "two hit" rats showed signs of anhedonia (sucrose preference test), which were absent in males. Novel object recognition and anxiety (elevated plus maze) were unchanged by either of the two "hits". In the DHP, MS caused a male-specific increase in BDNF Exons I, II, IV, VII, and IX mRNA but a decrease in mature BDNF and phosphorylated TrkB (pTrkB) protein expression in adulthood. In the VHP, BDNF transcript expression was unchanged; however, in female rats only, MS significantly decreased mature BDNF and pTrkB protein expression in adulthood. These data demonstrate that MS causes region-specific and sex-specific long-term effects on BDNF expression and signaling and, importantly, that mRNA expression does not always predict protein expression. Alterations to BDNF signaling may mediate the sex-specific effects of developmental stress on anhedonic behaviors. 
© 2014 Wiley Periodicals, Inc.
Teichmann, Aline; Vargas, Daiani M; Monteiro, Karina M; Meneghetti, Bruna V; Dutra, Cristine S; Paredes, Rodolfo; Galanti, Norbel; Zaha, Arnaldo; Ferreira, Henrique B
2015-04-03
The 14-3-3 protein family of eukaryotic regulators was studied in Echinococcus granulosus, the causative agent of cystic hydatid disease. These proteins mediate important cellular processes in eukaryotes and are expected to play important roles in parasite biology. Six isoforms of E. granulosus 14-3-3 genes and proteins (Eg14-3-3.1-6) were analyzed, and their phylogenetic relationships were established with bona fide 14-3-3 orthologous proteins from eukaryotic species. Eg14-3-3 isoforms with previous evidence of expression (Eg14-3-3.1-4) in the E. granulosus pathogenic larval stage (metacestode) were cloned, and recombinant proteins were used for functional studies. These protein isoforms were detected in different components of the E. granulosus metacestode, including interface components with the host. The roles played by Eg14-3-3 proteins in parasite biology were inferred from the repertoires of proteins interacting with each isoform, as assessed by gel overlay, cross-linking, and affinity chromatography assays. A total of 95 Eg14-3-3 protein ligands were identified by mass spectrometry. Eg14-3-3 isoforms have shared partners (44 proteins), indicating some overlapping functions; however, they also bind exclusive partners (51 proteins), suggesting Eg14-3-3 functional specialization. These ligand repertoires indicate the involvement of Eg14-3-3 proteins in multiple biochemical pathways in the E. granulosus metacestode and point to some degree of isoform specialization.
Raju, Hemalatha B.; Tsinoremas, Nicholas F.; Capobianco, Enrico
2016-01-01
Injured nerves can regenerate in the peripheral nervous system but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs), which leaves open questions about the potential effects of ncRNAs at the transcriptome level. Owing to the limited availability of human neuropathic pain (NP) data, we identified the most comprehensive time-course gene expression profile related to sciatic nerve (SN) injury, studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify putative protein-coding targets for selected ncRNAs. Relevance was assigned to differential expression of neighboring protein-coding genes, with the neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, which in turn support inference at the interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNAs (31 asRNAs in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known associations with neurodegeneration and/or neurogenesis processes.
While modules of olfactory receptors were clearly identified in protein–protein interaction networks, other connectivity paths were identified between proteins already investigated in studies of disorders such as Parkinson disease, Down syndrome, Huntington disease, and Alzheimer disease. Our findings suggest the importance of reusing gene expression data through meta-analysis approaches. PMID:27803687
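The neighborhood step of the contextual analysis, linking an ncRNA locus to protein-coding genes within a fixed genomic distance, can be sketched as an interval-gap test. The window size, coordinates, and gene names below are hypothetical; the abstract does not state the distance used. Genes overlapping the ncRNA locus count as distance 0.

```python
def neighboring_genes(ncrna_loci, genes, window):
    """Map each ncRNA to protein-coding genes within `window` bp on the same chromosome.

    Loci are (chromosome, start, end) tuples with start <= end.
    """
    result = {}
    for nc, (chrom, start, end) in ncrna_loci.items():
        hits = []
        for gene, (g_chrom, g_start, g_end) in genes.items():
            if g_chrom != chrom:
                continue
            gap = max(0, g_start - end, start - g_end)  # 0 when intervals overlap
            if gap <= window:
                hits.append(gene)
        result[nc] = sorted(hits)
    return result


# Hypothetical coordinates for one lincRNA and four coding genes.
ncrnas = {"lnc1": ("chr1", 10_000, 12_000)}
genes = {
    "GeneA": ("chr1", 14_000, 20_000),  # 2 kb downstream
    "GeneB": ("chr1", 50_000, 60_000),  # 38 kb away
    "GeneC": ("chr2", 10_000, 11_000),  # other chromosome
    "GeneD": ("chr1", 5_000, 9_000),    # 1 kb upstream
}
print(neighboring_genes(ncrnas, genes, window=5_000))  # {'lnc1': ['GeneA', 'GeneD']}
```

The nested loop is quadratic; for genome-scale annotation an interval index or a sorted sweep would be the usual choice.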
He, Feng; Zeng, An-Ping
2006-01-01
Background The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored. Results In this work we present a new method based on extracting the main features of the change trend and level of gene expression between consecutive time points. The method, termed trend correlation (TC), includes two major steps: (1) calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; (2) inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way to the local clustering (LC) method, but the latter is based merely on a point-to-point comparison. The TC method is demonstrated with data from the yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at the same p-value threshold, the additional number of gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of inferred gene pairs consistent with databases is generally higher for our method than for the LC method and similar to that of the PCC method.
A significant number of the gene pairs inferred only by the TC method are process-identity or function-similarity pairs or have well-documented biological interactions, including 443 known protein interactions and some known cell cycle related regulatory interactions. It should be emphasized that the overlap of gene pairs detected by the three methods is normally not very high, indicating the need to combine different methods when searching for functional associations of genes in time-series data. For a p-value threshold of 1E-5, the percentages of process-identity and function-similarity gene pairs among the part shared by the three methods reach 60.2% and 55.6% respectively, building a good basis for further experimental and functional study. Furthermore, the combined use of methods is important to infer more complete regulatory circuits and networks, as exemplified in this study. Conclusion The TC method can significantly augment the current major methods to infer functional linkages and biological networks and is well suited for exploring temporal relationships of gene expression in time-series data. PMID:16478547
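The first ingredient of the TC method's step (1) can be sketched as follows: encode each gene's consecutive changes as up (+1), down (-1), or flat (0), then score the best ungapped local alignment of two trend strings by dynamic programming. Scanning all alignments captures time shifts, and aligning against the sign-flipped partner captures inverted relationships. The +1/-1 scoring, the absence of gaps, and the omission of the change-level correlation coefficient and statistical extraction are simplifying assumptions, not the published scoring scheme.

```python
def trends(profile, eps=0.0):
    """Encode each consecutive change as +1 (up), -1 (down) or 0 (flat)."""
    out = []
    for a, b in zip(profile, profile[1:]):
        d = b - a
        out.append(1 if d > eps else -1 if d < -eps else 0)
    return out


def local_trend_score(u, v, match=1, mismatch=-1):
    """Best ungapped local alignment score of two trend strings (Smith-Waterman style)."""
    best = 0
    prev = [0] * (len(v) + 1)
    for i in range(1, len(u) + 1):
        cur = [0] * (len(v) + 1)
        for j in range(1, len(v) + 1):
            s = match if u[i - 1] == v[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s)  # extend the diagonal or restart
            best = max(best, cur[j])
        prev = cur
    return best


def trend_relation(x, y):
    """Compare direct and inverted trend alignments of two expression profiles."""
    tx, ty = trends(x), trends(y)
    direct = local_trend_score(tx, ty)
    inverted = local_trend_score(tx, [-t for t in ty])
    return ("positive", direct) if direct >= inverted else ("inverted", inverted)


x = [1, 2, 3, 2, 1, 2, 3]
print(trend_relation(x, [5, 5, 6, 7, 6, 5, 6]))  # ('positive', 5): same trends, lagged one step
print(trend_relation(x, [-v for v in x]))        # ('inverted', 6)
```

Because only the direction of change is compared, two genes can align perfectly even when their expression levels differ greatly; the TC method's change-level correlation coefficient addresses that, and is not reproduced here.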
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
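As an illustration of the combinatorial-optimization family, protein inference under the parsimony principle asks for the smallest set of proteins that explains all identified peptides, which is a set-cover problem. The sketch uses the standard greedy approximation; it is not taken from this review, and the protein and peptide names are hypothetical. An exact minimum cover would typically be found with an integer program instead.

```python
def greedy_protein_inference(peptides_by_protein):
    """Parsimony-style protein inference as greedy set cover.

    Repeatedly selects the protein that explains the most still-unexplained
    peptides; ties are broken alphabetically so results are reproducible.
    """
    unexplained = set().union(*peptides_by_protein.values())
    chosen = []
    while unexplained:
        best = max(sorted(peptides_by_protein),
                   key=lambda p: len(peptides_by_protein[p] & unexplained))
        gained = peptides_by_protein[best] & unexplained
        if not gained:
            break
        chosen.append(best)
        unexplained -= gained
    return chosen


# Hypothetical peptide evidence: ProtC's only peptide is shared with ProtA.
evidence = {
    "ProtA": {"pep1", "pep2", "pep3"},
    "ProtB": {"pep3", "pep4"},
    "ProtC": {"pep2"},
    "ProtD": {"pep4"},
}
print(greedy_protein_inference(evidence))  # ['ProtA', 'ProtB']
```

Note that ProtB and ProtD explain pep4 equally well here; a parsimony rule alone cannot distinguish them, which is one motivation for the probabilistic methods the review discusses.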
Discovering causal signaling pathways through gene-expression patterns
Parikh, Jignesh R.; Klinger, Bertram; Xia, Yu; Marto, Jarrod A.; Blüthgen, Nils
2010-01-01
High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We make our novel resource and algorithm available through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/. PMID:20494976
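Overrepresentation of signature genes in a gene list is commonly scored with a one-sided hypergeometric test (equivalently, a one-sided Fisher exact test). The abstract does not say which statistic SPEED uses, so the sketch below is a generic illustration; the universe, pathway names, and gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def signature_enrichment(query, signatures, universe):
    """One-sided over-representation p-value of each pathway's signature genes in the query."""
    query = query & universe
    result = {}
    for pathway, genes in signatures.items():
        genes = genes & universe
        overlap = len(query & genes)
        result[pathway] = hypergeom_sf(overlap, len(universe), len(genes), len(query))
    return result


# Hypothetical 20-gene universe with two 5-gene signatures.
universe = {f"g{i}" for i in range(20)}
signatures = {
    "PathwayA": {"g0", "g1", "g2", "g3", "g4"},
    "PathwayB": {"g10", "g11", "g12", "g13", "g14"},
}
query = {"g0", "g1", "g2", "g15"}
res = signature_enrichment(query, signatures, universe)
print(round(res["PathwayA"], 4), res["PathwayB"])  # 0.032 1.0
```

For PathwayA, 3 of the 4 query genes are signature genes, giving P = (C(5,3)C(15,1) + C(5,4)C(15,0)) / C(20,4) = 155/4845. In practice, p-values across many pathways would also need multiple-testing correction.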
González-Thuillier, Irene; Venegas-Calerón, Mónica; Sánchez, Rosario; Garcés, Rafael; von Wettstein-Knowles, Penny; Martínez-Force, Enrique
2016-02-01
Two sunflower hydroxyacyl-[acyl carrier protein] dehydratases evolved into two different isoenzymes showing distinctive expression levels and kinetic efficiencies. β-Hydroxyacyl-[acyl carrier protein (ACP)]-dehydratase (HAD) is a component of the type II fatty acid synthase complex involved in 'de novo' fatty acid biosynthesis in plants. This complex, formed by four intraplastidial proteins, is responsible for the sequential condensation of two-carbon units, leading to 16- and 18-C acyl-ACP. HAD dehydrates 3-hydroxyacyl-ACP, generating trans-2-enoyl-ACP. With the aim of further understanding fatty acid biosynthesis in sunflower (Helianthus annuus) seeds, two β-hydroxyacyl-[ACP] dehydratase genes have been cloned from developing seeds, HaHAD1 (GenBank HM044767) and HaHAD2 (GenBank GU595454). Genomic DNA gel blot analyses suggest that both are single-copy genes. Differences in their expression patterns across plant tissues were detected. Higher levels of HaHAD2 in the initial stages of seed development suggest a key role in seed storage fatty acid synthesis. That HaHAD1 expression levels remained constant across most tissues suggests a housekeeping function. Heterologous expression of these genes in E. coli confirmed that both proteins were functional and able to interact with the bacterial complex 'in vivo'. The large increase of saturated fatty acids in cells expressing HaHAD1 and HaHAD2 supports the idea that these HAD genes are closely related to the E. coli FabZ gene. The proposed three-dimensional models of HaHAD1 and HaHAD2 revealed differences at the entrance to the catalytic tunnel attributable to Phe166/Val1159, respectively. HaHAD1 F166V was generated to study the function of this residue. The 'in vitro' enzymatic characterization of the three HAD proteins demonstrated that all were active, with the mutant having Km and Vmax values intermediate between those of the wild-type proteins.
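Km and Vmax are the parameters of the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The abstract does not say how they were estimated; as one classic illustration, the sketch below recovers both from a double-reciprocal (Lineweaver-Burk) line, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, using noise-free synthetic rates rather than data from the study.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rates):
    """Estimate (Vmax, Km) by least squares on 1/v versus 1/[S]."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept        # intercept = 1/Vmax
    km = slope * vmax             # slope = Km/Vmax
    return vmax, km


# Noise-free synthetic rates generated with Vmax = 10 and Km = 2 (arbitrary units).
substrate = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [michaelis_menten(s, 10.0, 2.0) for s in substrate]
vmax, km = lineweaver_burk_fit(substrate, rates)
print(round(vmax, 6), round(km, 6))  # 10.0 2.0
```

The double-reciprocal transform amplifies errors at low substrate concentrations, so direct nonlinear fitting of the rate law is usually preferred for real measurements.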
Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642
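As a concrete reference point for the approaches and evaluation measures surveyed here, the sketch below builds one of the simplest observational network estimates, a correlation relevance network (an undirected edge wherever |Pearson r| meets a threshold), and scores it globally with precision and recall against a reference edge set. The expression values, threshold, and reference edges are hypothetical, and correlation edges alone do not establish causal structure.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def relevance_network(expression, threshold):
    """Undirected edges between genes whose |r| meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add(frozenset((g1, g2)))
    return edges


def precision_recall(predicted, reference):
    """Global evaluation of a predicted edge set against a reference edge set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical expression profiles over five samples.
expression = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],   # perfectly correlated with A (r = 1)
    "C": [5, 3, 4, 1, 2],    # r = -0.8 with A and with B
}
predicted = relevance_network(expression, threshold=0.9)
reference = {frozenset(("A", "B")), frozenset(("A", "C"))}  # hypothetical reference edges
print(precision_recall(predicted, reference))  # (1.0, 0.5)
```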
Salicylic acid interferes with GFP fluorescence in vivo.
de Jonge, Jennifer; Hofius, Daniel; Hennig, Lars
2017-03-01
Fluorescent proteins have become essential tools for cell biologists. Plant biologists routinely use them in protein and promoter fusions to infer protein localization, tissue-specific expression and protein abundance. When studying the effects of biotic stress on chromatin, we unexpectedly observed a decrease in GFP signal intensity upon salicylic acid (SA) treatment in Arabidopsis lines expressing histone H1-GFP fusions. This decrease depended on SA concentration. The effect was not specific to the linker histone H1-GFP fusion but was also observed for the nucleosomal histone H2A-GFP fusion. This result prompted us to investigate a collection of fusion proteins that included different promoters, subcellular localizations and fluorophores. In all cases, fluorescence signals declined strongly or disappeared after SA application. No changes in GFP-fusion protein abundance were detected when fluorescence signals were lost, indicating that SA interferes with GFP fluorescence rather than with protein stability. In vitro experiments showed no SA-induced reduction of GFP fluorescence, in contrast to the reduction observed in vivo, suggesting that SA requires cellular components to cause fluorescence reduction. Together, we conclude that SA can interfere with the fluorescence of various GFP-derived reporter constructs in vivo. Assays that measure relocation or turnover of GFP-tagged proteins upon SA treatment should therefore be evaluated with caution. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...
2007-11-23
Motivation: Identifying the enzymatic or pharmacological activities of proteins is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures, and there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Such predictions cannot be made with current machine-learning techniques that require binding information for individual reactions or individual targets.
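One common way to learn from protein and chemical information jointly is a pairwise (tensor-product) kernel, in which the similarity of two protein-chemical pairs is the product of a protein kernel and a chemical kernel. The sketch below assumes both objects are already encoded as fragment-count dictionaries, a stand-in for the common cheminformatics representation mentioned above, and replaces a trained classifier such as an SVM with a simple kernel-weighted vote. All fragments and labels are hypothetical.

```python
def dot(u, v):
    """Linear kernel on sparse fragment-count dictionaries."""
    return sum(count * v.get(fragment, 0) for fragment, count in u.items())


def pair_kernel(pair1, pair2):
    """Tensor-product kernel: K((p, c), (p', c')) = Kp(p, p') * Kc(c, c')."""
    (p1, c1), (p2, c2) = pair1, pair2
    return dot(p1, p2) * dot(c1, c2)


def vote(query_pair, training):
    """Kernel-weighted vote; training holds ((protein, chemical), label), label in {+1, -1}."""
    return sum(label * pair_kernel(query_pair, pair) for pair, label in training)


training = [
    (({"HK": 2, "GD": 1}, {"C=O": 1, "OH": 2}), +1),  # hypothetical interacting pair
    (({"LL": 3}, {"CC": 4}), -1),                    # hypothetical non-interacting pair
]
# A protein-chemical pair never seen together in training:
query = ({"HK": 1, "GD": 2}, {"OH": 1})
print(vote(query, training))  # 8 -> predicted to interact
```

Because the kernel factorizes, a new pair can score highly without any recorded interaction for that exact protein and chemical, as long as it resembles a labeled pair on both the protein side and the chemical side.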
Khodnapur, Bharati S; Inamdar, Laxmi S; Nindi, Robertraj S; Math, Shivkumar A; Mulimani, B G; Inamdar, Sanjeev R
2015-02-01
This study aimed to examine the impact of ultraviolet (UV) laser radiation on the embryos of Calotes versicolor in terms of its effects on the protein profile of the adrenal-kidney-gonadal complex (AKG), sex determination and differentiation, embryonic development and hatching synchrony. During the thermosensitive period (TSP), eggs of C. versicolor were exposed for 180 s to third-harmonic laser pulses at 355 nm from a Q-switched Nd:YAG laser. After exposure they were incubated at the male-producing temperature (MPT) of 25.5 ± 0.5°C. The AKG of hatchlings was subjected to protein analysis by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and to histology. UV laser radiation altered the protein banding pattern in the AKG complex of hatchlings and also affected gonadal sex differentiation. SDS-PAGE of the AKG of one-day-old hatchlings revealed a total of nine protein bands in the control group, whereas UV laser-irradiated hatchlings expressed only seven protein bands, only one of which had the same Rf as a control band. The UV laser-treated hatchlings had ovotestis-like gonads showing a tendency towards femaleness, instead of the typical testes. We infer that 355 nm UV laser radiation during the TSP induces changes in the expression of proteins as well as their secretions. UV laser radiation affected the gonadal differentiation pathway, but no morphological anomalies were noticed.
Solano, Cristina; García, Begoña; Latasa, Cristina; Toledo-Arana, Alejandro; Zorraquino, Violeta; Valle, Jaione; Casals, Joan; Pedroso, Enrique; Lasa, Iñigo
2009-01-01
Bacteria have developed a unique signal transduction system involving multiple diguanylate cyclase and phosphodiesterase domain-containing proteins (GGDEF and EAL/HD-GYP, respectively) that modulate the levels of the same diffusible molecule, 3′-5′-cyclic diguanylic acid (c-di-GMP), to transmit signals and obtain specific cellular responses. Current knowledge about c-di-GMP signaling has been inferred mainly from the analysis of recombinant bacteria that either lack or overproduce individual members of the pathway, without addressing potential compensatory effects or interference between them. Here, we dissected c-di-GMP signaling by constructing a Salmonella strain lacking all GGDEF-domain proteins and then producing derivatives, each restoring a single protein. Our analysis showed that most GGDEF proteins are constitutively expressed and that their expression levels are not interdependent. Complete deletion of genes encoding GGDEF-domain proteins abrogated virulence, motility, long-term survival, and cellulose and fimbriae synthesis. Separate restoration revealed that 4 proteins from Salmonella and 1 from Yersinia pestis exclusively restored cellulose synthesis in a c-di-GMP–dependent manner, indicating that c-di-GMP produced by different GGDEF proteins can activate the same target. However, the restored strain containing the STM4551-encoding gene recovered all other phenotypes by means of gene expression modulation independently of c-di-GMP. Specifically, fimbriae synthesis and virulence were recovered through regulation of csgD and the plasmid-encoded spvAB mRNA levels, respectively. This study provides evidence that the GGDEF-domain protein network is regulated at two levels: one that strictly requires c-di-GMP to control enzymatic activities directly, restricted to cellulose synthesis under our experimental conditions, and another that involves gene regulation, for which c-di-GMP synthesis can be dispensable. PMID:19416883
Al-Qudah, M.; Alkahtani, R.; Akbarali, H.I.; Murthy, K.S.; Grider, J.R.
2015-01-01
Background: Brain-derived neurotrophic factor (BDNF) is a neurotrophin present in the intestine, where it participates in the survival and growth of enteric neurons, augmentation of enteric circuits, and stimulation of intestinal peristalsis and propulsion. Previous studies have largely focused on the role of neural and mucosal BDNF. The expression and release of BDNF from intestinal smooth muscle and its interaction with enteric neuropeptides have not been studied in the gut. Methods: The expression and secretion of BDNF by smooth muscle cells cultured from rabbit longitudinal intestinal muscle in response to substance P and pituitary adenylate cyclase activating peptide (PACAP) were measured by western blot and ELISA. BDNF mRNA was measured by RT-PCR. Key Results: The expression of BDNF protein and mRNA was greater in smooth muscle cells from the longitudinal muscle than from the circular muscle layer. PACAP and substance P increased the expression of BDNF protein and mRNA in cultured longitudinal smooth muscle cells. PACAP and substance P also stimulated the secretion of BDNF from cultured longitudinal smooth muscle cells. Chelation of intracellular calcium with BAPTA prevented the substance P-induced increase in BDNF mRNA and protein expression as well as substance P-induced secretion of BDNF. Conclusions & Inferences: Neuropeptides known to be present in enteric neurons innervating the longitudinal layer increase the expression of BDNF mRNA and protein in smooth muscle cells and stimulate the release of BDNF. Considering the ability of BDNF to enhance smooth muscle contraction, this autocrine loop may partially explain the characteristic hypercontractility of longitudinal muscle in inflammatory bowel disease. PMID:26088546
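The abstract does not state how the RT-PCR signals were quantified. A widely used option for relative mRNA levels is the comparative 2^-ΔΔCt method, sketched below under its usual assumptions: a stably expressed reference gene and roughly equal, near-100% amplification efficiencies. The Ct values are invented for illustration.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the 2^-ΔΔCt method."""
    delta_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: target mRNA after a neuropeptide treatment vs untreated cells.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A Ct that is two cycles earlier after normalization corresponds to roughly a four-fold increase, because each cycle ideally doubles the product.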
Pan, Yu; Bradley, Glyn; Pyke, Kevin; Ball, Graham; Lu, Chungui; Fray, Rupert; Marshall, Alexandra; Jayasuta, Subhalai; Baxter, Charles; van Wijk, Rik; Boyden, Laurie; Cade, Rebecca; Chapman, Natalie H.; Fraser, Paul D.; Hodgman, Charlie; Seymour, Graham B.
2013-01-01
Carotenoids represent some of the most important secondary metabolites in the human diet, and tomato (Solanum lycopersicum) is a rich source of these health-promoting compounds. In this work, a novel fruit-related regulator of pigment accumulation in tomato has been identified by artificial neural network inference analysis and its function validated in transgenic plants. A tomato fruit gene regulatory network was generated using artificial neural network inference analysis and transcription factor gene expression profiles derived from fruits sampled at various points during development and ripening. One transcription factor gene, whose sequence is related to an Arabidopsis (Arabidopsis thaliana) ARABIDOPSIS PSEUDO RESPONSE REGULATOR2-LIKE gene (APRR2-Like), was up-regulated at the breaker stage in wild-type tomato fruits and, when overexpressed in transgenic lines, increased plastid number, area, and pigment content, enhancing the levels of chlorophyll in immature unripe fruits and carotenoids in red ripe fruits. Analysis of the transcriptome of transgenic lines overexpressing the tomato APRR2-Like gene revealed up-regulation of several ripening-related genes, providing a link between the expression of this tomato gene and the ripening process. A putative ortholog of the tomato APRR2-Like gene in sweet pepper (Capsicum annuum) was associated with pigment accumulation in fruit tissues. We conclude that the function of this gene is conserved across taxa and that it encodes a protein that has an important role in ripening. PMID:23292788
Magee, Joe C; Tiedens, Larissa Z
2006-12-01
In three studies, observers based inferences about the cohesiveness and common fate of groups on the emotions expressed by group members. The valence of expressions affected cohesiveness inferences, whereas the consistency of expressions affected inferences of whether members have common fate. These emotion composition effects were stronger than those due to the race or sex composition of the group. Furthermore, the authors show that emotion valence and consistency are differentially involved in judgments about the degree to which the group as a whole was responsible for group performance. Finally, it is demonstrated that valence-cohesiveness effects are mediated by inferences of interpersonal liking and that consistency-common fate effects are mediated by inferences of psychological similarity. These findings have implications for the literature on entitativity and regarding the function of emotions in social contexts.
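Mediation claims such as the liking and similarity effects above are often quantified with the product-of-coefficients approach: a is the effect of X on the mediator M, b is the effect of M on Y holding X constant, and ab is the indirect effect. The sketch below computes these by ordinary least squares on made-up data; the abstract does not report which mediation procedure the authors used, and significance testing (for example, bootstrapping ab) is omitted.

```python
def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def mediation(x, m, y):
    """Return (indirect effect a*b, direct effect c') from OLS fits.

    a:     slope of M on X
    b, c': slopes of Y on M and on X in the two-predictor model Y ~ X + M
    """
    x, m, y = _center(x), _center(m), _center(y)
    sxx = sum(a * a for a in x)
    smm = sum(b * b for b in m)
    sxm = sum(a * b for a, b in zip(x, m))
    sxy = sum(a * c for a, c in zip(x, y))
    smy = sum(b * c for b, c in zip(m, y))
    a_path = sxm / sxx
    det = sxx * smm - sxm ** 2          # nonzero unless X and M are collinear
    c_direct = (smm * sxy - sxm * smy) / det
    b_path = (sxx * smy - sxm * sxy) / det
    return a_path * b_path, c_direct


# Made-up data: X drives M, and Y depends mostly on M.
x = [0, 1, 2, 3, 4, 5]
m = [2 * xi + e for xi, e in zip(x, [0.1, -0.1, 0.2, -0.2, 0.0, 0.0])]
y = [3 * mi + 0.5 * xi for mi, xi in zip(m, x)]
indirect, direct = mediation(x, m, y)
print(round(indirect, 3), round(direct, 3))
```

With these data the direct effect recovers the 0.5 used to generate y, while the indirect path through M carries most of the total effect.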
Proteomic Signatures of Human Oral Epithelial Cells in HIV-Infected Subjects
Yohannes, Elizabeth; Ghosh, Santosh K.; Jiang, Bin; McCormick, Thomas S.; Weinberg, Aaron; Hill, Edward; Faddoul, Faddy; Chance, Mark R.
2011-01-01
The oral epithelium, the most abundant structural tissue lining the oral mucosa, is an important line of defense against infectious microorganisms. HIV-infected subjects on highly active antiretroviral therapy (HAART) are susceptible to comorbid viral, bacterial and fungal infections in the oral cavity. To assess the molecular alterations of oral epithelia potentially associated with susceptibility to comorbid infections in such subjects, we performed various proteomic studies on over twenty HIV-infected and healthy subjects. In a discovery phase, two-dimensional difference gel electrophoresis (2-D DIGE) analyses of human oral gingival epithelial cell (HOEC) lysates identified 61 proteins differentially expressed between HIV-infected subjects on HAART and healthy controls. Down-regulated proteins in HIV-infected subjects include proteins associated with maintenance of protein folding and pro- and anti-inflammatory responses (e.g., heat-shock proteins, Cryab, Calr, IL-1RA, and Galectin-3-binding protein) as well as proteins involved in redox homeostasis and detoxification (e.g., Gstp1, Prdx1, and Ero1). Up-regulated proteins include protein disulfide isomerases, proteins whose expression is negatively regulated by Hsp90 (e.g., Ndrg1), and proteins that maintain cellular integrity (e.g., Vimentin). In a verification phase, proteins identified in the protein profiling experiments and those inferred from Ingenuity Pathway Analysis were analyzed by Western blotting of separate HOEC lysate samples, confirming many of the discovery findings. Additionally, in samples from HIV-infected patients, Heat Shock Factor 1 is down-regulated, which explains the reduced heat shock responses, and the MAPK signal transduction cascade is activated.
Overall, HAART provides incomplete immune recovery of oral epithelial cells in HIV-infected subjects, and the toxic side effects of HAART and/or HIV chronicity silence the expression of multiple proteins that in healthy subjects provide robust innate immune responses and combat cellular stress. PMID:22114700
Personat, José-María; Tejedor-Cano, Javier; Lindahl, Marika; Diaz-Espejo, Antonio; Jordano, Juan
2012-01-01
A genetic program that in sunflower seeds is activated by Heat Shock transcription Factor A9 (HaHSFA9) has been analyzed in transgenic tobacco seedlings. The ectopic overexpression of the HSFA9 program protected photosynthetic membranes, which resisted extreme dehydration and oxidative stress conditions. In contrast, heat acclimation of seedlings induced thermotolerance but not resistance to the harsh stress conditions employed. The HSFA9 program was found to include the expression of plastidial small Heat Shock Proteins that accumulate only at lower abundance in heat-stressed vegetative organs. Photosystem II (PSII) maximum quantum yield was higher for transgenic seedlings than for non-transgenic seedlings, after either stress treatment. Furthermore, protection of both PSII and Photosystem I (PSI) membrane protein complexes was observed in the transgenic seedlings, leading to their survival after the stress treatments. It was also shown that the plastidial D1 protein, a labile component of the PSII reaction center, and the PSI core protein PsaB were shielded from oxidative damage and degradation. We infer that natural expression of the HSFA9 program during embryogenesis may protect seed pro-plastids from developmental desiccation. PMID:23227265
Differentially expressed transcripts in stomach of Penaeus monodon in response to AHPND infection.
Soonthornchai, Wipasiri; Chaiyapechara, Sage; Klinbunga, Sirawut; Thongda, Wilawan; Tangphatsornruang, Sithichoke; Yoocha, Thippawan; Jarayabhand, Padermsak; Jiravanichpaisal, Pikul
2016-12-01
Acute Hepatopancreatic Necrosis Disease (AHPND) is an emerging disease in aquacultured shrimp caused by a pathogenic strain of Vibrio parahaemolyticus. As with several pathogenic bacteria, colonization of the stomach appears to be the initial step of infection for AHPND-causing Vibrio. To understand the immune responses in the stomach of black tiger shrimp (Penaeus monodon), differentially expressed transcripts (DETs) in the stomach during infection with V. parahaemolyticus strain 3HP (VP3HP) were examined using Ion Torrent sequencing. Of the 42,998 contigs obtained, 1585 contigs representing 1513 unigenes were significantly differentially expressed, with 1122 and 391 unigenes up- and down-regulated, respectively. Among the DETs, there were 141 immune-related unigenes in 10 functional categories: antimicrobial peptide, signal transduction pathway, proPO system, oxidative stress, proteinases/proteinase inhibitors, apoptotic tumor-related protein, pathogen recognition immune regulator, blood clotting system, adhesive protein and heat shock protein. The RNA-sequencing expression profiles of 20 of 22 genes were confirmed by qRT-PCR. Additionally, a novel isoform of anti-lipopolysaccharide factor, PmALF7, whose transcript was induced in the stomach after challenge with VP3HP, was discovered. This study provides fundamental information on the molecular response of the shrimp stomach during AHPND infection that should benefit future research. Copyright © 2016 Elsevier Ltd. All rights reserved.
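Calling a transcript up- or down-regulated usually combines an effect-size cut-off with a multiple-testing correction. The abstract does not name the statistical test or thresholds used, so the sketch below simply assumes that per-contig p-values and log2 fold changes are already available and applies the Benjamini-Hochberg procedure with illustrative cut-offs; `call_deg` and the example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_deg(log2fc, pvalues, alpha=0.05, min_abs_log2fc=1.0):
    """Split features into up- and down-regulated index lists."""
    q = benjamini_hochberg(pvalues)
    up = [i for i, (fc, qi) in enumerate(zip(log2fc, q))
          if qi < alpha and fc >= min_abs_log2fc]
    down = [i for i, (fc, qi) in enumerate(zip(log2fc, q))
            if qi < alpha and fc <= -min_abs_log2fc]
    return up, down


# Hypothetical contigs: log2 fold change and raw p-value for each.
log2fc = [2.0, -3.0, 1.5, 0.1]
pvals = [0.01, 0.04, 0.03, 0.5]
print(call_deg(log2fc, pvals, alpha=0.1))  # ([0, 2], [1])
```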
Suratanee, Apichat; Plaimas, Kitiporn
2017-01-01
The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, we developed an analysis framework to infer associations between proteins and diseases from a large human protein-protein interaction network, integrating an effective network search, the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to quantify the impact of each protein on other proteins, and associations between proteins and diseases were then inferred statistically. The method using the RkNN search yielded a much higher precision than random selection, a standard nearest neighbor search, or the same method applied to a random protein-protein interaction network. All protein-disease pair candidates were checked against the literature, and supporting evidence was identified for 596 pairs. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
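In an RkNN query, the reverse k-nearest neighbors of a protein q are the proteins that count q among their own k nearest neighbors, so a large RkNN set marks a protein with wide influence. The sketch below assumes unweighted shortest-path distance in the interaction network and breaks distance ties by node name; the paper's exact distance measure and tie handling may differ, and the toy network is hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_nearest(graph, node, k):
    """The k reachable nodes closest to node; ties broken by node name."""
    dist = bfs_distances(graph, node)
    ranked = sorted((d, v) for v, d in dist.items() if v != node)
    return {v for _, v in ranked[:k]}


def reverse_knn(graph, query, k):
    """Nodes that include query among their own k nearest neighbors."""
    return {p for p in graph if p != query and query in k_nearest(graph, p, k)}


# Hypothetical interaction network as adjacency lists.
ppi = {
    "H": ["A", "B", "C"],
    "A": ["H", "B"],
    "B": ["H", "A"],
    "C": ["H", "D"],
    "D": ["C"],
}
print(sorted(reverse_knn(ppi, "H", 2)))  # ['A', 'B', 'C', 'D']
print(sorted(reverse_knn(ppi, "D", 2)))  # ['C']
```

This brute-force version recomputes neighbor lists for every node, which is quadratic in network size; it is meant only to make the definition concrete.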
Rothman, Naomi B; Magee, Joe C
2016-01-01
Our findings draw attention to the interpersonal communication function of a relatively unexplored dimension of emotions: the level of social engagement versus disengagement. In four experiments, regardless of valence and target group gender, observers infer greater relational well-being (more cohesiveness and less conflict) between group members from socially engaging (sadness and appreciation) versus disengaging (anger and pride) emotion expressions. Supporting our argument that social (dis)engagement is a critical dimension communicated by these emotions, we demonstrate (1) that inferences about group members' self-interest mediate the effect of socially engaging emotions on cohesiveness and (2) that the influence of socially disengaging emotion expressions on inferences of conflict is attenuated when groups have collectivistic norms (i.e., members value a high level of social engagement). Furthermore, we show an important downstream consequence of these inferences of relational well-being: Groups that seem less cohesive because of their members' proud (versus appreciative) expressions are also expected to have worse task performance.
Kulkarni, Yogesh M.; Chambers, Emily; McGray, A. J. Robert; Ware, Jason S.; Bramson, Jonathan L.
2012-01-01
Interleukin-12 (IL12) enhances anti-tumor immunity when delivered to the tumor microenvironment. However, local immunoregulatory elements dampen the efficacy of IL12. The identity of these local mechanisms used by tumors to suppress immunosurveillance represents a key knowledge gap for improving tumor immunotherapy. From a systems perspective, local suppression of anti-tumor immunity is a closed-loop system, in which the system response is determined by an unknown combination of external inputs and local cellular cross-talk. Here, we recreated this closed-loop system in vitro and combined quantitative high-content assays, in silico model-based inference, and a proteomic workflow to identify the biochemical cues responsible for immunosuppression. Following an induction period, the B16 melanoma cell model, a transplantable model for spontaneous malignant melanoma, inhibited the response of a T helper cell model to IL12. This paracrine effect was not explained by induction of apoptosis or creation of a cytokine sink, even though both mechanisms were present within the co-culture assay. Tumor-derived Wnt-inducible signaling protein-1 (WISP-1) was identified as exerting paracrine action on immune cells by inhibiting their response to IL12. Moreover, WISP-1 was expressed in vivo following intradermal challenge with B16F10 cells and was inferred to be expressed at the tumor periphery. Collectively, the data suggest that (1) biochemical cues associated with epithelial-to-mesenchymal transition can shape anti-tumor immunity through paracrine action and (2) remnants of the immunoselective pressure associated with evolution in cancer include both sculpting of tumor antigens and expression of proteins that proactively shape anti-tumor immunity. PMID:22777646
The interface of protein structure, protein biophysics, and molecular evolution
Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon
2012-01-01
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy, and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes, such as insertions and deletions, domain rearrangements and even circular permutations, should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics, as departures from static considerations of protein structure, are also important. Protein expression level is known to be a major determinant of evolutionary rate, and several considerations, including selection at the mRNA level and the role of interaction specificity, are discussed. Lastly, the relationship between modeling and the high-throughput experimental data it requires, as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry, are presented, with the ultimate aim of generating better models for biological inference and prediction. PMID:22528593
De Cegli, Rossella; Iacobacci, Simona; Flore, Gemma; Gambardella, Gennaro; Mao, Lei; Cutillo, Luisa; Lauria, Mario; Klose, Joachim; Illingworth, Elizabeth; Banfi, Sandro; di Bernardo, Diego
2013-01-01
Gene expression profiles can be used to infer previously unknown transcriptional regulatory interactions among thousands of genes via systems biology 'reverse engineering' approaches. We 'reverse engineered' an embryonic stem (ES)-specific transcriptional network from 171 gene expression profiles, measured in ES cells, to identify master regulators of gene expression ('hubs'). We discovered that E130012A19Rik (E13), highly expressed in mouse ES cells compared with differentiated cells, was a central 'hub' of the network. We demonstrated that E13 is a protein-coding gene implicated in regulating commitment towards the different neuronal subtypes and glial cells. The overexpression and knock-down of E13 in ES cell lines undergoing differentiation into neurons and glial cells caused a strong up-regulation of the glutamatergic neuron marker Vglut2 and a strong down-regulation of the GABAergic neuron marker GAD65 and of the radial glia marker Blbp. We confirmed E13 expression in the cerebral cortex of adult mice and during development. By immuno-based affinity purification, we characterized protein partners of E13 involved in the Polycomb complex. Our results suggest a role of E13 in regulating the division between glutamatergic projection neurons and GABAergic interneurons and glial cells, possibly by epigenetically mediated transcriptional regulation.
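Once a network has been reverse engineered, a simple way to nominate candidate master regulators is to rank genes by degree, the number of inferred interactions they take part in. The sketch below does only that ranking; it assumes an undirected edge list is already available, the gene names are placeholders, and the study's own hub criterion is not specified in the abstract.

```python
from collections import Counter


def rank_hubs(edges, top_n=3):
    """Return the top_n genes by degree in an undirected edge list (ties by name)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:top_n]


# Placeholder network: "Hub" shares an inferred edge with three other genes.
edges = [("Hub", "G1"), ("Hub", "G2"), ("Hub", "G3"), ("G1", "G2")]
print(rank_hubs(edges, top_n=2))  # ['Hub', 'G1']
```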
Insights into the noncoding RNome of nitrogen-fixing endosymbiotic α-proteobacteria.
Jiménez-Zurdo, José I; Valverde, Claudio; Becker, Anke
2013-02-01
Symbiotic chronic infection of legumes by rhizobia involves transition of invading bacteria from a free-living environment in soil to an intracellular state as differentiated nitrogen-fixing bacteroids within the nodules elicited in the host plant. The adaptive flexibility demanded by this complex lifestyle is likely facilitated by the large set of regulatory proteins encoded by rhizobial genomes. However, proteins are not the only relevant players in the regulation of gene expression in bacteria. Large-scale high-throughput analysis of prokaryotic genomes is revealing the expression of an unexpected plethora of small untranslated transcripts (sRNAs) with housekeeping or regulatory roles. sRNAs mostly act in response to environmental cues as post-transcriptional regulators of gene expression through protein-assisted base-pairing interactions with target mRNAs. Riboregulation helps fine-tune a wide range of bacterial processes, which in intracellular animal pathogens largely involve virulence traits. Here, we summarize the incipient knowledge about the noncoding RNome structure of nitrogen-fixing endosymbiotic bacteria as inferred from genome-wide searches for sRNA genes in the alfalfa partner Sinorhizobium meliloti and further comparative genomics analysis. The biology of relevant S. meliloti RNA chaperones (e.g., Hfq) is also reviewed as a first global indicator of the impact of riboregulation in the establishment of the symbiotic interaction.
Tessier-Cloutier, Basile; Soslow, Robert A; Stewart, Colin J R; Köbel, Martin; Lee, Cheng-Han
2018-04-19
Dedifferentiated endometrial carcinomas (DDECs)/undifferentiated endometrial carcinomas (UECs) are aggressive endometrial cancers with frequent genomic inactivation of core components of switch/sucrose non-fermentable (SWI/SNF) complex proteins. Claudin-4, an epithelial intercellular tight junction protein, was recently found to be expressed in SWI/SNF-deficient undifferentiated carcinomas but not in SWI/SNF-deficient sarcomas. The aim of this study was to examine claudin-4 expression in UECs/DDECs and other high-grade uterine carcinomas. We examined claudin-4 expression by immunohistochemistry (clone 3E2C1) on tissue microarrays that contained 44 UECs/DDECs (24 SWI/SNF-deficient), 50 carcinosarcomas, 164 grade 3 endometrioid carcinomas, 57 serous carcinomas, and 20 clear cell carcinomas. Tumours with <5% claudin-4 expression were considered to be negative. Nearly all SWI/SNF-deficient, and most SWI/SNF-proficient, UECs/DDECs showed a complete absence of claudin-4 expression in the undifferentiated component, whereas the differentiated component in DDECs showed consistent and diffuse claudin-4 expression. Only one SWI/SNF-deficient DDEC showed focal expression of claudin-4 in the undifferentiated component, as compared with diffuse expression in the corresponding differentiated component. Claudin-4 expression was consistently absent in the sarcomatous component of carcinosarcoma, and it was absent in 24% of grade 3 endometrioid carcinomas and serous carcinomas. Claudin-4 expression can be absent or very focal in a subset of high-grade endometrial carcinomas, and is almost always absent in the undifferentiated components of SWI/SNF-deficient UECs/DDECs, despite the apparent epithelial origin in the case of DDECs. Therefore, claudin-4 expression cannot be used to infer mesenchymal or epithelial tumour origin in the endometrium. 
The consistent loss or down-regulation of claudin-4, a tight junction protein, in SWI/SNF-deficient UECs/DDECs further supports the undifferentiated nature of these tumours. © 2018 John Wiley & Sons Ltd.
Hsp90 Promotes Kinase Evolution
Lachowiec, Jennifer; Lemus, Tzitziki; Borenstein, Elhanan; Queitsch, Christine
2015-01-01
Heat-shock protein 90 (Hsp90) promotes the maturation and stability of its client proteins, including many kinases. In doing so, Hsp90 may allow its clients to accumulate mutations, as previously proposed by the capacitor hypothesis. If true, Hsp90 clients should show increased evolutionary rate compared with nonclients; however, other factors, such as gene expression and protein connectivity, may confound or obscure the chaperone’s putative contribution. Here, we compared the evolutionary rates of many Hsp90 clients and nonclients in the human protein kinase superfamily. We show that Hsp90 client status promotes evolutionary rate independently of gene expression and protein connectivity, with an effect of small magnitude similar to theirs. Hsp90’s effect on kinase evolutionary rate was detected across mammals, where it specifically relaxed purifying selection. Hsp90 clients also showed increased nucleotide diversity and harbored more damaging variation than nonclient kinases across humans. These results are consistent with the central argument of the capacitor hypothesis that interaction with the chaperone allows its clients to harbor genetic variation. Hsp90 client status is thought to be highly dynamic, with as few as one amino acid change rendering a protein dependent on the chaperone. Contrary to this expectation, we found that across the protein kinase phylogeny, Hsp90 client status tends to be gained, maintained, and shared among closely related kinases. We also infer that the ancestral protein kinase was not an Hsp90 client. Taken together, our results suggest that Hsp90 played an important role in shaping the kinase superfamily. PMID:25246701
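Inferring the client status of the ancestral kinase is an ancestral-state reconstruction problem. The abstract does not say which method was used; as a minimal illustration, the sketch below applies Fitch parsimony to a binary trait (client vs nonclient) on a small, hypothetical rooted binary tree and returns the candidate root states together with the minimum number of state changes.

```python
def fitch(tree, states):
    """Fitch bottom-up pass on a rooted binary tree.

    tree:   a leaf name (str) or a (left, right) tuple of subtrees
    states: maps each leaf name to its observed state
    Returns (set of most-parsimonious states at this node, minimum changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical kinase tree and client status (True = Hsp90 client).
tree = ((("K1", "K2"), "K3"), ("K4", "K5"))
status = {"K1": True, "K2": True, "K3": False, "K4": False, "K5": False}
print(fitch(tree, status))  # ({False}, 1)
```

When the root set contains both states, parsimony alone cannot decide, which is one reason likelihood-based reconstructions are often preferred.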
A Machine Learning Approach to Predict Gene Regulatory Networks in Seed Development in Arabidopsis
Ni, Ying; Aghamirzaie, Delasa; Elmarakeby, Haitham; Collakova, Eva; Li, Song; Grene, Ruth; Heath, Lenwood S.
2016-01-01
Gene regulatory networks (GRNs) provide a representation of relationships between regulators and their target genes. Several methods for GRN inference, both unsupervised and supervised, have been developed to date. Because regulatory relationships are reprogrammed across tissues and conditions, GRNs inferred without a specific biological context are of limited applicability. In this report, a machine learning approach is presented to predict GRNs specific to developing Arabidopsis thaliana embryos. We developed the Beacon GRN inference tool to predict GRNs occurring during seed development in Arabidopsis based on a support vector machine (SVM) model. We developed both global and local inference models and compared their performance, demonstrating that local models are generally superior for our application. Using both the expression levels of genes expressed in developing embryos and previously known regulatory relationships, GRNs were predicted for specific embryonic developmental stages. Targets that are strongly positively correlated with their regulators are mostly expressed at the beginning of seed development. Potential direct targets were identified based on a match between the promoter regions of these inferred targets and the cis elements recognized by specific regulators. Our analysis also provides evidence for previously unknown inhibitory effects of three positive regulators of gene expression. The Beacon GRN inference tool provides a valuable model system for context-specific GRN inference and is freely available at https://github.com/BeaconProjectAtVirginiaTech/beacon_network_inference.git. PMID:28066488
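The "local model" idea above (one classifier per regulator, trained on its known targets and non-targets) can be illustrated with a linear SVM. This is a minimal sketch, not the Beacon code: it assumes that a gene's expression profile across developmental stages is the feature vector, uses a Pegasos-style stochastic sub-gradient solver written in plain Python, and the function names and gene profiles are hypothetical.

```python
import random

def train_local_svm(profiles, labels, lam=0.01, epochs=500, seed=0):
    """Train a linear SVM for ONE regulator (a 'local' model).

    profiles: expression profiles of candidate genes (one value per stage)
    labels:   +1 for known targets of the regulator, -1 for non-targets
    Uses Pegasos-style stochastic sub-gradient descent on the hinge loss.
    A constant 1.0 feature is appended so the last weight acts as a bias.
    """
    X = [list(p) + [1.0] for p in profiles]
    w = [0.0] * len(X[0])
    rng = random.Random(seed)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards this example
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict_target(w, profile):
    """Return +1 if the gene is predicted to be a target, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, list(profile) + [1.0]))
    return 1 if score > 0 else -1

# Hypothetical profiles over three embryonic stages (early, mid, late)
targets = [[2.0, 1.5, 0.2], [1.8, 1.2, 0.1], [2.2, 1.0, 0.3]]
non_targets = [[0.2, 1.0, 2.0], [0.1, 0.8, 1.9], [0.3, 1.4, 2.2]]
w = train_local_svm(targets + non_targets, [1] * 3 + [-1] * 3)
print(predict_target(w, [1.9, 1.3, 0.2]))  # early-expressed candidate
print(predict_target(w, [0.2, 1.1, 2.1]))  # late-expressed candidate
```

A "global" model would instead pool all regulator-target pairs into one classifier; training one model per regulator lets each regulator have its own decision boundary.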
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from multiple sequence alignments (MSAs) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein (LP) models to benchmark these inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that the inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, when used as protein Hamiltonians for the design of new sequences, generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These results are remarkable because the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models, owing to the competition between folds. Our findings elucidate the reasons for the success of inverse approaches to modelling proteins from sequence data, and their limitations.
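To make the inferred object concrete: a pairwise Potts Hamiltonian assigns each sequence an energy from single-site fields h and pairwise couplings J. The sketch below uses one common sign convention, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), on a toy two-state, three-site model whose numbers are hypothetical; it does not infer h or J from an MSA. Setting J to an empty dict gives an independent-site model, which cannot favour the correlated pair of sites in the example.

```python
import itertools
import math

def potts_energy(seq, h, J):
    """Energy of a sequence under a pairwise Potts model.

    seq: tuple/list of states, one per site (e.g. residue types coded 0..q-1)
    h:   h[i][a] is the field on state a at site i
    J:   dict mapping a site pair (i, j), i < j, to a q x q coupling table
    """
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), table in J.items():
        energy -= table[seq[i]][seq[j]]
    return energy

def boltzmann_distribution(L, q, h, J):
    """Exact P(s) proportional to exp(-E(s)) by enumerating all q**L sequences (tiny models only)."""
    states = list(itertools.product(range(q), repeat=L))
    weights = [math.exp(-potts_energy(s, h, J)) for s in states]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(states, weights)}

# Toy model: 3 sites, 2 states; sites 0 and 2 are 'in contact' and prefer equal states
h = [[0.0, 0.0] for _ in range(3)]
J = {(0, 2): [[1.0, 0.0], [0.0, 1.0]]}
P = boltzmann_distribution(3, 2, h, J)
print(P[(0, 1, 0)], P[(0, 1, 1)])  # the matched pair is more probable
```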
From laws of inference to protein folding dynamics.
Tseng, Chih-Yuan; Yu, Chun-Ping; Lee, H C
2010-08-01
Protein folding dynamics is one of the major issues constantly investigated in the study of protein function. Molecular dynamics (MD) simulation with the replica exchange method (REM) is a common theoretical approach. Yet a trade-off in applying the REM is that the dynamics toward the native configuration appear to be lost in the simulations. In this work, we show that, given REM-MD simulation results, protein folding dynamics can be derived directly from laws of inference. The applicability of the resulting approach, entropic folding dynamics, is illustrated by investigating the well-studied Trp-cage peptide. Our results are qualitatively comparable with those of other studies. These studies suggest that incorporating laws of inference and physics brings a comprehensive perspective to exploring protein folding dynamics.
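The step in REM that scrambles the physical time ordering is the exchange move between replicas at different temperatures. A minimal sketch of the standard Metropolis exchange criterion follows: a swap is accepted with probability min(1, exp[(β_i - β_j)(E_i - E_j)]). It illustrates REM only and is not the authors' entropic folding dynamics; the function names, energies, and inverse temperatures are hypothetical, in units where k_B = 1.

```python
import math
import random

def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging configurations between two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(betas, energies, rng):
    """One sweep of neighbour exchanges along the temperature ladder.

    betas:    inverse temperatures, ordered from coldest to hottest
    energies: energy of the configuration currently held at each temperature
    Returns the energies held at each temperature after the sweep.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_acceptance(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies

rng = random.Random(0)
print(attempt_swaps([1.0, 0.5], [10.0, 5.0], rng))  # low-energy state moves to the cold replica
```

Because configurations hop between temperatures, the sequence of states observed at a fixed temperature is not a continuous physical trajectory, which is why folding dynamics must be recovered by other means.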
Zhu, Jiewei; Huang, Xiuli; Liu, Tong; Gao, Shigang; Chen, Jie
2012-08-01
Following a previous finding of a drought-inducible protein in resistant maize identified through MALDI-TOF-MS/MS, ZmDIP was cloned and its function against Curvularia lunata was analyzed. ZmDIP expression varied among roots, leaf sheaths, and young and old leaves of different maize inbred lines. ZmDIP transcript levels in leaves changed over time after inoculation with C. lunata. Prokaryotic expression analysis demonstrated that the gene can regulate the salt stress tolerance of Escherichia coli. Transient expression of ZmDIP in maize leaves showed that the gene is also linked to leaf resistance against C. lunata infection. ZmDIP-mediated ROS and ABA signaling pathways were inferred to be closely associated with maize leaf resistance to the pathogen.
Bhadra, Pratiti; Pal, Debnath
2017-04-01
Dynamics is integral to the function of proteins, yet molecular dynamics (MD) simulation remains under-explored as a technique for inferring molecular function. This is particularly important in the context of genomics projects, where novel proteins are determined with limited evolutionary information. We recently developed a method that infers function by matching the flexible segments of a query protein, using a novel approach that combines analysis of residue fluctuation graphs and auto-correlation vectors derived from coarse-grained (CG) MD trajectories. The method was validated on a diverse dataset with sequence identity between proteins as low as 3%, with high function-recall rates. Here we share its implementation as a publicly accessible web service, named DynFunc (Dynamics Match for Function), to query protein function from ≥1 µs long CG dynamics trajectory information of protein subunits. Users are provided with the custom-developed coarse-grained molecular mechanics (CGMM) force field to generate MD trajectories for their protein of interest. On upload of trajectory information, the DynFunc web server identifies specific flexible regions of the protein linked to putative molecular function. Our application does not use evolutionary information to infer molecular function from MD information and can therefore work for all proteins, including moonlighting and novel ones, whenever structural information is available. Our pipeline is expected to be useful to structural biologists working with novel proteins and to those interested in moonlighting functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
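Fluctuation-based descriptors of this kind start from per-residue flexibility in a trajectory. As a sketch under stated assumptions (DynFunc's actual fluctuation graphs and auto-correlation vectors are not specified in the abstract), the code computes the root-mean-square fluctuation (RMSF) of each coarse-grained bead from frames assumed to be pre-aligned, plus a simple lag autocorrelation of the resulting profile along the chain. The function names and the tiny trajectory are hypothetical.

```python
import math

def rmsf(trajectory):
    """Root-mean-square fluctuation of each bead about its mean position.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples, one per
    coarse-grained bead, assumed already superposed on a common reference.
    """
    n_frames = len(trajectory)
    values = []
    for k in range(len(trajectory[0])):
        mean = [sum(frame[k][d] for frame in trajectory) / n_frames for d in range(3)]
        msd = sum(
            sum((frame[k][d] - mean[d]) ** 2 for d in range(3)) for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values

def autocorrelation(profile, max_lag):
    """Normalised autocorrelation of a per-residue profile at lags 1..max_lag."""
    n = len(profile)
    mu = sum(profile) / n
    var = sum((v - mu) ** 2 for v in profile)
    if var == 0:
        return [0.0] * max_lag  # a constant profile has no defined correlation
    return [
        sum((profile[i] - mu) * (profile[i + lag] - mu) for i in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]

# Hypothetical two-frame trajectory of two beads: bead 0 is rigid, bead 1 moves
traj = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
print(rmsf(traj))
print(autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=1))
```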
Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).
Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M
2001-12-01
A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding it were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues identical to those of BPTI, including 6 half-cystine and 5 aromatic residues. The active-site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits α-chymotrypsin, but inhibits elastase and subtilisin only weakly. SETI was expressed in E. coli using a PelB vector carrying the SETI-encoding cDNA, with an expression yield of 0.68 mg/l. The phylogenetic relationship between SETI and other BPTI-like trypsin inhibitors was analyzed using maximum likelihood inference methods.
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible.
Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins
NASA Technical Reports Server (NTRS)
Gaucher, Eric A.; Thomson, J. Michael; Burgan, Michelle F.; Benner, Steven A.
2003-01-01
Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 degrees C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria, the distribution of thermophily in derived bacterial lineages, the inferred G + C content of ancient ribosomal RNA, and the geological record combined with assumptions concerning molecular clocks. The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.
Singh, Manish K; Tiwari, Pramod K
2016-08-01
Hsp27, a highly conserved small heat shock protein, is widely known to be developmentally regulated and heat inducible, and it has also been implicated in thermotolerance. This study is a sequel to our earlier studies on the molecular organization of heat shock genes/proteins and their role in development and thermal adaptation in a sheep pest, Lucilia cuprina (blowfly), which exhibits unusually high adaptability to a variety of environmental stresses, including heat and chemicals. In this report, our aim was to understand the evolutionary relationship of the Lucilia hsp27 gene/protein with those of other species and its role in thermal adaptation. We sequenced and characterized the coding region of the Lchsp27 gene and analyzed its expression in various larval and adult tissues under normal as well as heat shock conditions. Nucleotide sequence analysis of the 678-bp coding region of Lchsp27 showed the closest evolutionary proximity to Drosophila (90.09%), which belongs to the same order, Diptera. Heat shock significantly enhanced the expression of the Lchsp27 gene in all larval and adult tissues examined, although in a tissue-specific manner. Notably, in Malpighian tubules, whereas the heat-induced level of hsp27 transcript (mRNA) appeared increased compared with control, the protein level remained unaltered and the protein remained nuclear-localized. We infer that Lchsp27 may have a significant role in the maintenance of cellular homeostasis, particularly during summer months, when the fly remains exposed to high heat in its natural habitat. © 2015 Institute of Zoology, Chinese Academy of Sciences.
Binding of human plasminogen by the lipoprotein LipL46 of Leptospira interrogans.
Santos, Jadson V; Pereira, Priscila R M; Fernandes, Luis G V; Siqueira, Gabriela Hase; de Souza, Gisele O; Souza Filho, Antônio; Vasconcellos, Silvio A; Heinemann, Marcos B; Chapola, Erica G B; Nascimento, Ana L T O
2018-02-01
Leptospirosis is a widespread zoonosis caused by pathogenic Leptospira. The bacteria disseminate via the bloodstream and colonize the renal tubules of reservoir hosts. Leptospiral surface-exposed proteins are important targets because, owing to their location, they can elicit an immune response and mediate adhesion and invasion processes. LipL46 has previously been reported to be located in the leptospiral outer membrane and to be recognized by antibodies present in the serum of infected hamsters. In this study, we confirmed the cellular location of this protein by immunofluorescence and FACS. We cloned and expressed the recombinant protein LipL46 in soluble form. LipL46 was recognized by sera from confirmed human leptospirosis cases, suggesting that it is expressed during infection. Binding screening of LipL46 with extracellular matrix (ECM) and plasma components showed that this protein interacts with plasminogen. The binding was dose-dependent, but saturation was not reached within the range of protein concentrations used. Kringle domains of plasminogen and lysine residues of the recombinant protein are involved in the binding, because the lysine analog amino caproic acid (ACA) almost totally inhibited the reaction. The interaction of LipL46 with plasminogen generates plasmin in the presence of the plasminogen activator uPA. Because plasmin generated at the leptospiral surface can degrade ECM molecules and decrease opsonophagocytosis, we tentatively infer that LipL46 has a role in the invasion process of pathogenic Leptospira. Copyright © 2017. Published by Elsevier Ltd.
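Why a dose-dependent curve can fail to saturate: under a simple one-site binding model (an assumption; the abstract reports neither a binding model nor a dissociation constant), the bound fraction is L / (K_d + L), which approaches a plateau only once the titrated concentration L is several-fold above K_d. The K_d, the titration series, and the 0.9 occupancy threshold below are hypothetical.

```python
def fraction_bound(conc, kd):
    """Fractional saturation of a single-site binding model: L / (Kd + L)."""
    return conc / (kd + conc)

def reaches_plateau(concentrations, kd, threshold=0.9):
    """True if the highest concentration tested gives at least `threshold` occupancy."""
    return fraction_bound(max(concentrations), kd) >= threshold

kd = 10.0                              # hypothetical Kd, same units as the concentrations
titration = [0.5, 1.0, 2.0, 4.0, 8.0]  # hypothetical titration series, all below Kd
print([round(fraction_bound(c, kd), 3) for c in titration])
print(reaches_plateau(titration, kd))
```

In this toy series every concentration lies below K_d, so occupancy keeps rising roughly in proportion to concentration and no plateau is observed.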
PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface.
Uszkoreit, Julian; Maerkens, Alexandra; Perez-Riverol, Yasset; Meyer, Helmut E; Marcus, Katrin; Stephan, Christian; Kohlbacher, Oliver; Eisenacher, Martin
2015-07-02
Protein inference connects the peptide spectrum matches (PSMs) obtained from database search engines back to proteins, which are at the heart of most proteomics studies. Different search engines yield different PSMs and thus different protein lists. Analysis of results from one or multiple search engines is often hampered by different data exchange formats and a lack of convenient and intuitive user interfaces. We present PIA, a flexible software suite for combining PSMs from different search engine runs and turning these into consistent results. PIA can be integrated into proteomics data analysis workflows in several ways. A user-friendly graphical user interface can be run either locally or (e.g., for larger core facilities) from a central server. For automated data processing, stand-alone tools are available. PIA implements several established protein inference algorithms and can combine results from different search engines seamlessly. On several benchmark data sets, we show that PIA can identify a larger number of proteins at the same protein FDR than inference based on a single search engine. PIA supports the majority of established search engines and data in the mzIdentML standard format. It is implemented in Java and freely available at https://github.com/mpc-bioinformatics/pia.
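One widely used protein inference strategy is parsimony ("Occam's razor"): report the smallest set of proteins that explains every observed peptide. The sketch below is a generic greedy version written for illustration, not PIA's Java implementation or any of its specific algorithms; the function name, accessions, and peptides are hypothetical.

```python
def parsimonious_proteins(peptide_to_proteins):
    """Greedy parsimony (Occam's razor) protein inference.

    peptide_to_proteins: dict peptide -> set of protein accessions containing it.
    Repeatedly picks the protein explaining the most still-unexplained peptides
    (ties broken alphabetically) until every explainable peptide is covered.
    """
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)
    unexplained = set(peptide_to_proteins)
    selected = []
    while unexplained:
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & unexplained),
            default=None,
        )
        if best is None or not protein_to_peptides[best] & unexplained:
            break  # remaining peptides map to no protein
        selected.append(best)
        unexplained -= protein_to_peptides[best]
    return selected

evidence = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P1", "P2"},
    "PEPD": {"P3"},
}
print(parsimonious_proteins(evidence))
```

Here P2 is not reported because every peptide it could explain is already explained by P1; different inference algorithms differ mainly in how they treat such shared peptides.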
Yang, Jia-Sin; Lin, Chiao-Wen; Hsieh, Yi-Hsien; Chien, Ming-Hsien; Chuang, Chun-Yi; Yang, Shun-Fa
2017-10-10
Oral cancer is a solid malignant tumor that is prone to occur following hypoxia. However, no clear studies have shown a link between hypoxia and oral carcinogenesis. Carbonic anhydrase IX (CAIX), a hypoxia-induced transmembrane protein, is highly expressed in various types of human cancer. However, the effects of CAIX on the metastasis of human oral cancer cells and the underlying molecular mechanisms have not been clarified. In this study, we observed that CAIX overexpression increased the migratory and invasive abilities of SCC-9 and SAS cells. CAIX overexpression also increased the mRNA and protein expression of matrix metalloproteinase-9 (MMP-9) and the phosphorylation of the signaling proteins focal adhesion kinase (FAK), the tyrosine kinase Src, and extracellular signal-regulated kinase 1/2. In addition, CAIX overexpression increased the binding of nuclear factor-κB (NF-κB), c-Jun, and c-Fos to the MMP-9 gene promoter. Treatment with MMP-9 short hairpin RNA, an MMP inhibitor (GM6001), an FAK mutant, or an MEK inhibitor (U0126) inhibited CAIX-induced cell motility in SCC-9 cells. Moreover, data sets from The Cancer Genome Atlas demonstrated that CAIX expression was significantly associated with advanced progression and poor survival in oral cancer. In conclusion, these results suggest that CAIX overexpression induces MMP-9 gene expression, which in turn promotes the metastasis of oral cancer cells.
Lu, S; Halberg, R; Kroos, L
1990-01-01
During sporulation of the Gram-positive bacterium Bacillus subtilis, transcription of genes encoding spore coat proteins in the mother-cell compartment of the sporangium is controlled by RNA polymerase containing the sigma subunit called sigma K. Based on comparison of the N-terminal amino acid sequence of sigma K with the nucleotide sequence of the gene encoding sigma K (sigK), the primary product of sigK was inferred to be a pro-protein (pro-sigma K) with 20 extra amino acids at the N terminus. Using antibodies generated against pro-sigma K, we have detected pro-sigma K beginning at the third hour of sporulation and sigma K beginning about 1 hr later. Even when pro-sigma K is expressed artificially during growth and throughout sporulation, sigma K appears at the normal time and expression of a sigma K-controlled gene occurs normally. These results suggest that pro-sigma K is an inactive precursor that is proteolytically processed to active sigma K in a developmentally regulated fashion. Mutations that block forespore gene expression block accumulation of sigma K but not accumulation of pro-sigma K, suggesting that pro-sigma K processing is a regulatory device that couples the programs of gene expression in the two compartments of the sporangium. We propose that this regulatory device ensures completion of forespore morphogenesis prior to the synthesis, in the mother cell, of the spore coat proteins that will encase the forespore. PMID:2124700
Yang, Jia-Sin; Lin, Chiao-Wen; Hsieh, Yi-Hsien; Chien, Ming-Hsien; Chuang, Chun-Yi; Yang, Shun-Fa
2017-01-01
Oral cancer is a solid malignant tumor that is prone to occur following hypoxia. However, no clear studies have shown a link between hypoxia and oral carcinogenesis. Carbonic anhydrase IX (CAIX), a hypoxia-induced transmembrane protein, is highly expressed in various types of human cancer. However, the effects of CAIX on the metastasis of human oral cancer cells and the underlying molecular mechanisms have not been clarified. In this study, we observed that CAIX overexpression increased the migratory and invasive abilities of SCC-9 and SAS cells. CAIX overexpression also increased the mRNA and protein expression of matrix metalloproteinase-9 (MMP-9) and the phosphorylation of the signaling proteins focal adhesion kinase (FAK), the tyrosine kinase Src, and extracellular signal-regulated kinase 1/2. In addition, CAIX overexpression increased the binding of nuclear factor-κB (NF-κB), c-Jun, and c-Fos to the MMP-9 gene promoter. Treatment with MMP-9 short hairpin RNA, an MMP inhibitor (GM6001), an FAK mutant, or an MEK inhibitor (U0126) inhibited CAIX-induced cell motility in SCC-9 cells. Moreover, data sets from The Cancer Genome Atlas demonstrated that CAIX expression was significantly associated with advanced progression and poor survival in oral cancer. In conclusion, these results suggest that CAIX overexpression induces MMP-9 gene expression, which in turn promotes the metastasis of oral cancer cells. PMID:29137326
Jiang, T; Jiang, C-Y; Shu, J-H; Xu, Y-J
2017-07-10
The molecular mechanism of nasopharyngeal carcinoma (NPC) is poorly understood, and effective therapeutic approaches are needed. This research aimed to identify the attractor modules involved in the progression of NPC and to provide further understanding of the underlying mechanism of NPC. Based on gene expression data of NPC, two specific protein-protein interaction networks, for NPC and control conditions, were re-weighted using the Pearson correlation coefficient. Candidate modules were then systematically tracked on the re-weighted networks via a clique algorithm, identifying 19 modules in the NPC network and 38 in the control network. Among them, 8 pairs of modules with similar gene composition were selected, and 2 attractor modules were identified via the attract method. Functional analysis indicated that these two attractor modules participate in a common biological process, cell division. By integrating systemic module inference with the attract method, we successfully identified 2 attractor modules. These attractor modules might play important roles in the molecular pathogenesis of NPC by jointly affecting cell division. Further research is needed to explore the relationship between cell division and NPC.
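The re-weighting and clique steps can be sketched as follows. Assumptions not taken from the study: each edge is weighted by the raw Pearson correlation of its two genes' expression profiles, edges below a hypothetical cut-off are dropped, and maximal cliques are then enumerated with the basic Bron-Kerbosch algorithm (the study's specific clique algorithm and module-scoring criteria are not reproduced). Gene names, profiles, and edges are toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def reweight(edges, expression):
    """Weight each PPI edge (u, v) by the correlation of u's and v's expression."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}

def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Toy expression profiles (four samples) and PPI edges
expression = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 3, 5], "D": [4, 3, 2, 1]}
ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
weights = reweight(ppi, expression)
kept = [e for e, w in weights.items() if w >= 0.8]  # hypothetical cut-off
print(maximal_cliques(kept))
```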
Li, Jieyue; Xiong, Liang; Schneider, Jeff; Murphy, Robert F
2012-06-15
Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. 
Availability: http://murphylab.web.cmu.edu/software/
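The idea of letting one pattern class be "expressed differently" through a latent component can be sketched as a softmax (multiclass logistic) model in which each class has several weight vectors and the latent choice is summed out: p(y | x) ∝ Σ_z exp(w_{y,z} · x). This toy omits the paper's region sampling, spatial dependencies between components, approximate inference, and training; the function name, class names, features, and weights are hypothetical.

```python
import math

def class_probabilities(x, weights):
    """p(y | x) for a softmax classifier with a latent component z per class.

    weights[y] is a list of weight vectors, one per latent component z of class y.
    The latent choice is summed out: score(y) = log sum_z exp(w_{y,z} . x).
    """
    scores = {}
    for y, components in weights.items():
        dots = [sum(w * xi for w, xi in zip(wz, x)) for wz in components]
        top = max(dots)  # log-sum-exp trick for numerical stability
        scores[y] = top + math.log(sum(math.exp(d - top) for d in dots))
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return {y: math.exp(s - top) / norm for y, s in scores.items()}

# Hypothetical 2-feature example: the 'nucleus' class has two latent appearances
weights = {
    "nucleus": [[1.0, 0.0], [0.0, 1.0]],
    "cytoplasm": [[-1.0, -1.0]],
}
print(class_probabilities([2.0, 0.0], weights))
print(class_probabilities([0.0, 2.0], weights))
```

Both inputs are assigned mostly to "nucleus", each through a different latent component, which a single weight vector per class could only mimic with a compromise direction.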
Rozpedek, Wioletta; Markiewicz, Lukasz; Diehl, J Alan; Pytel, Dariusz; Majsterek, Ireneusz
2015-01-01
Recent evidence suggests that the development of Alzheimer's disease (AD) and related cognitive loss is due to mutations in the Amyloid Precursor Protein (APP) gene on chromosome 21 and to increased phosphorylation of eukaryotic translation initiation factor-2α (eIF2α). A high load of misfolded and unfolded proteins in the Endoplasmic Reticulum (ER) lumen triggers ER stress and, as a result, activates the Unfolded Protein Response (UPR) pathways. Stress-dependent activation of the protein kinase RNA-like endoplasmic reticulum kinase (PERK) leads to a significant elevation of phospho-eIF2α. This attenuates general translation while promoting the preferential synthesis of Activating Transcription Factor 4 (ATF4) and β-secretase (BACE1), a pivotal enzyme that initiates the amyloidogenic pathway, resulting in the generation of the amyloid β (Aβ) variant with a high propensity to form toxic senile plaques in AD brains. Moreover, excessive, long-term stress conditions may induce neuronal death by apoptosis through ATF4-driven overexpression of pro-apoptotic proteins. These findings suggest that dysregulated translation and increased expression of BACE1 and ATF4, as a result of eIF2α phosphorylation, may be a major contributor to the structural and functional neuronal loss that results in memory impairment. Thus, blocking PERK-dependent eIF2α phosphorylation with specific small-molecule inhibitors of the PERK branch appears to be a potential treatment strategy for individuals with AD. Such inhibition may restore global translation rates and reduce the expression of ATF4 and BACE1. Hence, this treatment strategy could block accelerated β-amyloidogenesis by reducing APP cleavage via the BACE1-dependent amyloidogenic pathway.
Zhang, Wangshu; Coba, Marcelo P; Sun, Fengzhu
2016-01-11
Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its domains might also be associated with the disease and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Identifying domains associated with human diseases would therefore greatly improve our understanding of the mechanisms of complex human diseases and further improve their prevention, diagnosis and treatment. Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, and proteins and disease modules. Different methods, including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches, are developed to predict domain-disease associations. We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the parameters involved. We also study the effects of disease modularization on inferring novel domain-disease associations. In validation, the AUC (area under the receiver operating characteristic curve) scores for the Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we chose the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes. The Bayesian approach has the best performance for inferring domain-disease relationships.
The predicted landscape between domains and diseases provides a more detailed view about the disease mechanisms.
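The AUC values quoted above have a simple interpretation: the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. A minimal sketch of that rank-based (Mann-Whitney) computation follows, with ties counted as one half; the function name and scores are hypothetical and unrelated to the study's predictions.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for known associations (positives) and non-associations
print(auc([0.9, 0.3], [0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.79 to 0.86 indicate that true associations are usually, but not always, ranked above non-associations.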
A high resolution atlas of gene expression in the domestic sheep (Ovis aries)
Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.
2017-01-01
Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. We relate the findings to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238
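A simplified illustration of "guilt by association": label an uncharacterised gene with the most common function among the k annotated genes whose expression profiles correlate best with it. This nearest-neighbour variant is an assumption chosen for brevity; the atlas itself used network-based cluster analysis, which this sketch does not reproduce. The function names, tissue profiles, and labels are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def guilt_by_association(query_profile, annotated, k=3):
    """Label an uncharacterised gene by majority vote of its k most co-expressed annotated genes.

    annotated: dict gene -> (expression profile, functional label)
    """
    ranked = sorted(
        annotated.values(),
        key=lambda item: pearson(query_profile, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical profiles over four tissues: spleen, lymph node, muscle, heart
annotated = {
    "G1": ([9, 8, 1, 1], "immune"),
    "G2": ([8, 9, 2, 1], "immune"),
    "G3": ([1, 1, 9, 8], "muscle"),
    "G4": ([2, 1, 8, 9], "muscle"),
}
print(guilt_by_association([7, 9, 1, 2], annotated, k=2))
```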
A high resolution atlas of gene expression in the domestic sheep (Ovis aries).
Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A
2017-09-01
Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. We relate the findings to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.
Analytical results for a stochastic model of gene expression with arbitrary partitioning of proteins
NASA Astrophysics Data System (ADS)
Tschirhart, Hugo; Platini, Thierry
2018-05-01
In biophysics, the search for analytical solutions of stochastic models of cellular processes is often a challenging task. Recent work on models of gene expression showed that a mapping based on partitioning of Poisson arrivals (PPA-mapping) can lead to exact solutions for previously unsolved problems. Although the approach can be used in general whenever the model involves Poisson processes corresponding to creation or degradation, applications of the method, and new results derived using it, have so far been limited. In this paper, we present the exact solution of a variation of the two-stage model of gene expression (with time-dependent transition rates) describing the arbitrary partitioning of proteins. The proposed methodology makes full use of the PPA-mapping by transforming the original problem into a new process describing the evolution of three biological switches. Through a succession of transformations, the method leads to a hierarchy of reduced models. We give an integral expression for the time-dependent generating function as well as explicit results for the mean, variance, and correlation function. Finally, we discuss how results for time-dependent parameters can be extended to the three-stage model and used to make inferences about models with parameter fluctuations induced by hidden stochastic variables.
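For orientation, the underlying two-stage model (mRNA produced at rate k_m and degraded at rate γ_m per molecule; protein produced at rate k_p per mRNA and degraded at rate γ_p per molecule) can be simulated with Gillespie's stochastic simulation algorithm. This sketch uses constant rates and no cell division or partitioning, so it covers none of the paper's new results; it only checks the well-known stationary means ⟨m⟩ = k_m/γ_m and ⟨p⟩ = k_m k_p/(γ_m γ_p). The function name and parameter values are arbitrary choices for illustration.

```python
import random

def simulate_two_stage(k_m, g_m, k_p, g_p, t_end, seed=0):
    """Gillespie SSA for the two-stage model, starting from zero molecules.

    Reactions: 0 -> mRNA (k_m), mRNA -> 0 (g_m * m),
               mRNA -> mRNA + protein (k_p * m), protein -> 0 (g_p * p).
    Returns the time-averaged mRNA and protein copy numbers over [0, t_end].
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    area_m = area_p = 0.0
    while t < t_end:
        rates = (k_m, g_m * m, k_p * m, g_p * p)
        total = sum(rates)  # > 0 because k_m > 0
        dt = min(rng.expovariate(total), t_end - t)
        area_m += m * dt  # integrate copy numbers over time
        area_p += p * dt
        t += dt
        if t >= t_end:
            break
        r = rng.random() * total  # choose which reaction fires
        if r < rates[0]:
            m += 1
        elif r < rates[0] + rates[1]:
            m -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            p += 1
        else:
            p -= 1
    return area_m / t_end, area_p / t_end

m_avg, p_avg = simulate_two_stage(2.0, 1.0, 5.0, 0.5, t_end=2000.0, seed=1)
print(m_avg, p_avg)  # compare with k_m/g_m = 2 and k_m*k_p/(g_m*g_p) = 20
```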
Xu, Dongkui; Liu, Shikai; Zhang, Liang; Song, Lili
2017-04-01
The dysregulated molecules in cervical cancer and their involvement in lymph node metastasis are far from being fully revealed. In this study, by reviewing MUC4 expression in The Human Protein Atlas and retrieving gene microarray data from the GEO dataset GDS4664, we found that MUC4 upregulation is associated with lymph node metastasis in cervical cancer. Knockdown of MUC4 in HeLa and SiHa cells significantly reduced their invasion and also reduced their mesenchymal properties. Bioinformatics analysis identified miR-211 as a potential suppressor of MUC4, with a predicted binding site in the MUC4 3'UTR that is highly conserved among mammals. Subsequent assays confirmed that miR-211 directly targets the 3'UTR of MUC4 and inhibits its expression at both the mRNA and protein levels. In addition, enforced miR-211 expression phenocopied the effects of MUC4 siRNA in inhibiting cervical cancer cell invasion and reversing EMT properties. We therefore infer that miR-211 is a novel miRNA that suppresses MUC4 expression and can inhibit cervical cancer cell invasion and EMT. Copyright © 2016. Published by Elsevier Inc.
A grammar inference approach for predicting kinase specific phosphorylation sites.
Datta, Sutapa; Mukhopadhyay, Subhasis
2015-01-01
Kinase-mediated phosphorylation is a key post-translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for detecting phosphorylation sites are labour intensive and expensive. Moreover, the manifold increase of protein sequences in the databanks over the years calls for fast and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. We used a popular GI algorithm, Alergia, to infer a Deterministic Stochastic Finite State Automaton (DSFA), which equivalently represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets that we generated reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner and performs significantly better than other existing phosphorylation site prediction methods. We also compared our inferred DSFA with those of two other GI algorithms. The DSFA generated by our method performed better, which indicates that our method is robust and has the potential to predict phosphorylation sites in a kinase-specific manner.
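The core of Alergia is a state-compatibility test based on the Hoeffding bound, applied recursively to a frequency prefix tree built from the training strings. The sketch below shows only that test, assuming a confidence parameter alpha = 0.05. The state-merging loop that turns the tree into a DSFA, and any encoding of phosphorylation-site windows, are omitted, and the toy strings are hypothetical.

```python
import math

class Node:
    """State of a frequency prefix tree (the starting point of Alergia)."""
    def __init__(self):
        self.count = 0      # strings that reach this state
        self.end = 0        # strings that terminate in this state
        self.children = {}  # symbol -> Node

def build_prefix_tree(strings):
    root = Node()
    for s in strings:
        node = root
        node.count += 1
        for symbol in s:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
        node.end += 1
    return root

def different(f1, n1, f2, n2, alpha):
    """Hoeffding-bound test: are the frequencies f1/n1 and f2/n2 significantly different?"""
    if n1 == 0 or n2 == 0:
        return False  # no evidence either way
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound

def compatible(a, b, alpha=0.05):
    """Recursively check whether two states may be merged."""
    if a is None or b is None:
        return True
    if different(a.end, a.count, b.end, b.count, alpha):  # termination frequencies
        return False
    for symbol in set(a.children) | set(b.children):
        ca, cb = a.children.get(symbol), b.children.get(symbol)
        fa = ca.count if ca is not None else 0
        fb = cb.count if cb is not None else 0
        if different(fa, a.count, fb, b.count, alpha):  # transition frequencies
            return False
        if not compatible(ca, cb, alpha):  # and the states they lead to
            return False
    return True

# Toy strings over a hypothetical alphabet (not phosphosite windows).
same = build_prefix_tree(["a"] * 50 + ["ac"] * 50 + ["b"] * 50 + ["bc"] * 50)
diff = build_prefix_tree(["a"] * 50 + ["ac"] * 50 + ["bd"] * 100)
print(compatible(same.children["a"], same.children["b"]))  # True
print(compatible(diff.children["a"], diff.children["b"]))  # False
```

A smaller alpha widens the bound, so more states are judged compatible and the inferred automaton becomes smaller.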
Chakraborty, Sandeep; Nascimento, Rafael; Zaini, Paulo A; Gouran, Hossein; Rao, Basuthkar J; Goulart, Luiz R; Dandekar, Abhaya M
2016-01-01
Background. Xylella fastidiosa, the causative agent of various plant diseases including Pierce's disease in the US and Citrus Variegated Chlorosis in Brazil, remains a continual source of concern and economic losses, especially since almost all commercial varieties are sensitive to this gammaproteobacterium. Differential expression of proteins in infected tissue is an established methodology to identify key elements involved in plant defense pathways. Methods. In the current work, we developed a methodology named CHURNER that emphasizes relevant protein functions in proteomic data, based on the identification of proteins with similar structures that do not necessarily share sequence homology. Such clustering emphasizes protein functions that have multiple copies that are up- or down-regulated, and highlights similar proteins that are differentially regulated. As a working example, we present proteomic data enumerating differentially expressed proteins in xylem sap from grapevines infected with X. fastidiosa. Results. Analysis of these data by CHURNER highlighted pathogenesis-related PR-1 proteins, reinforcing this as the foremost protein function in xylem sap involved in the grapevine defense response to X. fastidiosa. β-1,3-glucanase, which has both anti-microbial and anti-fungal activities, is also up-regulated. At the same time, CHURNER finds chitinases to be both up- and down-regulated, so the net gain of this protein function loses its significance in the defense response. Discussion. We demonstrate how structural data can be incorporated into the proteomic data analysis pipeline before making inferences on the importance of individual proteins to plant defense mechanisms. We expect CHURNER to be applicable to any proteomic data set.
De Cegli, Rossella; Iacobacci, Simona; Flore, Gemma; Gambardella, Gennaro; Mao, Lei; Cutillo, Luisa; Lauria, Mario; Klose, Joachim; Illingworth, Elizabeth; Banfi, Sandro; di Bernardo, Diego
2013-01-01
Gene expression profiles can be used to infer previously unknown transcriptional regulatory interactions among thousands of genes, via systems biology ‘reverse engineering’ approaches. We ‘reverse engineered’ an embryonic stem (ES)-specific transcriptional network from 171 gene expression profiles, measured in ES cells, to identify master regulators of gene expression (‘hubs’). We discovered that E130012A19Rik (E13), highly expressed in mouse ES cells as compared with differentiated cells, was a central ‘hub’ of the network. We demonstrated that E13 is a protein-coding gene implicated in regulating the commitment towards the different neuronal subtypes and glia cells. The overexpression and knock-down of E13 in ES cell lines, undergoing differentiation into neurons and glia cells, caused a strong up-regulation of the glutamatergic neurons marker Vglut2 and a strong down-regulation of the GABAergic neurons marker GAD65 and of the radial glia marker Blbp. We confirmed E13 expression in the cerebral cortex of adult mice and during development. By immuno-based affinity purification, we characterized protein partners of E13, involved in the Polycomb complex. Our results suggest a role of E13 in regulating the division between glutamatergic projection neurons and GABAergic interneurons and glia cells possibly by epigenetic-mediated transcriptional regulation. PMID:23180766
Trypsteen, Wim; Mohammadi, Pejman; Van Hecke, Clarissa; Mestdagh, Pieter; Lefever, Steve; Saeys, Yvan; De Bleser, Pieter; Vandesompele, Jo; Ciuffi, Angela; Vandekerckhove, Linos; De Spiegelaere, Ward
2016-10-26
Studying the effects of HIV infection on the host transcriptome has typically focused on protein-coding genes. However, recent advances in the field of RNA sequencing revealed that long non-coding RNAs (lncRNAs) add an extensive additional layer to the cell's molecular network. Here, we performed transcriptome profiling throughout a primary HIV infection in vitro to investigate lncRNA expression at the different HIV replication cycle processes (reverse transcription, integration and particle production). Subsequently, guilt-by-association, transcription factor and co-expression analysis were performed to infer biological roles for the lncRNAs identified in the HIV-host interplay. Many lncRNAs were suggested to play a role in mechanisms relying on proteasomal and ubiquitination pathways, apoptosis, DNA damage responses and cell cycle regulation. Through transcription factor binding analysis, we found that lncRNAs display a distinct transcriptional regulation profile as compared to protein coding mRNAs, suggesting that mRNAs and lncRNAs are independently modulated. In addition, we identified five differentially expressed lncRNA-mRNA pairs with mRNA involvement in HIV pathogenesis with possible cis regulatory lncRNAs that control nearby mRNA expression and function. Altogether, the present study demonstrates that lncRNAs add a new dimension to the HIV-host interplay and should be further investigated as they may represent targets for controlling HIV replication.
Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach
Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo; Siwo, Geoffrey; Rider, Andrew K.; Tan, Asako; Pinapati, Richard S.; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael T.; Tung, Yi-An; Chen, Yong-Syuan; Chen, Mei-Ju May; Chen, Chien-Yu; Knight, Jason M.; Sahraeian, Sayed Mohammad Ebrahim; Esfahani, Mohammad Shahrokh; Dreos, Rene; Bucher, Philipp; Maier, Ezekiel; Saeys, Yvan; Szczurek, Ewa; Myšičková, Alena; Vingron, Martin; Klein, Holger; Kiełbasa, Szymon M.; Knisley, Jeff; Bonnell, Jeff; Knisley, Debra; Kursa, Miron B.; Rudnicki, Witold R.; Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J.; Yeung, James; Meysman, Pieter; Rodríguez, Aminael Sánchez; Engelen, Kristof; Marchal, Kathleen; Huang, Yezhou; Mordelet, Fantine; Hartemink, Alexander; Pinello, Luca; Yuan, Guo-Cheng
2013-01-01
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown, experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites. PMID:23950146
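One common way to aggregate community predictions is to average each item's rank across teams. The sketch below assumes that scheme; the abstract does not state which aggregation DREAM6 used, and the team predictions are hypothetical.

```python
def ranks(values):
    """1-based ranks (1 = lowest value); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def aggregate(team_predictions):
    """Average each promoter's rank over all teams (rank-averaging 'wisdom of crowds')."""
    team_ranks = [ranks(p) for p in team_predictions]
    n_promoters = len(team_predictions[0])
    return [sum(r[i] for r in team_ranks) / len(team_ranks) for i in range(n_promoters)]

# Hypothetical predictions from three teams for three promoters.
teams = [[0.1, 0.5, 0.9], [0.2, 0.8, 0.4], [0.0, 0.3, 1.0]]
consensus = aggregate(teams)
print(consensus)
```

Averaging ranks rather than raw values keeps a team that uses a different output scale from dominating the consensus.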
Liu, Bin; Govindan, Ramesh; Uzzi, Brian
2016-01-01
Emotions are increasingly inferred linguistically from online data with a goal of predicting off-line behavior. Yet, it is unknown whether emotions inferred linguistically from online communications correlate with actual changes in off-line activity. We analyzed all 886,000 trading decisions and 1,234,822 instant messages of 30 professional day traders over a continuous 2 year period. Linguistically inferring the traders’ emotional states from instant messages, we find that emotions expressed in online communications reflect the same distributions of emotions found in controlled experiments done on traders. Further, we find that expressed online emotions predict the profitability of actual trading behavior. Relative to their baselines, traders who expressed little emotion or traders that expressed high levels of emotion made relatively unprofitable trades. Conversely, traders expressing moderate levels of emotional activation made relatively profitable trades. PMID:26765539
Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R
2015-11-03
Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis, and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- and large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which allows individual protein identifications to be evaluated. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.
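Protein-level FDR is commonly estimated with a target-decoy search: above a score threshold, the number of decoy hits approximates the number of false target hits. The sketch below assumes that estimator (decoys / targets) and picks the lowest threshold meeting a requested FDR. It is not ProteinInferencer's published procedure, and the scores are hypothetical.

```python
def threshold_at_fdr(hits, max_fdr=0.01):
    """Lowest score threshold whose target-decoy FDR estimate is <= max_fdr.

    hits: list of (score, is_decoy) for protein (or peptide) identifications.
    Returns None if no threshold qualifies.
    """
    hits = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, is_decoy) in enumerate(hits):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if i + 1 < len(hits) and hits[i + 1][0] == score:
            continue  # count every hit tied at this score before evaluating
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best

# Hypothetical scores: 100 targets (scores 1..100) and three decoys.
hits = [(s, False) for s in range(1, 101)] + [(55.5, True), (20.5, True), (10.5, True)]
print(threshold_at_fdr(hits, 0.02), threshold_at_fdr(hits, 0.05))
```

The raw decoys / targets ratio is not monotone in the threshold; production pipelines usually convert it to monotone q-values before filtering.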
Genetic network inference as a series of discrimination tasks.
Kimura, Shuhei; Nakayama, Satoshi; Hatakeyama, Mariko
2009-04-01
Genetic network inference methods based on sets of differential equations generally require a great deal of time, as the equations must be solved many times. To reduce the computational cost, researchers have proposed other methods for inferring genetic networks by solving sets of differential equations only a few times, or even without solving them at all. When we try to obtain reasonable network models using these methods, however, we must estimate the time derivatives of the gene expression levels with great precision. In this study, we propose a new method to overcome the drawbacks of inference methods based on sets of differential equations. Our method infers genetic networks by obtaining classifiers capable of predicting the signs of the derivatives of the gene expression levels. For this purpose, we defined a genetic network inference problem as a series of discrimination tasks, then solved the defined series of discrimination tasks with a linear programming machine. Our experimental results demonstrated that the proposed method is capable of correctly inferring genetic networks, and doing so more than 500 times faster than the other inference methods based on sets of differential equations. Next, we applied our method to actual expression data of the bacterial SOS DNA repair system. And finally, we demonstrated that our approach relates to the inference method based on the S-system model. Though our method provides no estimation of the kinetic parameters, it should be useful for researchers interested only in the network structure of a target system. Supplementary data are available at Bioinformatics online.
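Recasting inference as discrimination tasks can be sketched as follows. The sketch assumes a hypothetical three-gene linear toy network, finite-difference derivative signs as class labels, and a perceptron in place of the linear programming machine used by the authors. For one target gene, the learned weight vector plays the role of its candidate regulatory inputs.

```python
import random

# Hypothetical 3-gene linear toy network dx/dt = A x (gene 0 activates gene 1,
# gene 1 represses gene 2, every gene decays); not taken from the paper.
A = [[-1.0, 0.0, 0.0],
     [2.0, -1.0, 0.0],
     [0.0, -3.0, -1.0]]

def simulate(x0, dt=0.1, steps=30):
    """Euler-integrate the toy network from x0 and return the sampled states."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([x[i] + dt * sum(A[i][j] * x[j] for j in range(3)) for i in range(3)])
    return xs

def discrimination_task(series, gene, min_change=0.01):
    """Pair each state with the sign of the finite-difference change in `gene`."""
    data = []
    for xs in series:
        for t in range(len(xs) - 1):
            change = xs[t + 1][gene] - xs[t][gene]
            if abs(change) > min_change:  # drop ambiguous near-zero derivatives
                data.append((xs[t], 1 if change > 0 else -1))
    return data

def perceptron(data, epochs=20000):
    """Linear classifier for the sign labels (a stand-in for the LP machine)."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # every training point is classified correctly
    return w, b

rng = random.Random(1)
series = [simulate([rng.random() for _ in range(3)]) for _ in range(20)]
data = discrimination_task(series, gene=1)
w, b = perceptron(data)
accuracy = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y > 0) for x, y in data) / len(data)
print("weights on genes 0..2:", w, "bias:", b, "training accuracy:", accuracy)
```

Because the sign labels here come from a linear system and near-zero changes are dropped, the data are linearly separable with a margin, so the perceptron reaches perfect training accuracy; on real expression data that is not guaranteed.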
Ren, Dongqing; Jin, Juan; Li, Xiaojuan; Zeng, Guiying
2008-01-01
To explore the bio-effects of electromagnetic pulse (EMP) on the mouse small intestine by means of a gene chip. Twelve BALB/c mice were randomly assigned to a normal control group and an EMP group, with 6 mice in each group. The EMP group was irradiated with 200 EMP pulses at 200 kV/m. Eighteen hours after irradiation, the mice were sacrificed and the jejunum of the small intestine was removed. Fluorescent cDNA probes labeled with Cy3 and Cy5 were prepared from RNA extracted from the intestines of the two groups. The probes of the two groups were then hybridized to a cDNA gene chip, the fluorescent signals were scanned, and the results were analyzed by computer. Compared with the control, 56 genes in the gene expression profile were altered: 37 genes were distinctly up-regulated and 19 genes were significantly down-regulated. Of the 56 genes, 19 have known or inferred functions. Among these, the 12 up-regulated genes were catenin alpha 1 (alpha-catenin), ly-6 alloantigen (Ly-6E), fructose-6-phosphate transaminase (GF6P), ribosomal protein S17 (rpS17), small proline-rich protein 2A (Sprr2a), glandular kallikrein 27 (GK27), lipoxygenase-3, aldo-keto reductase (Akr1c12), GSG1, amylase 2 (Amy2), elastase 2 and the p6-5 gene, and the 7 down-regulated genes were junctional adhesion molecule (Jam), protein arginine methyltransferase (Carm1), NNP-1, 2-5A synthetase L2, the Mlark gene, ATP synthase alpha subunit and the uncoupling protein-2 (Ucp2) gene. The other 37 have unknown functions. EMP irradiation could induce specific expression changes in some genes in the mouse small intestine, most of which were up-regulated.
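Two-channel cDNA arrays are typically summarized as a per-gene log ratio of the two dye signals. The sketch below assumes background-corrected, already normalized intensities (EMP versus control) and a hypothetical 2-fold cutoff; the abstract does not report the channel assignment or threshold actually used, and the spot values are made up.

```python
import math

def call_changes(spots, fold=2.0):
    """Label genes 'up', 'down' or 'unchanged' from treated/control intensities."""
    cutoff = math.log2(fold)
    calls = {}
    for gene, (treated, control) in spots.items():
        log_ratio = math.log2(treated / control)
        if log_ratio >= cutoff:
            calls[gene] = "up"
        elif log_ratio <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical spot intensities: gene -> (EMP channel, control channel).
spots = {"gene_a": (800.0, 200.0), "gene_b": (100.0, 400.0), "gene_c": (300.0, 250.0)}
print(call_changes(spots))
```

Working on the log2 scale makes a 2-fold increase and a 2-fold decrease symmetric (+1 and -1).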
Cystic Fibrosis Gene Encodes a cAMP-Dependent Chloride Channel in Heart
NASA Astrophysics Data System (ADS)
Hart, Padraig; Warth, John D.; Levesque, Paul C.; Collier, Mei Lin; Geary, Yvonne; Horowitz, Burton; Hume, Joseph R.
1996-06-01
cAMP-dependent chloride channels in heart contribute to autonomic regulation of action potential duration and membrane potential and have been inferred to be due to cardiac expression of the epithelial cystic fibrosis transmembrane conductance regulator (CFTR) chloride channel. In this report, a cDNA from rabbit ventricle was isolated and sequenced, which encodes an exon 5 splice variant (exon 5-) of CFTR, with >90% identity to human CFTR cDNA present in epithelial cells. Expression of this cDNA in Xenopus oocytes gave rise to robust cAMP-activated chloride currents that were absent in control water-injected oocytes. Antisense oligodeoxynucleotides directed against CFTR significantly reduced the density of cAMP-dependent chloride currents in acutely cultured myocytes, thereby establishing a direct functional link between cardiac expression of CFTR protein and an endogenous chloride channel in native cardiac myocytes.
NASA Astrophysics Data System (ADS)
Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.
2018-03-01
The transient response to a stimulus and subsequent recovery to a steady state are fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decay rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
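The no-feedback result can be checked numerically on a simplified model. The sketch below assumes a one-species birth-death master equation (production rate that may depend on the protein number n, degradation rate n·γ) truncated at a finite n_max, and computes the spectral gap by Sturm-sequence bisection on the equivalent symmetric tridiagonal matrix. The feedback functions are hypothetical, and this stand-in is not the paper's full model.

```python
import math

def birth_death_gap(birth, gamma, n_max, tol=1e-10):
    """Spectral gap (relaxation rate) of a one-species birth-death CME truncated at n_max.

    birth(n) is the production rate with n proteins present (feedback enters here);
    each protein decays at rate gamma, so the death rate in state n is n * gamma.
    """
    size = n_max + 1
    lam = [birth(n) if n < n_max else 0.0 for n in range(size)]  # no birth out of the top state
    mu = [n * gamma for n in range(size)]
    # A birth-death generator is similar to this symmetric tridiagonal matrix.
    d = [-(lam[n] + mu[n]) for n in range(size)]
    e = [math.sqrt(lam[n] * mu[n + 1]) for n in range(n_max)]

    def count_below(x):
        """Sturm count: number of eigenvalues strictly below x."""
        count, q = 0, 1.0
        for i in range(size):
            q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
            if q == 0.0:
                q = 1e-300  # guard against division by an exact zero
            if q < 0.0:
                count += 1
        return count

    def radius(i):
        return (e[i - 1] if i > 0 else 0.0) + (e[i] if i < n_max else 0.0)

    lo = min(d[i] - radius(i) for i in range(size)) - 1.0  # Gershgorin bounds
    hi = max(d[i] + radius(i) for i in range(size)) + 1.0
    k = size - 2  # eigenvalues are 0 > -gap > ...; -gap is the second largest
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(mid) > k:
            hi = mid
        else:
            lo = mid
    return -0.5 * (lo + hi)

gap_none = birth_death_gap(lambda n: 5.0, 1.0, 60)                         # no feedback
gap_pos = birth_death_gap(lambda n: 2 + 8 * n**2 / (25 + n**2), 1.0, 100)  # positive feedback
gap_neg = birth_death_gap(lambda n: 2 + 200 / (25 + n**2), 1.0, 100)       # negative feedback
print(gap_none, gap_pos, gap_neg)
```

With constant production the gap equals γ, consistent with relaxation at the protein decay rate. The increasing (positive-feedback) production rate gives a gap below γ, and the decreasing (negative-feedback) rate gives a gap above it.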
Wang, Jianxin; Chen, Bo; Wang, Yaqun; Wang, Ningtao; Garbey, Marc; Tran-Son-Tay, Roger; Berceli, Scott A.; Wu, Rongling
2013-01-01
The capacity of an organism to respond to its environment is facilitated by environmentally induced alteration of gene and protein expression, i.e. expression plasticity. Reconstructing gene regulatory networks from expression plasticity can provide new insights not only into the causality of transcriptional and cellular processes but also into the complex regulatory mechanisms that underlie biological function and adaptation. We describe an approach for network inference that integrates expression plasticity into Shannon’s mutual information. Unlike Pearson correlation, which measures only linear association, mutual information can capture non-linear dependencies and topology sparseness. The approach measures the network of dependencies of genes expressed in different environments, allowing the environment-induced plasticity of gene dependencies to be tested in unprecedented detail. The approach can also characterize the extent to which the same genes trigger different amounts of expression in response to environmental changes. We demonstrated the usefulness of this approach by analysing gene expression data from a rabbit vein graft study that includes two distinct blood flow environments. The proposed approach provides a powerful tool for the modelling and analysis of dynamic regulatory networks using gene expression data from distinct environments. PMID:23470995
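The advantage of mutual information over Pearson correlation shows up on a deterministic but non-monotone relationship. This sketch assumes a simple plug-in estimator with 10 equal-width bins; it does not include the expression-plasticity extension described in the abstract, and the data are synthetic.

```python
import math
from collections import Counter

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def binned_mutual_information(xs, ys, bins=10):
    """Plug-in mutual information (in nats) from equal-width histograms."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        return [min(int((v - lo) / width), bins - 1) for v in values]  # clamp the max into the last bin
    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in joint.items())

xs = [-1 + 2 * i / 1000 for i in range(1001)]  # symmetric grid on [-1, 1]
ys = [x * x for x in xs]                        # fully determined by x, but not monotone
r, mi = pearson(xs, ys), binned_mutual_information(xs, ys)
print(f"Pearson r = {r:.3g}, mutual information = {mi:.3f} nats")
```

Here y = x² is completely determined by x, yet the Pearson correlation is essentially zero while the binned mutual information is clearly positive.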
Employing conservation of co-expression to improve functional inference
Daub, Carsten O; Sonnhammer, Erik LL
2008-01-01
Background Observing co-expression between genes suggests that they are functionally coupled. Co-expression of orthologous gene pairs across species may improve function prediction beyond the level achieved in a single species. Results We used orthology between genes of the three different species S. cerevisiae, D. melanogaster, and C. elegans to combine co-expression across two species at a time. This led to increased function prediction accuracy when we incorporated expression data from either of the other two species, and accuracy increased even further when conservation across both of the other two species was considered at the same time. Employing the conservation across species to incorporate abundant model organism data for the prediction of protein interactions in poorly characterized species constitutes a very powerful annotation method. Conclusion To be able to employ the most suitable co-expression distance measure for our analysis, we evaluated the ability of four popular gene co-expression distance measures to detect biologically relevant interactions between pairs of genes. For the expression datasets employed in our co-expression conservation analysis above, we used the GO and the KEGG PATHWAY databases as gold standards. While the differences between distance measures were small, Spearman correlation gave the most robust results. PMID:18808668
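Spearman correlation is the Pearson correlation of ranks, which makes it insensitive to a single extreme value. A minimal sketch with hypothetical expression values for two genes across six conditions, where the last value of the second gene is an outlier:

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))

gene_a = [1, 2, 3, 4, 5, 6]
gene_b = [1.1, 1.9, 3.2, 3.9, 5.1, 60.0]  # monotone, with one extreme value
print(round(pearson(gene_a, gene_b), 3), spearman(gene_a, gene_b))
```

The two genes rise together in every condition, so Spearman reports a perfect association, whereas Pearson is pulled down by the single outlier.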
ASSESSING AND COMBINING RELIABILITY OF PROTEIN INTERACTION SOURCES
LEACH, SONIA; GABOW, AARON; HUNTER, LAWRENCE; GOLDBERG, DEBRA S.
2008-01-01
Integrating diverse sources of interaction information to create protein networks requires strategies sensitive to differences in accuracy and coverage of each source. Previous integration approaches calculate reliabilities of protein interaction information sources based on congruity to a designated ‘gold standard.’ In this paper, we provide a comparison of the two most popular existing approaches and propose a novel alternative for assessing reliabilities which does not require a gold standard. We identify a new method for combining the resultant reliabilities and compare it against an existing method. Further, we propose an extrinsic approach to evaluation of reliability estimates, considering their influence on the downstream tasks of inferring protein function and learning regulatory networks from expression data. Results using this evaluation method show 1) our method for reliability estimation is an attractive alternative to those requiring a gold standard and 2) the new method for combining reliabilities is less sensitive to noise in reliability assignments than the similar existing technique. PMID:17990508
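A common baseline for combining per-source reliabilities treats sources as independent and computes a noisy-OR: an interaction is considered false only if every source reporting it is wrong. The sketch assumes this scheme for illustration; it is not necessarily either of the combination methods compared in the paper, and the reliability values are hypothetical.

```python
def noisy_or(reliabilities):
    """Combined confidence 1 - prod(1 - r) over independent sources reporting an interaction."""
    p_all_wrong = 1.0
    for r in reliabilities:
        p_all_wrong *= 1.0 - r  # probability that this source's report is false
    return 1.0 - p_all_wrong

print(noisy_or([0.5, 0.5]), noisy_or([0.9, 0.2]))
```

Because noisy-OR multiplies error probabilities, a single over-estimated reliability can push the combined score close to 1, which is one reason sensitivity to noise in reliability assignments matters.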
Pantzartzi, Chrysoula N.; Drosopoulou, Elena; Scouras, Zacharias G.
2013-01-01
Hsp90s, members of the Heat Shock Protein class, protect the structure and function of proteins and play a significant role in cellular homeostasis and signal transduction. To determine the number of hsp90 gene copies and encoded proteins in fungal and animal lineages, and through that the key duplication events this family has undergone, we collected and evaluated Hsp90 protein sequences and corresponding Expressed Sequence Tags and analyzed available genomes from various taxa. We provide evidence for duplication events affecting either single species or wider taxonomic groups. With regard to Fungi, duplicated genes have been detected in several lineages. In invertebrates, we demonstrate key duplication events in certain clades of Arthropoda and Mollusca, and a possible gene loss event in a hymenopteran family. Finally, we infer that the duplication event responsible for the two (a and b) isoforms in vertebrates probably occurred shortly after the split of Hyperoartia and Gnathostomata. PMID:24066039
Identification of a new EF-hand superfamily member from Trypanosoma brucei
NASA Technical Reports Server (NTRS)
Wong, S.; Kretsinger, R. H.; Campbell, D. A.
1992-01-01
We identified several open reading frames between the regions encoding calmodulin and ubiquitin-EP52/1 in the genome of Trypanosoma brucei. One of these, EFH5, encodes a protein 192 amino acids long. The EFH5 transcript is present in poly(A)+ mRNA and is present at similar levels in the mammalian bloodstream form and the insect procyclic form. EFH5 contains four EF-hand homolog domains, two of which are inferred to bind Ca2+ ions. We expressed EFH5 as a fusion protein in Escherichia coli and demonstrated calcium-binding activity of the fusion protein using the 45Ca-overlay technique. The function of EFH5 remains unknown; however, as the fourth EF-hand homolog identified in trypanosomes, it attests to the broad range of functions assumed by calcium functioning as a second messenger. EFH5, which is most closely related to LAV1-2 from Physarum, represents a distinct subfamily among the EF-hand-containing proteins.
Akashi, A; Yoshida, Y; Nakagoshi, H; Kuroki, K; Hashimoto, T; Tagawa, K; Imamoto, F
1988-10-01
Stabilizing factor, a 9 kDa protein, stabilizes and facilitates formation of the complex between mitochondrial ATP synthase and its intrinsic inhibitor protein. A clone containing the gene encoding the 9 kDa protein was selected from a yeast genomic library to determine the structure of its precursor protein. As deduced from the nucleotide sequence, the precursor of the yeast 9 kDa stabilizing factor contains 86 amino acid residues and has a molecular weight of 10,062. From the predicted sequence we infer that the stabilizing factor precursor contains a presequence of 23 amino acid residues at its amino terminus. We also used S1 mapping to determine the initiation site of transcription under glucose-repressed or derepressed conditions. These experiments suggest that transcription of this gene starts at three different sites and that only one of them is not affected by the presence of glucose.
Cross-talk between AMPK and EGFR dependent Signaling in Non-Small Cell Lung Cancer
Praveen, Paurush; Hülsmann, Helen; Sültmann, Holger; Kuner, Ruprecht; Fröhlich, Holger
2016-01-01
Lung cancers globally account for 12% of new cancer cases, 85% of these being Non Small Cell Lung Cancer (NSCLC). Therapies like erlotinib target the key player EGFR, which is mutated in about 10% of lung adenocarcinoma. However, drug insensitivity and resistance caused by second mutations in the EGFR or aberrant bypass signaling have evolved as a major challenge in controlling these tumors. Recently, AMPK activation was proposed to sensitize NSCLC cells against erlotinib treatment. However, the underlying mechanism is largely unknown. In this work we aim to unravel the interplay between 20 proteins that were previously associated with EGFR signaling and erlotinib drug sensitivity. The inferred network shows a high level of agreement with protein-protein interactions reported in STRING and HIPPIE databases. It is further experimentally validated with protein measurements. Moreover, predictions derived from our network model fairly agree with somatic mutations and gene expression data from primary lung adenocarcinoma. Altogether our results support the role of AMPK in EGFR signaling and drug sensitivity. PMID:27279498
Kishi, Asuka; Yamamoto, Masahito; Kikuchi, Akihito; Iwanuma, Osamu; Watanabe, Yutaka; Ide, Yoshinobu; Abe, Shinichi
2012-09-01
Meckel's cartilage is known to be involved in formation of the prenatal mandible. However, the relationship between Meckel's cartilage and the embryonic mylohyoid muscle during growth and development has rarely been investigated. This study examined the expression of intermediate filaments in Meckel's cartilage and the embryonic mylohyoid muscle in fetal mice during morphological development. Specimens of E12-16 ICR mice sectioned in the frontal direction were subjected to immunohistochemistry for vimentin and desmin. Hematoxylin and eosin sections showed that the immature mylohyoid muscle began to grow along Meckel's cartilage during fetal development. Weak vimentin expression was detected in the mylohyoid muscle and surrounding tissues at E12. Desmin expression was detected specifically in the mylohyoid; strong expression was evident after E13 and increased with age. It was inferred that the mylohyoid muscle is one of the tissues developing from Meckel's cartilage, the latter exerting a continuous influence on the growth of the former. In the early stage, the surrounding mesenchymal tissues expressing vimentin formed a scaffold for the developing mylohyoid muscle. Muscle attachment at E13 showed steady desmin expression, which continued until maturity. This study suggests that Meckel's cartilage may influence not only the mandibular bone but also the development of the mylohyoid muscle attached to it. Furthermore, it revealed a stage in the development of the mylohyoid muscle in which vimentin, a protein common in surrounding tissues such as muscle and bone, cooperates with the surrounding structures to induce the morphological formation of the mylohyoid muscle.
Zeng, Jia; Hannenhalli, Sridhar
2013-01-01
Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of the overall functional evolution of paralogs. Inferring the evolutionary history of gene expression among paralogs is therefore a problem of considerable interest, and one that poses significant challenges. Standard approaches to evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures, the functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems, inferring gene expression and inferring the evolutionary fates of gene duplicates, have not previously been studied together and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of a gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.
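As a rough illustration of inferring an ancestral expression level from its descendants, the sketch below swaps the paper's nonparametric belief propagation for the much simpler Gaussian case: if expression is assumed to evolve by Brownian motion, upward messages on the duplication tree reduce to precision-weighted averages. The tree encoding and the name `upward_message` are hypothetical; the result is the estimate at the node where the recursion starts (other internal nodes would also need a downward pass), and the paralog coupling that defines SF, CF and NF is ignored.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is an observed expression level; an internal
# node is a list of (subtree, branch_length) pairs.
Tree = Union[float, List[Tuple["Tree", float]]]


def upward_message(node: Tree) -> Tuple[float, float]:
    """Return (mean, variance) of a node's value given the leaves below it.

    Assumes Brownian-motion evolution, so the variance added along a branch
    equals its (positive) length. This is a Gaussian simplification, not NBP.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0  # observed leaf: known exactly
    precision_sum = 0.0
    weighted_sum = 0.0
    for child, branch_length in node:
        mean, var = upward_message(child)
        var += branch_length  # uncertainty accumulated along the branch
        precision_sum += 1.0 / var
        weighted_sum += mean / var
    return weighted_sum / precision_sum, 1.0 / precision_sum
```

For two paralogs measured at 2.0 and 4.0 on branches of length 1.0 and 3.0, `upward_message([(2.0, 1.0), (4.0, 3.0)])` gives a mean of 2.5 with variance 0.75: the estimate is pulled toward the less-diverged copy.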
The role of aquaporins in polycystic ovary syndrome - A way towards a novel drug target in PCOS.
Wawrzkiewicz-Jałowiecka, Agata; Kowalczyk, Karolina; Pluta, Dagmara; Blukacz, Łukasz; Madej, Paweł
2017-05-01
Aquaporins (AQPs) are transmembrane proteins able to transport water (and in some cases also small solutes, e.g. glycerol) through the cell membrane. There are twelve types of aquaporins (AQP1-AQP12) expressed in mammalian reproductive systems. According to the literature, many diseases of the reproductive organs are correlated with changes in AQP expression and with AQP malfunction. This is the case in polycystic ovary syndrome (PCOS), where dysfunctions of AQPs 7-9 and alterations in their levels occur. In this work, we postulate how AQPs are involved in PCOS-related disorders, in order to emphasize their potential therapeutic relevance as drug targets. Our research leads to the surprising inference that genetic mutations causing malfunction and/or decreased expression of aquaporins may be incorporated into the popular insulin-dependent hypothesis of PCOS pathogenesis. Moreover, changes in AQP expression may affect folliculogenesis and follicular atresia in PCOS. Copyright © 2017 Elsevier Ltd. All rights reserved.
2010-01-01
Background Trichomonas vaginalis is the most common non-viral human sexually transmitted pathogen and, importantly, contributes to the spread of HIV. Yet very little is known about its surface and secreted proteins mediating interactions with, and permitting the invasion and colonisation of, the host mucosa. Initial annotations of the T. vaginalis genome identified a plethora of candidate extracellular proteins. Results Data mining of the T. vaginalis genome identified 911 BspA-like entries (TvBspA) sharing TpLRR-like leucine-rich repeats, which represent the largest gene family encoding potential extracellular proteins for the pathogen. A broad range of microorganisms encoding BspA-like proteins was identified; these are mainly known to live on mucosal surfaces, and among them T. vaginalis is endowed with the largest gene family. Over 190 TvBspA proteins with inferred transmembrane domains were characterised by considerable structural diversity between their TpLRR and other types of repetitive sequences, and two subfamilies possessed distinct classic sorting signal motifs for endocytosis. One TvBspA subfamily also shared a glycine-rich protein domain with proteins from Clostridium difficile pathogenic strains and C. difficile phages. Consistent with the hypothesis that TvBspA protein structural diversity implies diverse roles, we demonstrated differential expression at the transcript level for several TvBspA genes under different growth conditions. Identified variants of repetitive segments between several TvBspA paralogues and orthologues from two clinical isolates were also consistent with TpLRR and other repetitive sequences being functionally important. For one TvBspA protein, cell surface expression and antibody responses in both female and male T. vaginalis-infected patients were also demonstrated.
Conclusions The biased mucosal habitat of microbial species encoding BspA-like proteins, the characterisation of a vast structural diversity among the TvBspA proteins, the differential expression of a subset of TvBspA genes, and the cellular localisation and immunological data for one TvBspA all point to the importance of the TvBspA proteins to various aspects of T. vaginalis pathobiology at the host-pathogen interface. PMID:20144183
Reading biological processes from nucleotide sequences
NASA Astrophysics Data System (ADS)
Murugan, Anand
Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data are being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then, in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from these data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression.
Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
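A thermodynamic model of this kind assigns statistical weights to promoter occupancy states and takes the transcription rate to be proportional to the probability that RNA polymerase is bound. The sketch below is only the textbook simple-repression case (one repressor site that excludes polymerase), with hypothetical function names and an assumed number of nonspecific genomic sites; the thesis infers richer models from mutagenesis data, which this does not attempt.

```python
import math


def p_polymerase_bound(n_pol: float, n_rep: float, de_pol: float,
                       de_rep: float, n_nonspecific: float = 5e6) -> float:
    """Probability that RNA polymerase occupies the promoter (simple repression).

    n_pol, n_rep: copy numbers of polymerase and repressor.
    de_pol, de_rep: specific-minus-nonspecific binding energies in units of kT
        (more negative means tighter binding).
    n_nonspecific: number of nonspecific genomic sites (an assumed value).
    Repressor and polymerase binding are treated as mutually exclusive.
    """
    w_pol = (n_pol / n_nonspecific) * math.exp(-de_pol)  # weight of "polymerase bound"
    w_rep = (n_rep / n_nonspecific) * math.exp(-de_rep)  # weight of "repressor bound"
    return w_pol / (1.0 + w_pol + w_rep)                 # 1.0 = empty promoter


def fold_change(n_pol: float, n_rep: float, de_pol: float, de_rep: float,
                n_nonspecific: float = 5e6) -> float:
    """Expression with repressor relative to expression without it."""
    with_rep = p_polymerase_bound(n_pol, n_rep, de_pol, de_rep, n_nonspecific)
    without_rep = p_polymerase_bound(n_pol, 0, de_pol, de_rep, n_nonspecific)
    return with_rep / without_rep
```

With no repressor the fold change is exactly 1, and it falls as repressor copy number or binding strength increases.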
Inferring network structure in non-normal and mixed discrete-continuous genomic data.
Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran
2018-03-01
Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations in which these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.
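In the Gaussian graphical model baseline mentioned above, two variables are conditionally independent given all the others exactly when the corresponding off-diagonal entry of the precision (inverse covariance) matrix is zero; the partial correlation is that entry rescaled. The sketch below assumes a precision matrix is already available (for example from a sparse estimator) and reads off the implied undirected graph. The helper names are hypothetical, and this illustrates only the Gaussian case the article moves beyond, not its Gaussian-scale-mixture model or conditional sign independence.

```python
import math
from typing import List, Sequence, Set, Tuple


def partial_correlations(precision: Sequence[Sequence[float]]) -> List[List[float]]:
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj); diagonal set to 1."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j])
    return rho


def edges(precision: Sequence[Sequence[float]],
          tol: float = 1e-10) -> Set[Tuple[int, int]]:
    """Pairs (i, j), i < j, that are NOT conditionally independent."""
    rho = partial_correlations(precision)
    p = len(precision)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(rho[i][j]) > tol}
```

For the chain precision matrix `[[2, -1, 0], [-1, 2, -1], [0, -1, 2]]`, variables 0 and 2 are conditionally independent given variable 1, so the graph has only the edges (0, 1) and (1, 2).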
Inferring network structure in non-normal and mixed discrete-continuous genomic data
Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran
2017-01-01
Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations in which these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. PMID:28437848
Praveen, Paurush; Fröhlich, Holger
2013-01-01
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data, coupled with limited sample sizes, reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways, as well as on gene expression data from breast cancer and yeast heat shock response, reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows diverse information sources, such as pathway databases, GO terms and protein domain data, to be used, and is flexible enough to integrate new sources as they become available.
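A noisy-OR treats each information source as an independent chance to "switch on" an edge, so the combined support is one minus the probability that every source fails. A minimal sketch, assuming each source already provides a probability q_i for a given interaction; how the paper derives those per-source probabilities, and whether it uses a leak term, are not given in the abstract, so `leak` here is an illustrative, hypothetical parameter.

```python
from typing import Sequence


def noisy_or(source_probs: Sequence[float], leak: float = 0.0) -> float:
    """P(edge) = 1 - (1 - leak) * prod_i (1 - q_i).

    source_probs: per-source support q_i in [0, 1] for one candidate edge.
    leak: optional baseline probability of the edge with no source support
          (hypothetical parameter, default 0).
    """
    p_all_fail = 1.0 - leak
    for q in source_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        p_all_fail *= 1.0 - q  # each source independently fails to fire
    return 1.0 - p_all_fail
```

Two sources each at 0.5 combine to 0.75, and the result is never below the strongest single source, which matches the "strongest support" behaviour described above.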
Molecular Determinants of Mutant Phenotypes, Inferred from Saturation Mutagenesis Data.
Tripathi, Arti; Gupta, Kritika; Khare, Shruti; Jain, Pankaj C; Patel, Siddharth; Kumar, Prasanth; Pulianmackal, Ajai J; Aghera, Nilesh; Varadarajan, Raghavan
2016-11-01
Understanding how mutations affect protein activity and organismal fitness is a major challenge. We used saturation mutagenesis combined with deep sequencing to determine mutational sensitivity scores for 1,664 single-site mutants of the 101-residue Escherichia coli cytotoxin CcdB at seven different expression levels. Active-site residues could be distinguished from buried ones based on their differential tolerance to aliphatic and charged amino acid substitutions. At nonactive-site positions, the average mutational tolerance correlated better with depth from the protein surface than with accessibility. Remarkably, similar results were observed for two other small proteins, the PDZ domain (PSD95 pdz3) and the IgG-binding domain of protein G (GB1). Mutational sensitivity data obtained with CcdB were used to derive a procedure for predicting functional effects of mutations. Results compared favorably with those of two widely used computational predictors. In vitro characterization of 80 single, nonactive-site mutants of CcdB showed that activity in vivo correlates moderately with thermal stability and solubility. The inability to refold reversibly, as well as a decreased folding rate in vitro, is associated with decreased activity in vivo. Upon probing the effect of modulating expression of various proteases and chaperones on mutant phenotypes, most deleterious mutants showed increased in vivo activity and solubility only upon over-expression of either of the ATP-independent chaperones Trigger factor or SecB. Collectively, these data suggest that folding kinetics rather than protein stability is the primary determinant of activity in vivo. This study enhances our understanding of how mutations affect phenotype, as well as the ability to predict fitness effects of point mutations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The Effect of Post-Resistance Exercise Amino Acids on Plasma MCP-1 and CCR2 Expression
Wells, Adam J.; Hoffman, Jay R.; Jajtner, Adam R.; Varanoske, Alyssa N.; Church, David D.; Gonzalez, Adam M.; Townsend, Jeremy R.; Boone, Carleigh H.; Baker, Kayla M.; Beyer, Kyle S.; Mangine, Gerald T.; Oliveira, Leonardo P.; Fukuda, David H.; Stout, Jeffrey R.
2016-01-01
The recruitment and infiltration of classical monocytes into damaged muscle is critical for optimal tissue remodeling. This study examined the effects of an amino acid supplement on classical monocyte recruitment following an acute bout of lower body resistance exercise. Ten resistance-trained men (24.7 ± 3.4 years; 90.1 ± 11.3 kg; 176.0 ± 4.9 cm) ingested a supplement (SUPP) or placebo (PL) immediately post-exercise in a randomized, cross-over design. Blood samples were obtained at baseline (BL), immediately (IP), 30-min (30P), 1-h (1H), 2-h (2H), and 5-h (5H) post-exercise to assess plasma concentrations of monocyte chemoattractant protein 1 (MCP-1), myoglobin, cortisol and insulin, and the expression of C-C chemokine receptor-2 (CCR2) and macrophage-1 antigen (CD11b) on classical monocytes. Magnitude-based inference was used to assess the true effects of SUPP compared to PL. Changes in myoglobin, cortisol, and insulin concentrations were similar between treatments. Compared to PL, plasma MCP-1 was “very likely greater” (98.1% likelihood effect) in SUPP at 2H. CCR2 expression was “likely greater” at IP (84.9% likelihood effect), “likely greater” at 1H (87.7% likelihood effect), “very likely greater” at 2H (97.0% likelihood effect), and “likely greater” at 5H (90.1% likelihood effect) in SUPP, compared to PL. Ingestion of SUPP did not influence CD11b expression. Ingestion of an amino acid supplement immediately post-exercise appears to help maintain plasma MCP-1 concentrations and augment CCR2 expression in resistance-trained men. PMID:27384580
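The qualitative labels map each likelihood percentage onto a fixed probability scale. The abstract does not list the thresholds; the sketch below assumes the scale commonly used with magnitude-based inference (25-75% possibly, 75-95% likely, 95-99.5% very likely, above 99.5% most likely, with mirror-image labels below 25%), which agrees with every label reported above but should be checked against the full paper. The names `SCALE` and `describe` are hypothetical.

```python
from typing import List, Tuple

# Assumed thresholds: (exclusive upper bound in %, label). Not stated in the abstract.
SCALE: List[Tuple[float, str]] = [
    (0.5, "most unlikely"),
    (5.0, "very unlikely"),
    (25.0, "unlikely"),
    (75.0, "possibly"),
    (95.0, "likely"),
    (99.5, "very likely"),
    (100.0, "most likely"),
]


def describe(likelihood_pct: float) -> str:
    """Map the likelihood (in %) that an effect is real to a qualitative label."""
    if not 0.0 <= likelihood_pct <= 100.0:
        raise ValueError("likelihood must be between 0 and 100")
    for upper, label in SCALE:
        if likelihood_pct < upper:
            return label
    return SCALE[-1][1]  # exactly 100%
```

Under these assumed thresholds, 98.1% and 97.0% map to "very likely" and 84.9%, 87.7% and 90.1% map to "likely", as in the reported results.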
Iwata, Hiroaki; Mizutani, Sayaka; Tabei, Yasuo; Kotera, Masaaki; Goto, Susumu; Yamanishi, Yoshihiro
2013-01-01
Most phenotypic effects of drugs arise from interactions between drugs and their target proteins; however, our knowledge of the molecular mechanisms of drug-target interactions is very limited. One of the challenging issues in recent pharmaceutical science is to identify the underlying molecular features that govern drug-target interactions. In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. It is shown that the inferred pharmacogenomic features can be used for predicting potential drug-target interactions. We also discuss advantages and limitations of the pharmacogenomic features compared with the chemogenomic features, which are the associations between drug chemical substructures and protein domains. The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains.
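One way to let a sparse linear classifier discover side effect-domain pairs is to describe each drug-target pair by the outer (tensor) product of the drug's binary side-effect profile and the protein's binary domain profile, so that each nonzero weight corresponds to one side effect-domain association. The sketch below shows only that feature construction and the resulting score, with hypothetical names and a hand-written weight map; it assumes, rather than confirms, that this matches the authors' features, and it omits training the sparse (e.g. L1-regularized) model.

```python
from typing import Dict, List, Sequence, Tuple


def pair_features(side_effects: Sequence[int], domains: Sequence[int]) -> List[int]:
    """Flattened outer product: feature (s, d) is 1 iff the drug has side
    effect s AND the protein has domain d."""
    return [s * d for s in side_effects for d in domains]


def score(side_effects: Sequence[int], domains: Sequence[int],
          weights: Dict[Tuple[int, int], float]) -> float:
    """Linear score from a sparse map {(side_effect_idx, domain_idx): weight}.

    Equivalent to the dot product of pair_features with a dense weight vector
    that is zero everywhere except at the listed (s, d) pairs.
    """
    return sum(w for (s, d), w in weights.items()
               if side_effects[s] and domains[d])
```

Because each weight is tied to one (side effect, domain) pair, the nonzero weights of a trained sparse model can be read directly as a side effect-domain association network.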
Huang, Yi-Fei; Golding, G Brian
2015-02-15
A number of statistical phylogenetic methods have been developed to infer conserved functional sites or regions in proteins. Many methods, e.g. Rate4Site, apply standard phylogenetic models to infer site-specific substitution rates and ignore the spatial correlation of substitution rates in protein tertiary structures, which may reduce their power to identify conserved functional patches in protein tertiary structures when the sequences used in the analysis are highly similar. The 3D sliding window method has been proposed to infer conserved functional patches in protein tertiary structures, but the window size, which reflects the strength of the spatial correlation, must be predefined and is not inferred from data. We recently developed GP4Rate to solve these problems under the Bayesian framework. Unfortunately, GP4Rate is computationally slow. Here, we present an intuitive web server, FuncPatch, to perform a fast approximate Bayesian inference of conserved functional patches in protein tertiary structures. Both simulations and four case studies based on empirical data suggest that FuncPatch is a good approximation to GP4Rate, while being orders of magnitude faster. In addition, simulations suggest that FuncPatch is potentially a useful tool complementary to Rate4Site, whereas the 3D sliding window method is less powerful than both FuncPatch and Rate4Site. The functional patches predicted by FuncPatch in the four case studies are supported by experimental evidence, which corroborates the usefulness of FuncPatch. The software FuncPatch is freely available at http://info.mcmaster.ca/yifei/FuncPatch. Contact: golding@mcmaster.ca. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
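The 3D sliding window idea can be illustrated by averaging per-site substitution rates over all residues within a fixed spatial radius; that radius is the predefined window size the abstract points to. A minimal sketch, assuming site rates (e.g. from a Rate4Site-style analysis) and one 3D coordinate per residue are already available; `window_rates` is a hypothetical name, and this is neither FuncPatch nor GP4Rate, which instead infer the strength of spatial correlation from the data.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def window_rates(coords: Sequence[Coord], rates: Sequence[float],
                 radius: float) -> List[float]:
    """Average each site's rate with all sites within `radius` of it.

    `radius` uses the same units as the coordinates (e.g. angstroms) and must
    be chosen in advance. Low smoothed values suggest conserved patches.
    """
    if len(coords) != len(rates):
        raise ValueError("need exactly one rate per coordinate")
    smoothed = []
    for ci in coords:
        # The window always includes the site itself (distance 0).
        neighbours = [r for cj, r in zip(coords, rates)
                      if math.dist(ci, cj) <= radius]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed
```

Changing `radius` changes which residues pool their evidence, which is why a fixed, user-chosen window can miss patches whose true spatial extent differs from it.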
Applying dynamic Bayesian networks to perturbed gene expression data.
Dojer, Norbert; Gambin, Anna; Mizera, Andrzej; Wilczyński, Bartek; Tiuryn, Jerzy
2006-05-08
A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to deal naturally with the stochastic aspects of gene expression and with noisy measurements, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray data. However, the basic formalism has some disadvantages; for example, it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. To handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments. We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed, and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulated data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used whenever possible, i.e. when the set of genes considered is small enough.
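One common way to account for perturbations when scoring a dynamic Bayesian network is to drop, from a gene's local score, the transitions in which that gene was itself externally set, since its value there is not explained by its regulators. The sketch below assumes discretized data, a first-order DBN, and a BIC-penalized multinomial log-likelihood as the local score; these are illustrative choices, the function `local_score` is hypothetical, and the paper's own score, exact search and discretization method are not reproduced.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def local_score(series: Sequence[Sequence[int]], child: int,
                parents: Sequence[int], perturbed: Sequence[Set[int]],
                n_states: int = 3) -> float:
    """BIC-style score of gene `child` at time t+1 given `parents` at time t.

    series[t][g]: discretized level (0..n_states-1) of gene g at time t.
    perturbed[t]: genes externally set (e.g. knocked out) at time t.
    Transitions where the child itself is perturbed at t+1 are excluded.
    """
    pairs: List[Tuple[Tuple[int, ...], int]] = []
    for t in range(len(series) - 1):
        if child in perturbed[t + 1]:
            continue  # child's value was imposed, not regulated by its parents
        parent_cfg = tuple(series[t][p] for p in parents)
        pairs.append((parent_cfg, series[t + 1][child]))
    joint = Counter(pairs)
    marginal = Counter(cfg for cfg, _ in pairs)
    # Maximum-likelihood multinomial log-likelihood of the child given parents.
    loglik = sum(n * math.log(n / marginal[cfg]) for (cfg, _), n in joint.items())
    n_params = (n_states ** len(parents)) * (n_states - 1)
    return loglik - 0.5 * n_params * math.log(max(len(pairs), 1))
```

If a knockout forces the child to an unusual value, flagging it in `perturbed` keeps that time point from counting as evidence against the true regulator.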
Heerdink, Marc W; Koning, Lukas F; van Doorn, Evert A; van Kleef, Gerben A
2018-05-18
Other people's emotional reactions to a third person's behaviour are potentially informative about what is appropriate within a given situation. We investigated whether and how observers' inferences of such injunctive norms are shaped by expressions of anger and disgust. Building on the moral emotions literature, we hypothesised that angry and disgusted expressions produce relative differences in the strength of autonomy-based versus purity-based norm inferences. We report three studies (plus three supplementary studies) using different types of stimuli (vignette-based, video clips) to investigate how emotional reactions shape norms across different potential norm violations (eating snacks, drinking alcohol) and contexts (groups of friends, a university, a company). Consistent with our theoretical argument, the results indicate that observers use others' emotional reactions not only to infer whether a particular behaviour is inappropriate, but also why it is inappropriate: because it primarily violates autonomy standards (as suggested relatively more strongly by expressions of anger) or purity standards (as suggested relatively more strongly by expressions of disgust). We conclude that the social functionality of emotions in groups extends to shaping norms based on moral standards.
Mirza, Neelofar; Taj, Gohar; Arora, Sandeep; Kumar, Anil
2014-10-25
Finger millet (Eleusine coracana) variably accumulates calcium in different tissues, due to differential expression of genes involved in the uptake, translocation and accumulation of calcium. Ca(2+)/H(+) antiporter (CAX1), two pore channel (TPC1), CaM-stimulated type IIB Ca(2+) ATPase and two CaM-dependent protein kinase (CaMK1 and CaMK2) homologs were studied in finger millet. Two genotypes, GP-45 and GP-1 (high and low calcium accumulating, respectively), were used to understand the role of these genes in differential calcium accumulation. For most of the genes, higher expression was found in the high calcium accumulating genotype. CAX1 was strongly expressed in the late stages of spike development and could be responsible for accumulating high concentrations of calcium in seeds. TPC1 and Ca(2+) ATPase homologs showed strong expression in the root, stem and developing spike, signifying their roles in calcium uptake and translocation, respectively. Calmodulin showed strong expression, and an expression pattern similar to that of the type IIB ATPase, in the developing spike only, indicating a developing-spike- or even seed-specific isoform of CaM that affects the activity of downstream targets of calcium transport. Interestingly, CaMK1 and CaMK2 had expression patterns similar to those of the ATPase and TPC1 in various tissues, raising the possibility of their respective regulation via CaM kinase. The expression pattern of the 14-3-3 gene was similar to that of CAX1 in leaf and developing spike, suggesting the surprising possibility that CAX1 is regulated through 14-3-3 protein. Our results provide a molecular insight into the mechanism of calcium accumulation in finger millet. Copyright © 2014 Elsevier B.V. All rights reserved.
Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.
2013-01-01
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277
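Why the two data types can be combined is easiest to see under an assumed linear model of the dynamics, dx_i/dt = sum_j a_ij x_j + b_i: time-series samples supply rows with an estimated derivative, and each steady-state sample supplies a row whose derivative is zero, so both constrain the same coefficients. The sketch below fits one gene's coefficients by ordinary least squares; it is not cMIKANA, the linear model and the names `fit_gene` and `_solve` are assumptions, and steady states from knockdowns of gene i itself should be excluded because its own equation no longer holds there.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not determine all coefficients")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gene(ts_states: Sequence[Sequence[float]], ts_derivs: Sequence[float],
             ss_states: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares fit of [a_i1, ..., a_ip, b_i] for dx_i/dt = a_i . x + b_i.

    ts_states, ts_derivs: time-series states and gene i's estimated derivative.
    ss_states: steady states in which gene i was not itself knocked down,
        each contributing the constraint dx_i/dt = 0.
    """
    rows = [list(x) + [1.0] for x in ts_states] + [list(x) + [1.0] for x in ss_states]
    y = list(ts_derivs) + [0.0] * len(ss_states)
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return _solve(xtx, xty)
```

In a toy two-gene case with only two time-series samples, the three coefficients of one gene are underdetermined and `fit_gene` raises an error; adding two steady-state samples makes them identifiable, mirroring the benefit of combining data types described above.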
Sobol-Shikler, Tal; Robinson, Peter
2010-07-01
We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
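With nine affective-state groups, one binary machine per unordered pair gives C(9, 2) = 9 * 8 / 2 = 36 machines, which matches the count above. The abstract does not say how the pairwise results were merged into a ranked list; the sketch below assumes simple win counting, with a hypothetical `prefer` callback standing in for the trained pairwise classifiers.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def rank_by_pairwise_votes(groups: Sequence[str],
                           prefer: Callable[[str, str], str]) -> List[str]:
    """Run one binary decision per unordered pair and rank groups by wins.

    prefer(a, b) must return a or b (stand-in for a pairwise classifier).
    Ties keep the original order of `groups` because sorted() is stable.
    """
    wins: Dict[str, int] = {g: 0 for g in groups}
    for a, b in combinations(groups, 2):
        winner = prefer(a, b)
        if winner not in (a, b):
            raise ValueError("prefer() must return one of its arguments")
        wins[winner] += 1
    return sorted(groups, key=lambda g: -wins[g])
```

Vote counting is only one possible consolidation rule; schemes that weight each vote by classifier confidence would produce a similar ranked list from the same 36 pairwise outputs.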
Complex Ancestries of Isoprenoid Synthesis in Dinoflagellates.
Bentlage, Bastian; Rogers, Travis S; Bachvaroff, Tsvetan R; Delwiche, Charles F
2016-01-01
Isoprenoid metabolism occupies a central position in the anabolic metabolism of all living cells. In plastid-bearing organisms, two pathways may be present for de novo isoprenoid synthesis: the cytosolic mevalonate pathway (MVA) and the nuclear-encoded, plastid-targeted nonmevalonate pathway (DOXP). Using transcriptomic data, we find that dinoflagellates apparently make exclusive use of the DOXP pathway. Using phylogenetic analyses of all DOXP genes, we inferred the evolutionary origins of DOXP genes in dinoflagellates. Plastid replacements led to a DOXP pathway of multiple evolutionary origins. Dinoflagellates commonly referred to as dinotoms, owing to their relatively recent acquisition of a diatom plastid, express two completely redundant DOXP pathways. Dinoflagellates with a tertiary plastid of haptophyte origin, by contrast, express a hybrid pathway of dual evolutionary origin. Here, changes in the targeting motif of the signal/transit peptide likely allow the core isoprenoid metabolism proteins to be targeted to the new plastid. Parasitic dinoflagellates of the Amoebophrya species complex appear to have lost the DOXP pathway, suggesting that they may rely on their host for sterol synthesis. © 2015 The Author(s) Journal of Eukaryotic Microbiology © 2015 International Society of Protistologists.
Identification of core pathways based on attractor and crosstalk in ischemic stroke.
Diao, Xiufang; Liu, Aijuan
2018-02-01
Ischemic stroke is a leading cause of mortality and disability around the world. Identifying dysregulated pathways is an important task, as they can provide molecular and functional insights into high-throughput experimental data. The gene expression profile E-GEOD-16561 was collected. Pathways were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and protein-protein interaction sets were downloaded from the Search Tool for the Retrieval of Interacting Genes (STRING). Attractor and crosstalk approaches were applied to screen dysregulated pathways. A total of 20 differentially expressed genes were identified in ischemic stroke. Thirty-nine significant differential pathways were identified with P<0.01, 28 pathways with RP<0.01, and 17 pathways with an impact factor >250. Based on the three criteria, 11 significant dysfunctional pathways were identified. Among them, Epstein-Barr virus infection was the most significant differential pathway. In conclusion, significantly dysfunctional pathways were identified with the method based on attractor and crosstalk. These pathways may help elucidate the molecular mechanism of ischemic stroke and represent potential novel therapeutic targets for its treatment.
In silico re-identification of properties of drug target proteins.
Kim, Baeksoo; Jo, Jihoon; Han, Jonghyun; Park, Chungoo; Lee, Hyunju
2017-05-31
Computational approaches to identifying drug targets are expected to reduce the time and effort required for drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have sought to distinguish drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins, and many other protein properties have not been fully investigated. Using the DrugBank (version 3.0) database, containing 6,816 drug entries including 760 FDA-approved drugs and 1822 of their targets, together with the human UniProt/Swiss-Prot database, we defined 1578 non-redundant drug target and 17,575 non-drug target proteins. To select these non-redundant protein datasets, we built four datasets (A, B, C, and D) by considering clustering of paralogous proteins. We first reassessed the widely used properties of drug target proteins. We confirmed and extended earlier findings that drug target proteins (1) are likely to be more hydrophobic and less polar, with fewer PEST sequences and more signal peptide sequences, and (2) are more involved in enzyme catalysis, oxidation and reduction in cellular respiration, and operational genes. In this study, we proposed new properties (essentiality, expression pattern, PTMs, and solvent accessibility) for effectively identifying drug target proteins. We found that (1) drug targetability and protein essentiality are decoupled, (2) druggable proteins have high expression levels and tissue specificity, and (3) functional post-translational modification residues are enriched in drug target proteins. In addition, to predict the drug targetability of proteins, we used two machine learning methods (Support Vector Machine and Random Forest). When we predicted drug targets by combining previously known protein properties with the newly proposed properties, an F-score of 0.8307 was obtained. 
Integrating the newly proposed properties improved prediction performance, indicating that these properties are related to drug targets. We believe that our study provides a new perspective on inferring drug-target interactions.
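The F-score reported above is, in its usual (F1) form, the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The sketch below computes it from confusion-matrix counts; it assumes the balanced F1 variant, and the function name and the counts in the usage line are hypothetical, not taken from the study.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    if tp == 0:
        # Precision and recall are both zero (or undefined), so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a drug-target classifier on a held-out set.
print(round(f_score(tp=80, fp=20, fn=10), 4))  # prints 0.8421
```

Because F1 ignores true negatives, it is a reasonable choice when, as here, non-drug targets vastly outnumber drug targets.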
The N and C Termini of ZO-1 Are Surrounded by Distinct Proteins and Functional Protein Networks
Van Itallie, Christina M.; Aponte, Angel; Tietgens, Amber Jean; Gucek, Marjan; Fredriksson, Karin; Anderson, James Melvin
2013-01-01
The proteins and functional protein networks of the tight junction remain incompletely defined. Among the currently known proteins are barrier-forming proteins like occludin and the claudin family; scaffolding proteins like ZO-1; and some cytoskeletal, signaling, and cell polarity proteins. To define a more complete list of proteins and infer their functional implications, we identified the proteins that are within molecular dimensions of ZO-1 by fusing biotin ligase to either its N or C terminus, expressing these fusion proteins in Madin-Darby canine kidney epithelial cells, and purifying and identifying the resulting biotinylated proteins by mass spectrometry. Of a predicted proteome of ∼9000, we identified more than 400 proteins tagged by biotin ligase fused to ZO-1, with both identical and distinct proteins near the N- and C-terminal ends. Those proximal to the N terminus were enriched in transmembrane tight junction proteins, and those proximal to the C terminus were enriched in cytoskeletal proteins. We also identified many unexpected but easily rationalized proteins and verified partial colocalization of three of these proteins with ZO-1 as examples. In addition, functional networks of interacting proteins were tagged, such as the basolateral but not apical polarity network. These results provide a rich inventory of proteins and potential novel insights into functions and protein networks that should catalyze further understanding of tight junction biology. Unexpectedly, the technique demonstrates high spatial resolution, which could be generally applied to defining other subcellular protein compartmentalization. PMID:23553632
Protein 3D Structure Computed from Evolutionary Sequence Variation
Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
2011-01-01
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to within 2.7–4.8 Å Cα-RMSD of the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). 
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
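The abstract contrasts true co-evolution couplings with the noisy set of observed correlations. The sketch below computes only the latter: raw mutual information (MI) between columns of a multiple sequence alignment. It is a deliberately simpler stand-in, not the maximum entropy model the authors use, and unlike that model it cannot separate direct couplings from indirect, transitive correlations. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)           # single-site counts, column i
    fj = Counter(seq[j] for seq in msa)           # single-site counts, column j
    fij = Counter((seq[i], seq[j]) for seq in msa)  # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def ranked_pairs(msa: list[str], min_separation: int = 1):
    """Return (MI, i, j) for all column pairs, highest MI first."""
    length = len(msa[0])
    scores = [
        (column_mi(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment in which columns 0 and 3 co-vary perfectly.
msa = ["AKLE", "AKLE", "GKLD", "GRLD", "AKME", "GKMD"]
for score, i, j in ranked_pairs(msa)[:3]:
    print(f"{i}-{j}: {score:.3f}")
```

Here pair 0-3 ranks first with an MI of ln 2 ≈ 0.693. In the maximum entropy approach, the single-site and pair frequencies counted above enter as constraints, and the inferred direct couplings, rather than raw MI, are used to rank residue pairs.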
Co-Option and De Novo Gene Evolution Underlie Molluscan Shell Diversity
Aguilera, Felipe; McDougall, Carmel
2017-01-01
Molluscs fabricate shells of incredible diversity and complexity by localized secretions from the dorsal epithelium of the mantle. Although distantly related molluscs express remarkably different secreted gene products, it remains unclear whether the evolution of shell structure and pattern is underpinned by the differential co-option of conserved genes or by the integration of lineage-specific genes into the mantle regulatory program. To address this, we compare the mantle transcriptomes of 11 bivalves and gastropods of varying relatedness. We find that each species, including four Pinctada (pearl oyster) species that diverged within the last 20 Ma, expresses a unique mantle secretome. Lineage- or species-specific genes comprise a large proportion of each species' mantle secretome. A majority of these secreted proteins have unique domain architectures that include repetitive, low complexity domains (RLCDs), which evolve rapidly and have a proclivity to expand, contract and rearrange in the genome. There is also a large number of secretome genes expressed in the mantle that arose before the origin of gastropods and bivalves. Each species expresses a unique set of these more ancient genes, consistent with their independent co-option into these mantle gene regulatory networks. From this analysis, we infer that lineage-specific secretomes underlie shell diversity and that they involve both rapidly evolving RLCD-containing proteins and the continual recruitment and loss of ancient and recently evolved genes into the periphery of the regulatory network controlling gene expression in the mantle epithelium. PMID:28053006
Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.
Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús
2008-10-01
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in for Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM), the set cover approach maximum specificity set cover (MSSC), and the sum-product algorithm (SPA). After accepting an input set of proteins with UniProt IDs/accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and to be simple to use. An example of inference in a signaling network of the myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. http://cytoprophet.cse.nd.edu.
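One of the algorithms listed above is maximum likelihood estimation of domain interaction probabilities using expectation maximization. The sketch below assumes a widely used formulation in which two proteins interact if at least one of their domain pairs interacts, so that P(proteins i, j interact) = 1 - ∏(1 - λ_mn) over their domain pairs, and observed interactions carry fixed false-positive (fp) and false-negative (fn) rates. Whether Cytoprophet uses exactly this parameterization or these default rates is an assumption; the protein and domain names and the function names are hypothetical.

```python
from itertools import product
from math import prod


def domain_pairs(doms_a, doms_b):
    """Unordered domain pairs spanned by two proteins."""
    return {tuple(sorted(p)) for p in product(doms_a, doms_b)}


def em_domain_interactions(domains, observations, fp=1e-4, fn=0.3, iters=100):
    """Estimate domain-pair interaction probabilities (lambda) by EM.

    domains: protein -> list of domain names
    observations: (protein_a, protein_b) -> 1 if an interaction was observed, else 0
    fp, fn: assumed assay error rates; both must lie strictly between 0 and 1.
    """
    pairs_of = {k: domain_pairs(domains[k[0]], domains[k[1]]) for k in observations}
    lam = {dp: 0.5 for dps in pairs_of.values() for dp in dps}
    for _ in range(iters):
        totals = {dp: 0.0 for dp in lam}
        counts = {dp: 0 for dp in lam}
        for key, observed in observations.items():
            # Probability that the two proteins truly interact.
            p_int = 1.0 - prod(1.0 - lam[dp] for dp in pairs_of[key])
            p_obs1 = p_int * (1.0 - fn) + (1.0 - p_int) * fp
            for dp in pairs_of[key]:
                # E-step: posterior that this domain pair interacts here.
                if observed:
                    e = lam[dp] * (1.0 - fn) / p_obs1
                else:
                    e = lam[dp] * fn / (1.0 - p_obs1)
                totals[dp] += e
                counts[dp] += 1
        # M-step: average the posteriors over all protein pairs containing dp.
        lam = {dp: totals[dp] / counts[dp] for dp in lam}
    return lam


# Hypothetical input: A-B observed to interact, A-C observed not to.
domains = {"A": ["d1"], "B": ["d2"], "C": ["d3"]}
observations = {("A", "B"): 1, ("A", "C"): 0}
lam = em_domain_interactions(domains, observations)
print({k: round(v, 3) for k, v in lam.items()})
# prints {('d1', 'd2'): 1.0, ('d1', 'd3'): 0.0}
```

Summing over domain pairs with a noisy-OR is what lets the model explain an observed protein interaction without knowing which domain pair mediated it; the EM posterior splits the credit among candidate pairs.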
Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm
Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenging computational problem in systems biology. However, every existing algorithm for inferring GRNs from gene expression profiles has its own advantages and disadvantages; in particular, the effectiveness and efficiency of previous algorithms are not high enough. In this work, we proposed a novel algorithm for inferring GRNs from gene expression data based on a differential equation model. The algorithm combines two methods. Before reconstructing GRNs, singular value decomposition was used to decompose the gene expression data, determine the solution space, and obtain all candidate solutions for the GRN. Within this family of candidate solutions, a modified gravitation field algorithm was used to infer GRNs by optimizing the criteria of the differential equation model and searching for the best network structure. The proposed algorithm is validated on both a simulated scale-free network and a real benchmark gene regulatory network from a network database. A Bayesian method and the traditional differential equation model were also used to infer GRNs, and their results were compared with those of the proposed algorithm. Genetic algorithms and simulated annealing were also used to evaluate the gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which significantly outperforms previous algorithms. PMID:23226565
Gene-network inference by message passing
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data is one of the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm that is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.
2014-01-01
Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
Wang, Chunli; Xu, Chunming; Chen, Rongfu; Yang, Li; Sung, Kl Paul
2018-02-12
Purposes The anterior cruciate ligament (ACL) has a poor functional healing response. The synovial tissue surrounding the ACL might be a major regulator of the microenvironment in the joint cavity after ACL injury, thus affecting the repair process. Using transwell co-culture, this study explored the direct influence of human synovial cells (HSCs) on ACL fibroblasts (ACLfs) by characterizing the differential expression of the lysyl oxidase family (LOXs) and matrix metalloproteinases (MMP-1, -2, -3), which facilitate extracellular matrix (ECM) repair and degradation, respectively. Methods The mRNA expression levels of LOXs and MMP-1, -2, -3 were analyzed by semi-quantitative PCR and quantitative real-time PCR. The protein expression levels of LOXs and MMP-1, -2, -3 were detected by western blot. Results We found that co-culture increased LOX mRNA levels in normal ACLfs and differentially regulated the expression of MMPs. We then applied 12% mechanical stretch to ACLfs to induce injury and found that LOX mRNA levels in injured ACLfs were lower in the co-culture group than in the mono-culture group. Conversely, MMP mRNA levels in injured ACLfs were higher in the co-culture group than in the mono-culture group. At the protein level, LOXs were lower while MMPs were more highly expressed in the co-culture group compared to the mono-culture group. Conclusions The co-culture of ACLfs and HSCs, which mimicked cell-to-cell contact in a micro-environment, could modulate proteins involved in wound healing, suggesting a potential reason for the poor self-healing of the injured ACL.
Hoehenwarter, Wolfgang; Larhlimi, Abdelhalim; Hummel, Jan; Egelhofer, Volker; Selbig, Joachim; van Dongen, Joost T; Wienkoop, Stefanie; Weckwerth, Wolfram
2011-07-01
Mass Accuracy Precursor Alignment is a fast and flexible method for comparative proteome analysis that allows the comparison of unprecedented numbers of shotgun proteomics analyses on a personal computer in a matter of hours. We compared 183 LC-MS analyses and more than 2 million MS/MS spectra and could define and separate the proteomic phenotypes of field grown tubers of 12 tetraploid cultivars of the crop plant Solanum tuberosum. Protein isoforms of patatin as well as other major gene families such as lipoxygenase and cysteine protease inhibitor that regulate tuber development were found to be the primary source of variability between the cultivars. This suggests that differentially expressed protein isoforms modulate genotype specific tuber development and the plant phenotype. We properly assigned the measured abundance of tryptic peptides to different protein isoforms that share extensive stretches of primary structure and thus inferred their abundance. Peptides unique to different protein isoforms were used to classify the remaining peptides assigned to the entire subset of isoforms based on a common abundance profile using multivariate statistical procedures. We identified nearly 4000 proteins which we used for quantitative functional annotation making this the most extensive study of the tuber proteome to date.
Magalhães, Alexandre P.; Verde, Nuno; Reis, Francisca; Martins, Inês; Costa, Daniela; Lino-Neto, Teresa; Castro, Pedro H.; Tavares, Rui M.; Azevedo, Herlânder
2016-01-01
Quercus suber (cork oak) is a West Mediterranean species of key economic interest, being extensively exploited for its ability to produce cork. Like other Mediterranean plants, Q. suber is significantly threatened by climatic changes, making it urgent to understand its physiological and molecular adaptability to drought stress. In the present report, we uncovered the differential transcriptome of Q. suber roots exposed to long-term drought using an RNA-Seq approach. 454 sequencing reads were used to assemble a reference transcriptome de novo, and read mapping allowed the identification of 546 differentially expressed unigenes. These were enriched in both effector genes (e.g., LEA, chaperones, transporters) and regulatory genes, including transcription factors (TFs) belonging to various classes, and genes associated with protein turnover. To further extend functional characterization, we identified the orthologs of differentially expressed unigenes in the model species Arabidopsis thaliana, which allowed us to perform in silico functional inference, including gene network analysis for protein function, protein subcellular localization and gene co-expression, and in silico enrichment analysis for TFs and cis-elements. Results indicated the existence of extensive transcriptional regulatory events, including activation of ABA-responsive genes and ABF-dependent signaling. We were then able to establish that a core ABA-signaling pathway involving PP2C-SnRK2-ABF components was induced in stressed Q. suber roots, identifying a key mechanism in this species' response to drought. PMID:26793200
MIPS: a calmodulin-binding protein of Gracilaria lemaneiformis under heat shock.
Zhang, Xuan; Zhou, Huiyue; Zang, Xiaonan; Gong, Le; Sun, Hengyi; Zhang, Xuecheng
2014-08-01
To study the Ca(2+)/Calmodulin (CaM) signal transduction pathway of Gracilaria lemaneiformis under heat stress, myo-inositol-1-phosphate synthase (MIPS), a calmodulin-binding protein, was isolated using the yeast two-hybrid system. cDNA and DNA sequences of mips were cloned from G. lemaneiformis by using 5'RACE and genome walking procedures. The MIPS DNA sequence was 2,067 nucleotides long, containing an open reading frame (ORF) of 1,623 nucleotides with no intron. The mips ORF was predicted to encode 540 amino acids, which included the conserved MIPS domain and was 61-67% similar to that of other species. Analysis of the MIPS amino acid sequence placed the CaM-binding domain (CaMBD) at amino acids 212 to 236. The yeast two-hybrid results confirmed that MIPS can interact with CaM and that MIPS is a calmodulin-binding protein. Next, the expression of CaM and MIPS in wild-type G. lemaneiformis and a heat-tolerant G. lemaneiformis cultivar, "981," was analyzed using real-time PCR under a heat shock of 32 °C. The expression level displayed a cyclical upward trend. Compared with the wild type, the CaM expression levels of cultivar 981 were higher, which might directly relate to its resistance to high temperatures. These results indicate that MIPS and CaM may play important roles in the high-temperature resistance of G. lemaneiformis.
How to talk about protein-level false discovery rates in shotgun proteomics.
The, Matthew; Tasnim, Ayesha; Käll, Lukas
2016-09-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework for thinking about protein-level FDRs, starting from their basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses. © 2016 The Authors. Proteomics Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
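To make decoy-based FDR estimation concrete, the sketch below implements generic target-decoy competition over scored protein identifications: at each score threshold, the number of accepted decoys estimates the number of false targets, and q-values are taken as the running minimum of the estimated FDR from the bottom of the ranked list. This is an illustrative sketch, not the authors' code, and it does not encode either of the specific null hypotheses the paper distinguishes. The input format, the function names, and the "+1" in the numerator (one common conservative choice) are assumptions.

```python
def target_decoy_qvalues(hits):
    """Estimate q-values by target-decoy competition.

    hits: list of (score, is_decoy) tuples; higher scores are better.
    Returns (score, q_value) for target hits only, best first.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Decoys above the threshold estimate the false targets above it.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # q-value: the lowest estimated FDR at which this hit is still accepted.
    qvals = [0.0] * len(fdrs)
    running = 1.0
    for k in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[k])
        qvals[k] = running
    return [(s, q) for (s, d), q in zip(ranked, qvals) if not d]


def accepted_at(hits, threshold=0.01):
    """Count target identifications with q-value at or below threshold."""
    return sum(1 for _, q in target_decoy_qvalues(hits) if q <= threshold)


# Hypothetical protein scores: (score, is_decoy).
hits = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
for score, q in target_decoy_qvalues(hits):
    print(score, round(q, 3))
```

As the paper stresses, a decoy model only yields a valid protein-level FDR for the null hypothesis it actually simulates, so the way decoy proteins are constructed and scored matters as much as the counting shown here.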
Waters, Katrina M.; Liu, Tao; Quesenberry, Ryan D.; Willse, Alan R.; Bandyopadhyay, Somnath; Kathmann, Loel E.; Weber, Thomas J.; Smith, Richard D.; Wiley, H. Steven; Thrall, Brian D.
2012-01-01
To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response. PMID:22479638
ESTuber db: an online database for Tuber borchii EST sequences.
Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo
2007-03-08
The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (ESTs). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a PHP-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including a tandem repeat search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein pattern identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the BLAST utility, allow easy access to the data stored in the database.
PAnalyzer: a software tool for protein inference in shotgun proteomics.
Prieto, Gorka; Aloria, Kerman; Osinalde, Nerea; Fullaondo, Asier; Arizmendi, Jesus M; Matthiesen, Rune
2012-11-05
Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise due to the presence of peptides shared between different proteins, which is common in higher eukaryotes. Recently, data independent acquisition (DIA) approaches have emerged as an alternative to the traditional data dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the term used to name one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software available at the moment does not group the identified proteins in a transparent way by taking into account peptide evidence categories. Furthermore, the inspection, comparison and reporting of the obtained results require tedious manual intervention. Here we report a software tool to address these limitations for MSE data. In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously and are considered as technical replicates. Results are saved to CSV, HTML and mzIdentML (in the case of a single mzIdentML input file) files. An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicates integration. 
PAnalyzer is an easy to use multiplatform and free software tool. PMID:23126499
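The ambiguity caused by shared peptides can be illustrated with a minimal grouping sketch: proteins whose identified peptide sets are identical are merged into one indistinguishable group, and each group is then flagged by whether it has at least one peptide not shared with any other group. The two-way flag and the function name are simplifications introduced for illustration; PAnalyzer's own evidence categories and rules are more detailed, and this is not its implementation. The accessions and peptide sequences are hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_map):
    """Group proteins by identical peptide evidence and flag unique support.

    peptide_map: protein accession -> iterable of identified peptide sequences.
    Returns a sorted list of (protein group, peptides, has_unique_peptide).
    """
    # Proteins explained by exactly the same peptides cannot be told apart.
    by_evidence = defaultdict(list)
    for protein, peptides in peptide_map.items():
        by_evidence[frozenset(peptides)].append(protein)

    # Count how many groups each peptide supports.
    groups_per_peptide = defaultdict(int)
    for peptides in by_evidence:
        for pep in peptides:
            groups_per_peptide[pep] += 1

    result = []
    for peptides, proteins in by_evidence.items():
        unique = any(groups_per_peptide[pep] == 1 for pep in peptides)
        result.append((sorted(proteins), sorted(peptides), unique))
    return sorted(result)


# Hypothetical identifications: P1 and P2 share identical evidence.
peptide_map = {
    "P1": ["AAK", "CCR"],
    "P2": ["AAK", "CCR"],
    "P3": ["CCR", "DDK"],
    "P4": ["CCR"],
}
for group in group_proteins(peptide_map):
    print(group)
```

In this toy input, P1 and P2 are reported together as one group, while P4 is supported only by a peptide shared with other groups, which is exactly the situation that makes protein-level reporting ambiguous.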
Inferring protein domains associated with drug side effects based on drug-target interaction network
2013-01-01
Background Most phenotypic effects of drugs are associated with the interactions between drugs and their target proteins; however, our knowledge of the molecular mechanisms of drug-target interactions is very limited. One of the challenging issues in recent pharmaceutical science is to identify the underlying molecular features that govern drug-target interactions. Results In this paper, we make a systematic analysis of the correlation between drug side effects and protein domains, which we call "pharmacogenomic features," based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. We show that the inferred pharmacogenomic features can be used to predict potential drug-target interactions. We also discuss advantages and limitations of the pharmacogenomic features, compared with the chemogenomic features that are the associations between drug chemical substructures and protein domains. Conclusion The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains. PMID:24565527
Exploring Plant Co-Expression and Gene-Gene Interactions with CORNET 3.0.
Van Bel, Michiel; Coppens, Frederik
2017-01-01
Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. To reduce the duplicated effort required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform, which provides easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results either in text format or through the Cytoscape framework, which is automatically launched by the CORNET website. CORNET 3.0 is the third iteration of the web platform designed for user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid overinterpretation of the results. All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/ .
Post-Transcriptional Regulation of BCL2 mRNA by the RNA-Binding Protein ZFP36L1 in Malignant B Cells
Zekavati, Anna; Nasir, Asghar; Alcaraz, Amor; Aldrovandi, Maceler; Marsh, Phil; Norton, John D.; Murphy, John J.
2014-01-01
The human ZFP36 zinc finger protein family consists of ZFP36, ZFP36L1, and ZFP36L2. These proteins regulate various cellular processes, including apoptosis, by binding to adenine uridine rich elements in the 3′ untranslated regions of sets of target mRNAs to promote their degradation. The pro-apoptotic and other functions of ZFP36 family members have been implicated in the pathogenesis of lymphoid malignancies. To identify candidate mRNAs that are targeted by ZFP36L1 in the pro-apoptotic response, we reverse-engineered a gene regulatory network for all three ZFP36 family members using the ‘maximum information coefficient’ (MIC) for target gene inference on a large microarray gene expression dataset representing cells of diverse histological origin. Of the three inferred ZFP36L1 mRNA targets, we focused experimental validation on the mRNA for the pro-survival protein BCL2. RNA electrophoretic mobility shift assay experiments revealed that ZFP36L1 interacted with the BCL2 adenine uridine rich element. In murine BCL1 leukemia cells stably transduced with a ZFP36L1 shRNA lentiviral construct, BCL2 mRNA degradation was significantly delayed compared with control lentivirus-expressing cells, and ZFP36L1 knockdown in different cell types (BCL1, ACHN, Ramos) increased BCL2 mRNA levels compared with control cells. 3′ untranslated region luciferase reporter assays in HEK293T cells showed that wild-type, but not zinc finger mutant, ZFP36L1 protein was able to downregulate a BCL2 construct containing the BCL2 adenine uridine rich element, and removal of the adenine uridine rich core from the BCL2 3′ untranslated region in the reporter construct significantly reduced the ability of ZFP36L1 to mediate this effect. 
Taken together, our data are consistent with ZFP36L1 interacting with and mediating degradation of BCL2 mRNA as an important target through which ZFP36L1 mediates its pro-apoptotic effects in malignant B-cells. PMID:25014217
Genomic survey, expression profile and co-expression network analysis of OsWD40 family in rice
2012-01-01
Background WD40 proteins form a large family in eukaryotes and are involved in a broad spectrum of crucial functions. Systematic characterization and co-expression analysis of OsWD40 genes enable us to understand the networks of WD40 proteins and their biological processes and gene functions in rice. Results In this study, we identify and analyze 200 potential OsWD40 genes in rice, describing their gene structures, genome localizations, and the evolutionary relationships among members. Expression profiles covering the whole rice life cycle revealed that OsWD40 transcripts accumulated differentially during vegetative and reproductive development and were preferentially up- or down-regulated in different tissues. Under phytohormone treatments, 25 OsWD40 genes were differentially expressed in response to one or more of the phytohormones NAA, KT, or GA3 in rice seedlings. We also used a combined analysis of expression correlation and Gene Ontology annotation to infer the biological roles of OsWD40 genes in rice. The results suggested that OsWD40 genes may perform their diverse functions through a complex network, which is informative for understanding their biological pathways. The analysis also revealed that OsWD40 genes might interact with each other to take part in metabolic pathways, suggesting a more complex feedback network. Conclusions All of these analyses suggest that the functions of OsWD40 genes are diversified, providing useful references for selecting candidate genes for further functional studies. PMID:22429805
Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...
TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES.
Cordero, Pablo; Stuart, Joshua M
2017-01-01
The availability of gene expression data at the single cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns that are associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to the interpretation of a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation including putative co-factors for the multi-functional NFAT pathway.
Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc
2012-01-06
An important challenge in systems biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions between transcription factors (TFs) and their corresponding target genes. A large number of reverse engineering algorithms have been proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performance. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), which infers a transcriptional network by integrating correlations and partial correlations of gene expression profiles with gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 Affymetrix arrays, respectively. Using the area under the receiver operating characteristic curve and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than that of unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm in identifying target genes of TFs that already have a wide range of identified target genes, but better for TFs with only a few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm and particularly suitable for discovering target genes of "orphan" TFs.
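TNIFSED's expression features include correlations and partial correlations. As a minimal illustration of why a partial correlation can separate a direct association from one mediated by a third gene, the sketch below computes a Pearson correlation and the first-order partial correlation r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The helper names and the toy profiles (two hypothetical targets that both track a hypothetical regulator z, with deviations constructed to be unrelated to each other) are assumptions; this is not the TNIFSED feature pipeline or its classifier.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy profiles over six arrays: both targets follow the regulator z,
# and their deviations from z are constructed to be uncorrelated.
z = [1, 2, 3, 4, 5, 6]
x = [1.25, 1.95, 2.8, 3.8, 4.95, 6.25]
y = [2.1, 3.9, 5.8, 8.2, 10.1, 11.9]

print(pearson(x, y))          # strong marginal correlation
print(partial_corr(x, y, z))  # essentially zero once z is controlled for
```

Here x and y correlate at roughly 0.99, yet their partial correlation given z is essentially zero, which is the pattern expected when two genes are co-regulated rather than directly linked.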
Rafiqi, Maryam; Gan, Pamela H P; Ravensdale, Michael; Lawrence, Gregory J; Ellis, Jeffrey G; Jones, David A; Hardham, Adrienne R; Dodds, Peter N
2010-06-01
Translocation of pathogen effector proteins into the host cell cytoplasm is a key determinant for the pathogenicity of many bacterial and oomycete plant pathogens. A number of secreted fungal avirulence (Avr) proteins are also inferred to be delivered into host cells, based on their intracellular recognition by host resistance proteins, including those of flax rust (Melampsora lini). Here, we show by immunolocalization that the flax rust AvrM protein is secreted from haustoria during infection and accumulates in the haustorial wall. Five days after inoculation, the AvrM protein was also detected within the cytoplasm of a proportion of plant cells containing haustoria, confirming its delivery into host cells during infection. Transient expression of secreted AvrL567 and AvrM proteins fused to cerulean fluorescent protein in tobacco (Nicotiana tabacum) and flax cells resulted in intracellular accumulation of the fusion proteins. The rust Avr protein signal peptides were functional in plants and efficiently directed fused cerulean into the secretory pathway. Thus, these secreted effectors are internalized into the plant cell cytosol in the absence of the pathogen, suggesting that they do not require a pathogen-encoded transport mechanism. Uptake of these proteins is dependent on signals in their N-terminal regions, but the primary sequence features of these uptake regions are not conserved between different rust effectors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jing; Ma, Zihao; Carr, Steven A.
Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies. 
Molecular & Cellular Proteomics 16: 10.1074/mcp.M116.060301, 121–134, 2017.
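To make the guilt-by-association idea concrete, here is a minimal sketch that builds a coexpression network by thresholding pairwise Pearson correlations and then scores each gene by the fraction of its network neighbours already annotated with a function. The positive-correlation cutoff of 0.8, the neighbour-voting score, the function names and the toy profiles are assumptions for illustration; they are not the network construction or evaluation used in the study above, and the same code applies unchanged to mRNA or protein profiles.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.8):
    """Connect gene pairs whose profiles correlate at or above threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            edges.add((a, b))
    return edges


def gba_scores(edges, annotated, genes):
    """Fraction of each gene's neighbours that carry the annotation."""
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        g: len(nb & annotated) / len(nb) if nb else 0.0
        for g, nb in neighbours.items()
    }


# Toy expression profiles across five samples (hypothetical values).
profiles = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],
    "G3": [5, 1, 4, 2, 3],
    "G4": [10, 2, 8, 4, 6],
    "G5": [1, 2, 3, 5, 4],
}
edges = coexpression_edges(profiles)
scores = gba_scores(edges, annotated={"G2"}, genes=profiles)
print(sorted(edges))
print(scores)
```

G1, G2 and G5 form one coexpressed module and G3 and G4 another. With only G2 annotated, G1 and G5 each have half of their neighbours annotated and would be the top candidates for G2's function, while G3 and G4 receive no support.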
Protein-protein interactions in the RPS4/RRS1 immune receptor complex
Sarris, Panagiotis F.
2017-01-01
Plant NLR (Nucleotide-binding domain and Leucine-rich Repeat) immune receptor proteins are encoded by Resistance (R) genes and confer specific resistance to pathogen races that carry the corresponding recognized effectors. Some NLR proteins function in pairs, forming receptor complexes for the perception of specific effectors. We show here that the Arabidopsis RPS4 and RRS1 NLR proteins are both required to make an authentic immune complex. Over-expression of RPS4 in tobacco or in Arabidopsis results in constitutive defense activation; this phenotype is suppressed in the presence of RRS1. RRS1 protein co-immunoprecipitates (co-IPs) with itself in the presence or absence of RPS4, but in contrast, RPS4 does not associate with itself in the absence of RRS1. In the presence of RRS1, RPS4 associates with the defense signaling regulator EDS1 solely in the nucleus, in contrast to the extra-nuclear location found in the absence of RRS1. The AvrRps4 effector does not disrupt RPS4-EDS1 association in the presence of RRS1. In the absence of RRS1, AvrRps4 interacts with EDS1, forming nucleocytoplasmic aggregates, the formation of which is disturbed by the co-expression of PAD4 but not by SAG101. These data indicate that studying an immune receptor protein complex without all of its components can lead to misleading inferences, and they reveal an NLR complex that dynamically interacts with the immune regulators EDS1/PAD4 or EDS1/SAG101, and with effectors, during the process by which effector recognition is converted to defense activation. PMID:28475615
Boboila, Shuobo; Lopez, Gonzalo; Yu, Jiyang; Banerjee, Debarshi; Kadenhe-Chiweshe, Angela; Connolly, Eileen P; Kandel, Jessica J; Rajbhandari, Presha; Silva, Jose M; Califano, Andrea; Yamashiro, Darrell J
2018-06-07
Despite the identification of MYCN amplification as an adverse prognostic marker in neuroblastoma, MYCN inhibitors have yet to be developed. Here, by integrating evidence from a whole-genome shRNA library screen and the computational inference of master regulator proteins, we identify transcription factor activating protein 4 (TFAP4) as a critical effector of MYCN amplification in neuroblastoma, providing a novel synthetic lethal target. We demonstrate that TFAP4 is a direct target of MYCN in neuroblastoma cells, and that its expression and activity strongly negatively correlate with neuroblastoma patient survival. Silencing TFAP4 selectively inhibits MYCN-amplified neuroblastoma cell growth both in vitro and in vivo, in xenograft mouse models. Mechanistically, silencing TFAP4 induces neuroblastoma differentiation, as evidenced by increased neurite outgrowth and upregulation of neuronal markers. Taken together, our results demonstrate that TFAP4 is a key regulator of MYCN-amplified neuroblastoma and may represent a valuable novel therapeutic target.
Dong, Chen; Hu, Huigang; Xie, Jianghui
2016-12-01
DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
Praveen, Paurush; Fröhlich, Holger
2013-01-01
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework for gaining insights into biological systems. However, the inherent noise in experimental data, coupled with limited sample sizes, reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways, as well as on gene expression data from breast cancer and the yeast heat shock response, reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared with other competing methods and with inference without any prior. Our framework can use diverse information sources, such as pathway databases, GO terms and protein domain data, and is flexible enough to integrate new sources as they become available. PMID:23826291
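The Noisy-OR combination mentioned above has a standard closed form: if each source i independently supports an edge with probability p_i, the edge is absent only if every source fails, so P(edge) = 1 - (1 - leak) * prod_i (1 - p_i). The sketch below uses that form to turn per-source confidences into edge prior probabilities and then scores candidate networks with an independent-Bernoulli structure log-prior. The optional leak term, the per-edge Bernoulli prior, the function names and the source values are illustrative assumptions; the paper's exact parametrization may differ.

```python
import math


def noisy_or(source_probs, leak=0.0):
    """Prior probability of an edge given per-source support probabilities.

    The edge is 'off' only if the leak and every source independently fail.
    """
    fail = 1.0 - leak
    for p in source_probs:
        fail *= 1.0 - p
    return 1.0 - fail


def structure_log_prior(edges, candidate_pairs, edge_prior):
    """Log-prior of a network, treating each candidate edge as independent.

    Assumes every prior probability lies strictly between 0 and 1.
    """
    total = 0.0
    for pair in candidate_pairs:
        p = edge_prior[pair]
        total += math.log(p) if pair in edges else math.log(1.0 - p)
    return total


# Hypothetical support for two gene pairs from three sources
# (e.g. pathway database, GO similarity, protein domain data).
support = {
    ("A", "B"): [0.6, 0.5, 0.0],
    ("B", "C"): [0.1, 0.0, 0.0],
}
edge_prior = {pair: noisy_or(ps) for pair, ps in support.items()}
pairs = list(support)

# A network containing A-B scores higher than one containing only B-C.
print(structure_log_prior({("A", "B")}, pairs, edge_prior))
print(structure_log_prior({("B", "C")}, pairs, edge_prior))
```

With these inputs the A-B prior is about 0.8, since a single strong source dominates, while B-C stays at 0.1. A structure-learning search would add such a log-prior to its data likelihood score.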
Finding undetected protein associations in cell signaling by belief propagation.
Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R
2011-01-11
External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing the cell to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of these partners may be detected by integrating a protein-protein interaction network with mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets, an optimization that turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited to very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On well-known benchmarks it outperforms other algorithms in the field.
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequently used bottom-up strategy for identifying proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be confounded by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if its software implementation has been shown to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step, protein inference, is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zelinka, L.; McCann, S.; Budde, J.
2011-08-05
Highlights: Affinity purification of the autoimmune rippling muscle disease immunogenic domain of titin. Partial sequence analysis confirms that the peptide is in the I-band region of titin. This region of human titin shows a high degree of homology to mouse titin N2-A. Abstract: Autoimmune rippling muscle disease (ARMD) is an autoimmune neuromuscular disease associated with myasthenia gravis (MG). Past studies in our laboratory identified the very high molecular weight skeletal muscle protein antigen recognized by ARMD patient antisera as the titin isoform. These past studies used antisera from ARMD and MG patients as probes to screen a human skeletal muscle cDNA library, and several pBluescript clones expressed immunoreactive peptides. This study characterizes the products of subcloning the titin immunoreactive domain into pGEX-3X and the resulting fusion protein. Sequence analysis of the fusion gene indicates that the cloned titin domain (GenBank ID: EU428784) is in frame and derives from an N2-A sequence spanning exons 248-250, a region that encodes the fibronectin III domain. PCR and EcoRI restriction mapping studies have demonstrated that the inserted cDNA is of the size predicted by bioinformatics analysis of the subclone. Expression of the fusion protein resulted in the isolation of a 52 kDa polypeptide, consistent with the size predicted from the inferred amino acid sequence. Immunoblot experiments on the fusion protein, using rippling muscle/myasthenia gravis antisera, demonstrate that only the titin domain is immunoreactive.
Identifying biological pathways that underlie primordial short stature using network analysis.
Hanson, Dan; Stevens, Adam; Murray, Philip G; Black, Graeme C M; Clayton, Peter E
2014-06-01
Mutations in CUL7, OBSL1 and CCDC8, leading to disordered ubiquitination, cause one of the commonest primordial growth disorders, 3-M syndrome. This condition is associated with i) abnormal p53 function, ii) GH and/or IGF1 resistance, which may relate to failure to recycle signalling molecules, and iii) cellular IGF2 deficiency. However the exact molecular mechanisms that may link these abnormalities generating growth restriction remain undefined. In this study, we have used immunoprecipitation/mass spectrometry and transcriptomic studies to generate a 3-M 'interactome', to define key cellular pathways and biological functions associated with growth failure seen in 3-M. We identified 189 proteins which interacted with CUL7, OBSL1 and CCDC8, from which a network including 176 of these proteins was generated. To strengthen the association to 3-M syndrome, these proteins were compared with an inferred network generated from the genes that were differentially expressed in 3-M fibroblasts compared with controls. This resulted in a final 3-M network of 131 proteins, with the most significant biological pathway within the network being mRNA splicing/processing. We have shown using an exogenous insulin receptor (INSR) minigene system that alternative splicing of exon 11 is significantly changed in HEK293 cells with altered expression of CUL7, OBSL1 and CCDC8 and in 3-M fibroblasts. The net result is a reduction in the expression of the mitogenic INSR isoform in 3-M syndrome. From these preliminary data, we hypothesise that disordered ubiquitination could result in aberrant mRNA splicing in 3-M; however, further investigation is required to determine whether this contributes to growth failure. © 2014 The authors.
Dual Transcriptomic Profiling of Host and Microbiota during Health and Disease in Pediatric Asthma.
Pérez-Losada, Marcos; Castro-Nallar, Eduardo; Bendall, Matthew L; Freishtat, Robert J; Crandall, Keith A
2015-01-01
High-throughput sequencing (HTS) analysis of microbial communities from the respiratory airways has heavily relied on the 16S rRNA gene. Given the intrinsic limitations of this approach, airway microbiome research has focused on assessing bacterial composition during health and disease, and its variation in relation to clinical and environmental factors, or other microbiomes. Consequently, very little effort has been dedicated to describing the functional characteristics of the airway microbiota and even less to exploring microbe-host interactions. Here we present a simultaneous assessment of microbiome and host functional diversity and host-microbe interactions from the same RNA-seq experiment, while accounting for variation in clinical metadata. Transcriptomic (host) and metatranscriptomic (microbiota) sequences from the nasal epithelium of 8 asthmatics and 6 healthy controls were separated in silico and mapped to available human and NCBI-NR protein reference databases. Human genes differentially expressed in asthmatics and controls were then used to infer upstream regulators involved in immune and inflammatory responses. Concomitantly, microbial genes were mapped to metabolic databases (COG, SEED, and KEGG) to infer microbial functions differentially expressed in asthmatics and controls. Finally, multivariate analysis was applied to find associations between microbiome characteristics and host upstream regulators while accounting for clinical variation. Our study showed significant differences in the metabolism of microbiomes from asthmatic and non-asthmatic children for up to 25% of the functional properties tested. Enrichment analysis of 499 differentially expressed host genes for inflammatory and immune responses revealed 43 upstream regulators differentially activated in asthma. 
Microbial adhesion (virulence) and Proteobacteria abundance were significantly associated with variation in the expression of the upstream regulator IL1A, suggesting that microbiome characteristics modulate host inflammatory and immune systems during asthma.
Majeske, Audrey J; Oren, Matan; Sacchi, Sandro; Smith, L Courtney
2014-12-01
Immune systems in animals rely on fast and efficient responses to a wide variety of pathogens. The Sp185/333 gene family in the purple sea urchin, Strongylocentrotus purpuratus, consists of an estimated 50 (±10) members per genome that share a basic gene structure but show high sequence diversity, primarily due to the mosaic appearance of short blocks of sequence called elements. The genes show significantly elevated expression in three subpopulations of phagocytes responding to marine bacteria. The encoded Sp185/333 proteins are highly diverse and have central effector functions in the immune system. In this study we report Sp185/333 gene expression in single sea urchin phagocytes. Challenging sea urchins with heat-killed marine bacteria resulted in a typical increase in coelomocyte concentration within 24 h, which included an increased proportion of phagocytes expressing Sp185/333 proteins. Phagocyte fractions enriched from coelomocytes were used in limiting dilutions to obtain single-cell samples that were evaluated for Sp185/333 gene expression by nested RT-PCR. Single phagocytes yielded identical or nearly identical Sp185/333 amplicon sequences that matched six known Sp185/333 element patterns, including both common and rare ones. This suggests that single phagocytes show restricted expression from the Sp185/333 gene family and implies a diverse, flexible, and efficient response to pathogens. This type of expression pattern from a family of immune response genes in single cells has not been identified previously in other invertebrates. Copyright © 2014 by The American Association of Immunologists, Inc.
Allen, Margaret L.; Mertens, Jeffrey A.
2008-01-01
Three unique cDNAs encoding putative polygalacturonase enzymes were isolated from the tarnished plant bug, Lygus lineolaris (Palisot de Beauvois) (Hemiptera: Miridae). The three nucleotide sequences were dissimilar to one another, but the deduced amino acid sequences were similar to each other and to other polygalacturonases from insects, fungi, plants, and bacteria. Four conserved segments characteristic of polygalacturonases were present, but with some notable semiconservative substitutions. Two of four expected disulfide bridge-forming cysteine pairs were present. All three inferred protein translations included predicted signal sequences of 17 to 20 amino acids. Amplification of genomic DNA identified an intron in one of the genes, Llpg1, in the 5′ untranslated region. Semiquantitative RT-PCR revealed expression in all stages of the insect except the eggs. Expression in adults, male and female, was highly variable, indicating a family of highly inducible and diverse enzymes adapted to the generalist polyphagous nature of this important pest. PMID:20233096
Ariani, Andrea; Gepts, Paul
2015-10-01
Plant aquaporins are a large and diverse family of water channel proteins that are essential for several physiological processes in living organisms. Numerous studies have linked plant aquaporins with a plethora of processes, such as nutrient acquisition, CO2 transport, plant growth and development, and response to abiotic stresses. However, little is known about this protein family in common bean. Here, we present a genome-wide identification of the aquaporin gene family in common bean (Phaseolus vulgaris L.), a legume crop essential for human nutrition. We identified 41 full-length coding aquaporin sequences in the common bean genome, divided by phylogenetic analysis into five sub-families (PIPs, TIPs, NIPs, SIPs and XIPs). Residues determining the substrate specificity of aquaporins (i.e., NPA motifs and the ar/R selectivity filter) appear to be conserved between common bean and other plant species, allowing inference of substrate specificity for these proteins. Thanks to the availability of RNA-sequencing datasets, expression levels in different organs and in leaves of wild and domesticated bean accessions were evaluated. Three aquaporins (PvTIP1;1, PvPIP2;4 and PvPIP1;2) have the highest overall mean expression, with PvTIP1;1 having the highest expression among all aquaporins. We mined EST databases to identify drought-responsive aquaporins in common bean. This analysis showed a significant increase in PvTIP1;1 expression under drought stress compared with well-watered conditions. The pivotal role suggested for PvTIP1;1 in regulating water homeostasis and drought stress response in the common bean should be verified by further field experimentation under drought stress.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
LWD-TCP complex activates the morning gene CCA1 in Arabidopsis.
Wu, Jing-Fen; Tsai, Huang-Lung; Joanito, Ignasius; Wu, Yi-Chen; Chang, Chin-Wen; Li, Yi-Hang; Wang, Ying; Hong, Jong Chan; Chu, Jhih-Wei; Hsu, Chao-Ping; Wu, Shu-Hsing
2016-10-13
A double-negative feedback loop formed by the morning genes CIRCADIAN CLOCK ASSOCIATED1 (CCA1)/LATE ELONGATED HYPOCOTYL (LHY) and the evening gene TIMING OF CAB EXPRESSION1 (TOC1) contributes to regulation of the circadian clock in Arabidopsis. A 24-h circadian cycle starts with the peak expression of CCA1 at dawn. Although CCA1 is targeted by multiple transcriptional repressors, including PSEUDO-RESPONSE REGULATOR9 (PRR9), PRR7, PRR5 and CCA1 HIKING EXPEDITION (CHE), activators of CCA1 remain elusive. Here we use mathematical modelling to infer a co-activator role for LIGHT-REGULATED WD1 (LWD1) in CCA1 expression. We show that the TEOSINTE BRANCHED 1-CYCLOIDEA-PCF20 (TCP20) and TCP22 proteins act as LWD-interacting transcriptional activators. The concomitant binding of LWD1 and TCP20/TCP22 to the TCP-binding site in the CCA1 promoter activates CCA1. Our study reveals activators of the morning gene CCA1 and provides an action mechanism that ensures elevated expression of CCA1 at dawn to sustain a robust clock. PMID:27734958
Grassi, Angela; Di Camillo, Barbara; Ciccarese, Francesco; Agnusdei, Valentina; Zanovello, Paola; Amadori, Alberto; Finesso, Lorenzo; Indraccolo, Stefano; Toffolo, Gianna Maria
2016-03-12
Inference of gene regulation from expression data may help to unravel regulatory mechanisms involved in complex diseases or in the action of specific drugs. A challenging task for many researchers working in systems biology is to design an experiment on a limited budget that produces a dataset suitable for reconstructing putative regulatory modules worthy of biological validation. Here, we focus on small-scale gene expression screens and introduce a novel experimental set-up and a customized method of analysis to infer regulatory modules from genetic perturbation data, e.g. knockdown and overexpression data. To illustrate the utility of our strategy, we applied it to produce and analyze a dataset of quantitative real-time RT-PCR data in which the interferon-α (IFN-α) transcriptional response in endothelial cells is investigated by RNA silencing of two candidate IFN-α modulators, STAT1 and IFIH1. Our method reconstructed a putative regulatory module revealing an intriguing feed-forward loop, in which STAT1 regulates IFIH1 and both negatively regulate IFNAR1. STAT1 regulation of IFNAR1 was experimentally validated at the protein level. A detailed description of the experimental set-up and of the analysis procedure is reported, in the hope that it will inspire other scientists who want to carry out similar experiments to reconstruct gene regulatory modules from perturbations of possible regulators. Application of our approach to the study of IFN-α transcriptional response modulators in endothelial cells has led to many interesting novel findings and new biological hypotheses worthy of validation.
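The feed-forward loop reported above (STAT1 regulates IFIH1, and both regulate IFNAR1) is a three-node pattern in a directed graph. The minimal sketch below assumes the reconstructed module is stored as a list of (regulator, target) pairs with edge signs dropped. It is a generic motif search, not the authors' analysis procedure, and the function name feed_forward_loops is hypothetical.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples where x->y, x->z and y->z are all directed edges."""
    edge_set = set(edges)
    nodes = {n for edge in edges for n in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (x, z) in edge_set and (y, z) in edge_set
    )

# Edges from the module described above; the negative signs are omitted here.
module = [("STAT1", "IFIH1"), ("STAT1", "IFNAR1"), ("IFIH1", "IFNAR1")]
print(feed_forward_loops(module))  # [('STAT1', 'IFIH1', 'IFNAR1')]
```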
Werhli, Adriano V; Grzegorczyk, Marco; Husmeier, Dirk
2006-10-15
An important problem in systems biology is the inference of biochemical pathways and regulatory networks from postgenomic data. Various reverse engineering methods have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In the present paper, we compare the accuracy of reconstructing gene regulatory networks with three different modelling and inference paradigms: (1) Relevance networks (RNs): pairwise association scores independent of the remaining network; (2) graphical Gaussian models (GGMs): undirected graphical models with constraint-based inference, and (3) Bayesian networks (BNs): directed graphical models with score-based inference. The evaluation is carried out on the Raf pathway, a cellular signalling network describing the interaction of 11 phosphorylated proteins and phospholipids in human immune system cells. We use both laboratory data from cytometry experiments and data simulated from the gold-standard network. We also compare passive observations with active interventions. On Gaussian observational data, BNs and GGMs were found to outperform RNs. However, the difference in performance was not significant for the non-linear simulated data and the cytoflow data. Also, we did not observe a significant difference between BNs and GGMs on observational data in general. However, for interventional data, BNs outperform GGMs and RNs, especially when taking the edge directions rather than just the skeletons of the graphs into account. This suggests that the higher computational costs of inference with BNs over GGMs and RNs are not justified when using only passive observations, but that active interventions in the form of gene knockouts and over-expressions are required to exploit the full potential of BNs. Data, software and supplementary material are available from http://www.bioss.sari.ac.uk/staff/adriano/research.html
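To make the contrast between the first two paradigms concrete: a relevance network scores each pair by marginal correlation, whereas a graphical Gaussian model scores it by partial correlation. The partial correlation is obtained from the inverse covariance (precision) matrix Omega as rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj). The sketch below is pure Python with a small Gauss-Jordan inversion. It assumes more samples than variables, so the covariance is invertible, and uses a synthetic chain X -> Y -> Z. It is an illustration only, not the software available from the URL above.

```python
import math
import random

def covariance(data):
    """Sample covariance matrix of data given as a list of rows (samples)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)]
            for i in range(p)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def correlation(cov):
    """Marginal correlations: the relevance-network edge scores."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)] for i in range(p)]

def partial_correlation(cov):
    """Partial correlations from the precision matrix: the GGM edge scores."""
    omega = invert(cov)
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(p)]
            for i in range(p)]

# Synthetic chain X -> Y -> Z: X and Z are associated only through Y.
random.seed(0)
rows = []
for _ in range(2000):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 0.5)
    z = y + random.gauss(0, 0.5)
    rows.append([x, y, z])

cov = covariance(rows)
print(round(correlation(cov)[0][2], 2))          # large: an RN links X and Z
print(round(partial_correlation(cov)[0][2], 2))  # near 0: a GGM drops the X-Z edge
```

The chain is chosen so the two scores disagree: the marginal X-Z correlation is strong, but it vanishes once Y is conditioned on, which is what the GGM edge score captures.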
Llorente, Briardo; de Souza, Flavio S J; Soto, Gabriela; Meyer, Cristian; Alonso, Guillermo D; Flawiá, Mirtha M; Bravo-Almonacid, Fernando; Ayub, Nicolás D; Rodríguez-Concepción, Manuel
2016-01-11
The plastid organelle comprises a high proportion of nucleus-encoded proteins that were acquired from different prokaryotic donors via independent horizontal gene transfers following its primary endosymbiotic origin. What forces drove the targeting of these alien proteins to the plastid remains an unresolved evolutionary question. To better understand this process, we screened for suitable candidate proteins with which to recapitulate their prokaryote-to-eukaryote transition. Here we identify the ancient horizontal transfer of a bacterial polyphenol oxidase (PPO) gene to the nuclear genome of an early land plant ancestor and infer the possible mechanism behind the plastidial localization of the encoded enzyme. Arabidopsis plants expressing PPO versions either lacking or harbouring a plastid-targeting signal allowed us to examine the fitness consequences associated with its subcellular localization. Notably, a deleterious effect on plant growth was highly correlated with PPO activity only when the non-targeted enzyme was produced, suggesting that selection favoured the fixation of plastid-targeted protein versions. Our results reveal a possible evolutionary mechanism by which selection against heterologous genes encoding cytosolic proteins contributed to increasing plastid proteome complexity from non-endosymbiotic gene sources, a process that may also impact mitochondrial evolution.
A Novel Type III Endosome Transmembrane Protein, TEMP
Aturaliya, Rajith N.; Kerr, Markus C.; Teasdale, Rohan D.
2012-01-01
As part of a high-throughput subcellular localisation project, the protein encoded by the RIKEN mouse cDNA 2610528J11 was expressed and found to be associated with both endosomes and the plasma membrane. On this basis, we have assigned it the name TEMP, for Type III Endosome Membrane Protein. TEMP encodes a short protein of 111 amino acids with a single alpha-helical transmembrane domain. Experimental analysis of its membrane topology demonstrated that it is a Type III membrane protein, with the amino-terminus in the lumenal or extracellular region and the carboxy-terminus in the cytoplasm. In addition to the plasma membrane, TEMP was localized to Rab5-positive early endosomes and Rab5/Rab11-positive recycling endosomes, but not to Rab7-positive late endosomes. Video microscopy in living cells confirmed TEMP’s plasma membrane localization and identified the intracellular endosome compartments as tubulovesicular. Overexpression of TEMP caused early/recycling endosomes to cluster at the cell periphery, in a manner dependent on intact microtubules. The cellular function of TEMP cannot be inferred from bioinformatic comparison, but its distribution between early/recycling endosomes and the plasma membrane suggests a role in membrane transport. PMID:24710541
Differentially delayed root proteome responses to salt stress in sugar cane varieties.
Pacheco, Cinthya Mirella; Pestana-Calsa, Maria Clara; Gozzo, Fabio Cesar; Mansur Custodio Nogueira, Rejane Jurema; Menossi, Marcelo; Calsa, Tercilio
2013-12-06
Soil salinity is a limiting factor for sugar cane crop development, although plants in general present variable mechanisms of tolerance to salinity stress. The molecular basis underlying these mechanisms can be inferred by using proteomic analysis. Thus, the objective of this work was to identify differentially expressed proteins in sugar cane plants subjected to salinity stress. For that, a greenhouse experiment was established with four sugar cane varieties and two salt conditions, 0 mM (control) and 200 mM NaCl. Physiological and proteomic analyses were performed 2 and 72 h after stress induction by salt. Distinct physiological responses to salinity stress were observed in the varieties and linked to tolerance mechanisms. For proteomic analysis, the root soluble protein fraction was extracted, quantified, and analyzed by two-dimensional electrophoresis. Gel images were analyzed computationally, with only one variable (salinity condition or variety) considered in each contrast. Differential spots were excised, digested with trypsin, and identified via mass spectrometry. The tolerant variety RB867515 showed the highest accumulation of proteins involved in growth, development, carbohydrate and energy metabolism, reactive oxygen species metabolization, protein protection, and membrane stabilization after 2 h of stress. In the sensitive variety, on the other hand, these proteins were detected only after 72 h of stress treatment. These data indicate that these stress response pathways play a role in salinity tolerance in sugar cane, and that their effectiveness for phenotypic tolerance depends on early stress detection and activation of expression of the corresponding genes.
González-Mellado, Damián; von Wettstein-Knowles, Penny; Garcés, Rafael; Martínez-Force, Enrique
2010-05-01
The beta-ketoacyl-acyl carrier protein synthase III (KAS III; EC 2.3.1.180) is a condensing enzyme catalyzing the initial step of fatty acid biosynthesis using acetyl-CoA as primer. To determine the mechanisms involved in the biosynthesis of fatty acids in developing seeds of sunflower (Helianthus annuus L.), a cDNA coding for HaKAS III (EF514400) was isolated, cloned and sequenced. Its protein sequence is up to 72% identical to other KAS III-like proteins such as those from Perilla frutescens, Jatropha curcas, Ricinus communis or Cuphea hookeriana. Phylogenetic analysis of proteins homologous to HaKAS III suggests that it originated from cyanobacterial ancestors. A genomic DNA gel blot analysis revealed that HaKAS III is a single-copy gene. Expression levels of this gene, examined by Q-PCR, were higher in oil-storing developing seeds than in leaves, stems, roots or seedling cotyledons. Heterologous expression of HaKAS III in Escherichia coli altered the bacterial fatty acid content and composition, implying an interaction of HaKAS III with the bacterial FAS complex. Testing purified recombinant HaKAS III by adding it to a reconstituted E. coli FAS system lacking condensation activity revealed a novel substrate specificity. In contrast to all hitherto characterized plant KAS IIIs, whose activities are limited to the first cycles of intraplastidial fatty acid biosynthesis yielding C6 chains, HaKAS III participates in at least four cycles, resulting in C10 chains.
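The chain lengths quoted above are consistent with simple carbon bookkeeping. This sketch assumes the usual scheme in which the acetyl primer contributes two carbons and each condensation cycle adds two carbons from a malonyl unit, so after n cycles the acyl chain has 2 + 2n carbons: two cycles give C6 and four give C10. The function acyl_chain_length is a hypothetical illustration of that arithmetic, not a model taken from the study.

```python
def acyl_chain_length(cycles: int, primer_carbons: int = 2, carbons_per_cycle: int = 2) -> int:
    """Carbons in the acyl chain after `cycles` condensation cycles.

    Assumes an acetyl (C2) primer and a two-carbon extension per cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return primer_carbons + carbons_per_cycle * cycles

for n in (1, 2, 4):
    print(f"{n} cycle(s) -> C{acyl_chain_length(n)}")
# 1 cycle(s) -> C4
# 2 cycle(s) -> C6
# 4 cycle(s) -> C10
```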
ERIC Educational Resources Information Center
Ford, Janet A.; Milosky, Linda M.
2003-01-01
Kindergarten children with language impairment (LI) and age-matched controls were asked to label facial expressions depicting various emotions and then to infer emotional reactions from stories presented either verbally, visually, or combined. Results suggest that inference errors made by children with LI during early stages of social processing…
Hsu, Chun-Nan; Lai, Jin-Mei; Liu, Chia-Hung; Tseng, Huei-Hun; Lin, Chih-Yun; Lin, Kuan-Ting; Yeh, Hsu-Hua; Sung, Ting-Yi; Hsu, Wen-Lian; Su, Li-Jen; Lee, Sheng-An; Chen, Chang-Han; Lee, Gen-Cher; Lee, DT; Shiue, Yow-Ling; Yeh, Chang-Wei; Chang, Chao-Hui; Kao, Cheng-Yan; Huang, Chi-Ying F
2007-01-01
Background: The significant advances in microarray and proteomics analyses have resulted in an exponential increase in potential new targets and promise to shed light on the identification of disease markers and cellular pathways. We aim to collect and decipher HCC-related genes at the systems level. Results: Here, we build an integrative platform, the Encyclopedia of Hepatocellular Carcinoma genes Online, dubbed EHCO, to systematically collect, organize and compare the pile-up of unsorted HCC-related studies by using natural language processing and softbots. Across the eight gene set collections, drawn from PubMed, SAGE, microarray and proteomics data, there are 2,906 genes in total; however, more than 77% of the genes are included only once, suggesting that tremendous effort is needed to characterize the relationship between HCC and these genes. Within these HCC inventories, Gene Ontology analysis shows that protein binding represents the largest proportion (~25%). In fact, many differentially expressed gene sets in EHCO could form interaction networks (e.g. an HBV-associated HCC network) when mapped onto available human protein-protein interaction datasets. To further highlight potential new targets in the network inferred from EHCO, we combine comparative genomics and interactomics approaches to analyze 120 evolutionarily conserved and overexpressed genes in HCC. Of these 120 queries, 47 form a highly interactive network, with 18 queries serving as hubs. Conclusion: This architectural map may represent a first step toward deciphering hepatocarcinogenesis at the systems level. Targeting hubs and/or disrupting network formation might reveal novel strategies for HCC treatment. PMID:17326819
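Hubs in an interaction network such as the one described above are commonly identified by node degree. The minimal sketch below assumes the network is an undirected edge list and uses an arbitrary degree cutoff. The abstract does not state EHCO's actual hub criterion, so the function names, cutoff, and placeholder gene names are all hypothetical.

```python
from collections import Counter

def degree_counts(edges):
    """Degree of every node in an undirected graph, ignoring duplicate edges and self-loops."""
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree

def find_hubs(edges, min_degree):
    """Nodes with degree >= min_degree, highest degree first (ties broken by name)."""
    degree = degree_counts(edges)
    hubs = [node for node, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda node: (-degree[node], node))

# Placeholder interaction list (not EHCO data); ("G2", "G1") duplicates ("G1", "G2").
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G5", "G1"), ("G2", "G1")]
print(find_hubs(toy_edges, min_degree=3))  # ['G1']
```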
Locomotion in Lymphocytes is Altered by Differential PKC Isoform Expression
NASA Technical Reports Server (NTRS)
Sundaresan, A.; Risin, D.; Pellis, N. R.
1999-01-01
Lymphocyte locomotion is critical for proper elicitation of the immune response. Locomotion of immune cells through the interstitium is essential for optimal immune function during wound healing, inflammation and infection. Several conditions alter lymphocyte locomotion, one of which is spaceflight. Lymphocyte locomotion is severely inhibited in true spaceflight (true microgravity) and in rotating wall vessel culture (modeled microgravity). When lymphocytes are activated prior to culture in modeled microgravity, locomotion is not inhibited, and its levels are comparable to those of static cultured lymphocytes. When a phorbol ester (PMA) is used in modeled microgravity, lymphocyte locomotion is restored by 87%. This occurs regardless of whether PMA is added during or after culture in the rotating wall vessel. Inhibition of DNA synthesis also does not alter the restoration of lymphocyte locomotion by PMA. PMA is a direct activator of protein kinase C (PKC). The calcium ionophore ionomycin has no restorative effect on locomotion, either alone or in combination with PMA. Since PMA brings about restoration without help from a calcium ionophore (ionomycin), it is inferred that calcium-independent PKC isoforms are involved. Changes were observed in the protein levels of PKC δ, which were downregulated at 24, 72 and 96 hours in untreated rotated cultures (modeled microgravity) compared with untreated static (1g) cultures. At 48 hours there was an increase in the levels of PKC δ in the same experimental setup. Studies on the transcriptional and translational patterns of calcium-independent PKC isoforms such as δ and ε are presented in this study.
Panni, Simona; Montecchi-Palazzi, Luisa; Kiemer, Lars; Cabibbo, Andrea; Paoluzi, Serena; Santonico, Elena; Landgraf, Christiane; Volkmer-Engert, Rudolf; Bachi, Angela; Castagnoli, Luisa; Cesareni, Gianni
2011-01-01
Large-scale interaction studies contribute the largest fraction of protein interaction information in databases. However, co-purification of non-specific or indirect ligands often results in data sets that contain a considerable number of false positives. For the fraction of interactions mediated by short linear peptides, we present here a combined experimental and computational strategy for ranking the reliability of the inferred partners. We apply this strategy to the family of 14-3-3 domains. We first characterized the recognition specificity of this domain family, largely confirming the results of previous analyses while revealing new features of the preferred sequence context of 14-3-3 phospho-peptide partners. Notably, a proline next to the carboxy side of the phospho-amino acid acts as a potent inhibitor of 14-3-3 binding. The position-specific information about residue preference was encoded in a scoring matrix and two regular expressions. Integrating these three features in a single predictive model outperforms publicly available prediction tools. Next, we combined these "peptide features" with "protein features", such as protein co-expression and co-localization, using a naïve Bayesian approach. Our approach provides an orthogonal reliability assessment and maps, with high confidence, the 14-3-3 target peptide on the partner proteins. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
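Under a naïve Bayes assumption, independent pieces of evidence combine by multiplying their likelihood ratios, LR = P(feature | true partner) / P(feature | false partner), onto the prior odds. The minimal sketch below assumes each peptide or protein feature has already been reduced to a single likelihood ratio. The function name, feature names, prior, and ratios are hypothetical and are not the values estimated in the study.

```python
import math

def posterior_probability(prior_probability, likelihood_ratios):
    """Naive Bayes combination: posterior odds = prior odds * product of likelihood ratios."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    # Work in log-odds so that many features do not overflow or underflow.
    log_odds = math.log(prior_probability / (1.0 - prior_probability))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios.values())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical likelihood ratios for one candidate 14-3-3 partner.
evidence = {
    "peptide_score": 4.0,    # strong scoring-matrix match
    "co_expression": 2.0,    # co-expressed with the 14-3-3 protein
    "co_localization": 0.5,  # evidence against: different compartments
}
print(round(posterior_probability(0.1, evidence), 3))  # 0.308
```

Here the prior odds of 1:9 multiplied by a combined ratio of 4 give posterior odds of 4:9, that is, a probability of 4/13.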
McGlade, C J; Ellis, C; Reedijk, M; Anderson, D; Mbamalu, G; Reith, A D; Panayotou, G; End, P; Bernstein, A; Kazlauskas, A
1992-01-01
The binding of cytoplasmic signaling proteins such as phospholipase C-gamma 1 and Ras GTPase-activating protein to autophosphorylated growth factor receptors is directed by their noncatalytic Src homology region 2 (SH2) domains. The p85 alpha regulatory subunit of phosphatidylinositol (PI) 3-kinase, which associates with several receptor protein-tyrosine kinases, also contains two SH2 domains. Both p85 alpha SH2 domains, when expressed individually as fusion proteins in bacteria, bound stably to the activated beta receptor for platelet-derived growth factor (PDGF). Complex formation required PDGF stimulation and was dependent on receptor tyrosine kinase activity. The bacterial p85 alpha SH2 domains recognized activated beta PDGF receptor which had been immobilized on a filter, indicating that SH2 domains contact autophosphorylated receptors directly. Several receptor tyrosine kinases within the PDGF receptor subfamily, including the colony-stimulating factor 1 receptor and the Steel factor receptor (Kit), also associate with PI 3-kinase in vivo. Bacterially expressed SH2 domains derived from the p85 alpha subunit of PI 3-kinase bound in vitro to the activated colony-stimulating factor 1 receptor and to Kit. We infer that the SH2 domains of p85 alpha bind to high-affinity sites on these receptors, whose creation is dependent on receptor autophosphorylation. The SH2 domains of p85 are therefore primarily responsible for the binding of PI 3-kinase to activated growth factor receptors. PMID:1372092
Crowder, Camerron M; Meyer, Eli; Fan, Tung-Yung; Weis, Virginia M
2017-08-01
Reproductive timing in brooding corals has been correlated to temperature and lunar irradiance, but the mechanisms by which corals transduce these environmental variables into molecular signals are unknown. To gain insight into these processes, global gene expression profiles in the coral Pocillopora damicornis were examined (via RNA-Seq) across lunar phases and between temperature treatments, during a monthly planulation cycle. The interaction of temperature and lunar day together had the largest influence on gene expression. Mean timing of planulation, which occurred at lunar days 7.4 and 12.5 for 28- and 23°C-treated corals, respectively, was associated with an upregulation of transcripts in individual temperature treatments. Expression profiles of planulation-associated genes were compared between temperature treatments, revealing that elevated temperatures disrupted expression profiles associated with planulation. Gene functions inferred from homologous matches to online databases suggest complex neuropeptide signalling, with calcium as a central mediator, acting through tyrosine kinase and G protein-coupled receptor pathways. This work contributes to our understanding of coral reproductive physiology and the impacts of environmental variables on coral reproductive pathways. © 2017 John Wiley & Sons Ltd.
Pandey, Garima; Yadav, Chandra Bhan; Sahu, Pranav Pankaj; Muthamilarasan, Mehanathan; Prasad, Manoj
2017-05-01
Genome-wide methylation analysis of foxtail millet cultivars with contrasting salinity tolerance revealed DNA demethylation events occurring in the tolerant cultivar under salinity stress, eventually modulating the expression of stress-responsive genes. Reduced productivity and significant yield loss are the adverse effects of environmental conditions on physiological and biochemical pathways in crop plants. In this context, understanding the epigenetic machinery underlying the tolerance traits of a naturally stress-tolerant crop is imperative. Foxtail millet (Setaria italica) is known for its better tolerance to abiotic stresses compared with other cereal crops. In the present study, the methylation-sensitive amplified polymorphism (MSAP) technique was used to quantify salt-induced methylation changes in two foxtail millet cultivars with contrasting levels of tolerance to salt stress. The study showed that the DNA methylation level was significantly reduced in the tolerant cultivar compared with the sensitive cultivar. A total of 86 polymorphic MSAP fragments were identified, sequenced and functionally annotated. These fragments showed sequence similarity to several genes, including ABC transporter, WRKY transcription factor, serine/threonine-protein phosphatase, disease resistance, oxidoreductase, cell wall-related enzyme, and retrotransposon and transposase-like protein genes, suggesting salt stress-induced methylation in these genes. Among these, four genes were chosen for expression profiling and showed differential expression patterns between the two cultivars. Altogether, the study suggests that salinity stress induces genome-wide DNA demethylation, which in turn modulates the expression of the corresponding genes.
Shen, Jianying; Zhang, Yu; Zhao, Shi; Mao, Hong; Wang, Zhongjing; Li, Honglian; Xu, Zihui
2018-05-01
An expanded hexanucleotide GGGGCC repeat in a noncoding region of C9ORF72 is the most common cause of frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS). However, its molecular pathogenesis remains unclear. In our previous study, expanded GGGGCC repeats were shown to be sufficient to cause neurodegeneration. To further investigate the role of expanded GGGGCC repeats in neurons, expression vectors for the normal r(GGGGCC)3 and the mutant expanded r(GGGGCC)30 repeats were transfected into Neuro-2a cells. Cell proliferation, dendrite development, and the protein levels of microtubule-associated protein-2 (MAP2) and cyclin-dependent kinase-5 (CDK5) were used to evaluate the toxicity of GGGGCC repeats in Neuro-2a cells. The results showed that expression of expanded GGGGCC repeats caused neuronal toxicity in Neuro-2a cells and enhanced the expression of pMAP2 and pCDK5. Moreover, overexpression of Purα rescued the neuronal toxicity induced by expanded GGGGCC repeats in Neuro-2a cells and reduced the expression of pMAP2 and pCDK5. Overall, our findings suggest that expanded GGGGCC repeats might cause neurodegeneration by damaging neurons, and that GGGGCC repeat-induced neuronal toxicity is inhibited by upregulation of Purα. We infer that Purα inhibits expanded GGGGCC repeat-induced neurodegeneration, which might reveal a novel mechanism of the neurodegenerative diseases ALS and FTD.
Mathematical inference and control of molecular networks from perturbation experiments
NASA Astrophysics Data System (ADS)
Mohammed-Rasheed, Mohammed
One of the main challenges facing biologists and mathematicians in the post-genomic era is to understand the behavior of molecular networks and to harness this understanding for informed intervention in the cell. The cell maintains its function via an elaborate network of interconnecting positive and negative feedback loops of genes, RNA and proteins that send different signals to a large number of pathways and molecules. These structures are referred to as genetic regulatory networks (GRNs) or molecular networks. GRNs can be viewed as dynamical systems with inherent properties and mechanisms, such as steady-state equilibria and stability, that determine the behavior of the cell. The biological relevance of these mathematical concepts is important, as they may predict the differentiation of a stem cell, the maintenance of a normal cell, the development of cancer and its aberrant behavior, and the design of drugs and response to therapy. Uncovering the underlying GRN structure from gene/protein expression data, e.g., microarrays or perturbation experiments, is called inference or reverse engineering of the molecular network. Because biological experiments are costly and time-consuming, the number of available measurements or experiments is very small compared with the number of molecules (genes, RNA and proteins). In addition, the observations are noisy, owing both to measurement imperfections and to the inherent stochasticity of gene expression levels. Intracellular activities and extracellular environmental attributes are a further source of variability. Thus, the inference of GRNs is, in general, an under-determined problem with a highly noisy set of observations. The ultimate goal of GRN inference and analysis is to be able to intervene within the network in order to force it away from undesirable cellular states and into desirable ones.
However, it remains a major challenge to design optimal intervention strategies in order to affect the time evolution of molecular activity in a desirable manner. In this proposal, we address both the inference and control problems of GRNs. In the first part of the thesis, we consider the control problem. We assume that we are given a network structure of general topology whose dynamics follow a discrete-time Markov chain model. We subsequently develop a comprehensive framework for optimal perturbation control of the network. The aim of the perturbation is to drive the network away from undesirable steady-states and to force it to converge to a unique desirable steady-state. The proposed framework does not make any assumptions about the topology of the initial network (e.g., ergodicity, weak and strong connectivity), and is thus applicable to general topology networks. We define the optimal perturbation as the minimum-energy perturbation measured in terms of the Frobenius norm between the initial and perturbed networks. We subsequently demonstrate that there exists at most one optimal perturbation that forces the network into the desirable steady-state. If the optimal perturbation does not exist, we construct a family of sub-optimal perturbations that approximate the optimal solution arbitrarily closely. In the second part of the thesis, we address the inference problem of GRNs from time series data. We model the dynamics of the molecules using a system of ordinary differential equations corrupted by additive white noise. For large-scale networks, we formulate the inference problem as a constrained maximum likelihood estimation problem. We derive the molecular interactions that maximize the likelihood function while constraining the network to be sparse. We further propose a procedure to recover weak interactions based on the Bayesian information criterion.
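Two quantities in the control formulation above are straightforward to compute for a small example. The first is the steady-state distribution pi of a discrete-time Markov chain with transition matrix P, satisfying pi = pi P. The second is the Frobenius norm ||P - Q||_F = sqrt(sum over i, j of (P_ij - Q_ij)^2), used as the energy of a perturbation from P to Q. The sketch below evaluates both for a hypothetical 3-state chain and assumes power iteration converges for the matrices used (true for the absorbing and ergodic examples here). It does not implement the thesis's optimal-perturbation construction.

```python
import math

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi = pi P by power iteration from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")

def frobenius_distance(P, Q):
    """Frobenius norm of P - Q: the 'energy' of perturbing P into Q."""
    return math.sqrt(sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq)))

# Hypothetical 3-state network in which state 2 is an undesirable absorbing state.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.0, 0.0, 1.0]]
# A candidate perturbation that lets the chain leave state 2.
P_tilde = [[0.8, 0.1, 0.1],
           [0.2, 0.7, 0.1],
           [0.5, 0.0, 0.5]]
print([round(x, 3) for x in stationary_distribution(P)])        # [0.0, 0.0, 1.0]
print([round(x, 3) for x in stationary_distribution(P_tilde)])  # [0.625, 0.208, 0.167]
print(round(frobenius_distance(P, P_tilde), 3))                 # 0.707
```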
For small networks, we investigated the inference of a globally stable 7-gene melanoma genetic regulatory network from genetic perturbation experiments. We considered five melanoma cell lines that exhibit different motility/invasion behavior under the same Wnt5a perturbation experiment. The results of the simulations validate both the steady-state levels and the experimental data of the perturbation experiments for all five cell lines. The goal of this study is to answer important questions that link the response of the network to perturbations, as measured by the experiments, to its structure, i.e., connectivity. Answers to these questions provide novel insights into the structure of networks and how they react to perturbations.
A Bayesian Active Learning Experimental Design for Inferring Signaling Networks.
Ness, Robert O; Sachs, Karen; Mallick, Parag; Vitek, Olga
2018-06-21
Machine learning methods for learning network structure are applied to quantitative proteomics experiments to reverse-engineer intracellular signal transduction networks. They provide insight into the rewiring of signaling in the context of a disease or a phenotype. To learn the causal patterns of influence between proteins in the network, these methods require experiments that include targeted interventions that fix the activity of specific proteins. However, interventions are costly and add experimental complexity. We describe an active learning strategy for selecting optimal interventions. Our approach takes pathway databases and historical data sets as inputs, expresses them in the form of prior probability distributions on network structures, and selects interventions that maximize their expected contribution to structure learning. Evaluations on simulated and real data show that the strategy reduces the detection error of validated edges compared with an unguided choice of interventions and avoids redundant interventions, thereby increasing the effectiveness of the experiment.
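The core of intervention selection (scoring each candidate intervention by its expected contribution to structure learning under a prior over structures) can be sketched with expected information gain. This is a deliberately simplified stand-in, not the authors' Bayesian network method: it assumes that intervening on a protein perturbs exactly its descendants, so the outcome is deterministic given the structure and the information gain equals the entropy of the predicted outcome. The three candidate structures and the prior are hypothetical.

```python
import math
from collections import defaultdict

def descendants(edges, node):
    """Nodes reachable from `node` along directed edges (excluding itself)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return frozenset(seen)

def expected_info_gain(structures, prior, target):
    """Mutual information (bits) between structure and intervention outcome.
    Outcomes are deterministic given a structure, so this is H(outcome)."""
    outcome_prob = defaultdict(float)
    for edges, p in zip(structures, prior):
        outcome_prob[descendants(edges, target)] += p
    return -sum(p * math.log2(p) for p in outcome_prob.values() if p > 0)

def choose_intervention(structures, prior, candidates):
    gains = {c: expected_info_gain(structures, prior, c) for c in candidates}
    return max(gains, key=gains.get), gains

# Hypothetical candidate structures over three proteins and a hypothetical prior.
structures = [
    {("A", "B"), ("B", "C")},   # A -> B -> C
    {("A", "B"), ("A", "C")},   # A -> B, A -> C
    {("A", "C"), ("C", "B")},   # A -> C -> B
]
prior = [0.5, 0.25, 0.25]
best, gains = choose_intervention(structures, prior, ["A", "B", "C"])
```

Intervening on A is uninformative here because every candidate predicts the same downstream response, which is the kind of redundant intervention the strategy is meant to avoid.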
Exploring Wound-Healing Genomic Machinery with a Network-Based Approach
Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo
2017-01-01
The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm that a network-based bioinformatics method can prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674
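Ranking genes by topological properties and simulating a treatment in silico can be illustrated with two simple measures: degree, and the drop in largest-connected-component size when a node is removed (a crude knock-out). Which topological measures and simulation procedure the paper actually uses is not stated here, so both measures are assumptions, and the toy network and labels G1 to G5 are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from a list of interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def rank_by_degree(adj):
    """Genes sorted by degree (descending), ties broken by name."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))

def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

# Hypothetical toy PPI network with placeholder gene labels.
adj = build_graph([("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")])
ranking = rank_by_degree(adj)
impact = {g: largest_component_size(adj) - largest_component_size(adj, {g}) for g in ranking}
```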
Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence
2010-11-09
Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
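The reason the IBP lets the number of factors be inferred is visible in its generative process: row i (1-based) reuses an existing column k with probability m_k / i, where m_k counts earlier rows using that column, and then opens Poisson(α / i) new columns, so the column count grows with the data instead of being fixed. The sketch below draws such a binary matrix; in a sparse factor model its pattern could switch factor loadings on or off. It shows only the prior, not the paper's Beta Process variants or its Gibbs/VB inference, and `alpha` and the seed are arbitrary choices.

```python
import numpy as np

def sample_ibp(n_rows, alpha, rng):
    """Draw a binary matrix from the Indian Buffet Process with concentration alpha."""
    counts = []          # m_k: number of earlier rows using column k
    rows = []
    for i in range(1, n_rows + 1):
        # Reuse existing columns with probability m_k / i
        row = [bool(rng.random() < m / i) for m in counts]
        for k, taken in enumerate(row):
            if taken:
                counts[k] += 1
        # Open Poisson(alpha / i) brand-new columns for this row
        n_new = int(rng.poisson(alpha / i))
        row.extend([True] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(50, alpha=3.0, rng=np.random.default_rng(1))
```

The expected number of columns is α times the n-th harmonic number, so it grows only logarithmically with the number of rows.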
Dynamic modelling of microRNA regulation during mesenchymal stem cell differentiation.
Weber, Michael; Sotoca, Ana M; Kupfer, Peter; Guthke, Reinhard; van Zoelen, Everardus J
2013-11-12
Network inference from gene expression data is a typical approach to reconstruct gene regulatory networks. During chondrogenic differentiation of human mesenchymal stem cells (hMSCs), a complex transcriptional network is active and regulates the temporal differentiation progress. As modulators of transcriptional regulation, microRNAs (miRNAs) play a critical role in stem cell differentiation. Integrated network inference aims at determining interrelations between miRNAs and mRNAs on the basis of expression data as well as miRNA target predictions. We applied the NetGenerator tool in order to infer an integrated gene regulatory network. Time series experiments were performed to measure mRNA and miRNA abundances of TGF-beta1+BMP2 stimulated hMSCs. Network nodes were identified by analysing temporal expression changes, miRNA target gene predictions, time series correlation and literature knowledge. Network inference was performed using NetGenerator to reconstruct a dynamical regulatory model based on the measured data and prior knowledge. The resulting model is robust against noise and shows an optimal trade-off between fitting precision and inclusion of prior knowledge. It predicts the influence of miRNAs on the expression of chondrogenic marker genes and therefore proposes novel regulatory relations in differentiation control. By analysing the inferred network, we identified a previously unknown regulatory effect of miR-524-5p on the expression of the transcription factor SOX9 and the chondrogenic marker genes COL2A1, ACAN and COL10A1. Genome-wide exploration of miRNA-mRNA regulatory relationships is a reasonable approach to identify miRNAs which have so far not been associated with the investigated differentiation process. The NetGenerator tool is able to identify valid gene regulatory networks on the basis of miRNA and mRNA time series data.
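One of the node-selection criteria above, time series correlation combined with target predictions, can be illustrated as a filter that keeps predicted miRNA-target pairs whose time courses are strongly anti-correlated (as expected for repression). This is only a sketch of that one filtering step, not NetGenerator itself; the cutoff `r_max` and the labels miR-x, gene-a and gene-b are hypothetical.

```python
import numpy as np

def anticorrelated_pairs(mirna_ts, mrna_ts, predicted_targets, r_max=-0.7):
    """Keep predicted (miRNA, target) pairs whose Pearson correlation across
    time points is at or below r_max."""
    selected = []
    for mirna, target in predicted_targets:
        r = np.corrcoef(mirna_ts[mirna], mrna_ts[target])[0, 1]
        if r <= r_max:
            selected.append((mirna, target, round(float(r), 3)))
    return selected

# Hypothetical time courses over five time points.
mirna_ts = {"miR-x": [1, 2, 3, 4, 5]}
mrna_ts = {"gene-a": [5, 4, 3, 2, 1], "gene-b": [1, 2, 3, 4, 5]}
pairs = anticorrelated_pairs(mirna_ts, mrna_ts, [("miR-x", "gene-a"), ("miR-x", "gene-b")])
```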
Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A
2011-07-01
Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression.
Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.
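The two ingredients of MCAM named above, a combinatorial grid of clustering configurations and evaluation by statistical enrichment of metadata, can be sketched with `itertools.product` and a one-sided hypergeometric test. The option labels in the grid are hypothetical placeholders rather than MCAM's actual choices, and the hypergeometric test is an assumed (though common) enrichment statistic; no clustering is performed here.

```python
from itertools import product
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N proteins in total, K carrying an
    annotation, n in the cluster, k annotated proteins observed in the cluster."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical option lists; every combination defines one clustering run.
transforms = ["none", "zscore"]
metrics = ["euclidean", "correlation"]
set_sizes = [3, 5, 8]
algorithms = ["kmeans", "hierarchical"]
grid = list(product(transforms, metrics, set_sizes, algorithms))

# A cluster of 3 proteins that are all among the 3 annotated out of 10.
p_value = hypergeom_enrichment_p(N=10, K=3, n=3, k=3)
```

In a full pipeline, each configuration in `grid` would produce clusters, and each cluster would be scored against each annotation with a statistic of this kind, followed by multiple-testing correction.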
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Construction of a cDNA microarray derived from the ascidian Ciona intestinalis.
Azumi, Kaoru; Takahashi, Hiroki; Miki, Yasufumi; Fujie, Manabu; Usami, Takeshi; Ishikawa, Hisayoshi; Kitayama, Atsusi; Satou, Yutaka; Ueno, Naoto; Satoh, Nori
2003-10-01
A cDNA microarray was constructed from a basal chordate, the ascidian Ciona intestinalis. The draft genome of Ciona has been read and inferred to contain approximately 16,000 protein-coding genes, and cDNAs for transcripts of 13,464 genes have been characterized and compiled as the "Ciona intestinalis Gene Collection Release I". In the present study, we constructed a cDNA microarray of these 13,464 Ciona genes. A preliminary experiment with Cy3- and Cy5-labeled probes showed extensive differential gene expression between fertilized eggs and larvae. In addition, there was a good correlation between results obtained by the present microarray analysis and those from previous EST analyses. This first microarray of a large collection of Ciona intestinalis cDNA clones should facilitate the analysis of global gene expression and gene networks during the embryogenesis of basal chordates.
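Differential expression in a two-colour (Cy3/Cy5) array experiment like the one above is usually summarised by per-spot log ratios. A minimal sketch, assuming simple global median normalization and a two-fold cutoff (|log2 ratio| ≥ 1); the intensities are hypothetical, and the study's own normalization is not described here.

```python
import numpy as np

def log_ratios(cy3, cy5):
    """Per-spot log2(Cy5 / Cy3), centred by the median (global normalization)."""
    m = np.log2(np.asarray(cy5, dtype=float) / np.asarray(cy3, dtype=float))
    return m - np.median(m)

def flag_differential(m, threshold=1.0):
    """Indices of spots with at least a 2**threshold-fold change."""
    return np.flatnonzero(np.abs(m) >= threshold)

# Hypothetical intensities, e.g. eggs labelled with Cy3 and larvae with Cy5.
m = log_ratios([100, 200, 400, 800], [100, 200, 1600, 200])
flagged = flag_differential(m)
```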
Computational analysis and functional expression of ancestral copepod luciferase.
Takenaka, Yasuhiro; Noda-Ogura, Akiko; Imanishi, Tadashi; Yamaguchi, Atsushi; Gojobori, Takashi; Shigeri, Yasushi
2013-10-10
We recently reported the cDNA sequences of 11 copepod luciferases from the superfamily Augaptiloidea in the order Calanoida. They were classified into two groups, Metridinidae and Heterorhabdidae/Lucicutiidae families, by phylogenetic analyses. To elucidate the evolutionary processes, we have now further isolated 12 copepod luciferases from Augaptiloidea species (Metridia asymmetrica, Metridia curticauda, Pleuromamma scutullata, Pleuromamma xiphias, Lucicutia ovaliformis and Heterorhabdus tanneri). Codon-based synonymous/nonsynonymous tests of positive selection for 25 identified copepod luciferases suggested that positive Darwinian selection operated in the evolution of Heterorhabdidae luciferases, whereas two types of Metridinidae luciferases had diversified via a neutral mechanism. By in silico analysis of the decoded amino acid sequences of 25 copepod luciferases, we inferred two protein sequences as ancestral copepod luciferases. They were expressed in HEK293 cells, where they exhibited notable luciferase activity both in intracellular lysates and cultured media, indicating that the luciferase activity was established before evolutionary diversification of these copepod species. © 2013.
Integration of multi-omics data for integrative gene regulatory network inference.
Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon
2017-01-01
Gene regulatory networks provide comprehensive insights into, and an in-depth understanding of, complex biological processes. In most research, the molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylation. Recent rapid advances in high-throughput omics technologies make it possible to measure multiple types of omics data, called 'multi-omics data', that represent these various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expression, copy number variations and DNA methylation were considered as multi-omics data in this paper. Extensive experiments on simulated data were carried out to assess iGRN's ability to infer the integrative gene regulatory network. In these experiments, iGRN performed better in model representation and interpretation than other integrative methods for gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189
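iGRN's actual model is not described in this abstract. As a hedged illustration of the general idea of combining omics layers, a target gene's expression can be regressed jointly on regulator expression and on that gene's own copy number and methylation. The design, the coefficients, and the function name are hypothetical, and the data are noise-free so that the least-squares fit is exact.

```python
import numpy as np

def fit_multiomics_gene(tf_expr, cnv, methylation, target_expr):
    """Least-squares fit of one target gene's expression on regulator expression
    plus the target's own copy number and methylation (hypothetical design)."""
    design = np.column_stack([tf_expr, cnv, methylation, np.ones(len(target_expr))])
    coef, *_ = np.linalg.lstsq(design, target_expr, rcond=None)
    n_tf = tf_expr.shape[1]
    return {"tf": coef[:n_tf], "cnv": coef[n_tf],
            "methylation": coef[n_tf + 1], "intercept": coef[n_tf + 2]}

# Hypothetical noise-free data for 30 samples and 2 regulators.
rng = np.random.default_rng(2)
tf = rng.normal(size=(30, 2))
cnv = rng.normal(size=30)
meth = rng.normal(size=30)
y = 1.5 * tf[:, 0] - 0.5 * tf[:, 1] + 0.8 * cnv - 1.2 * meth + 0.3
fit = fit_multiomics_gene(tf, cnv, meth, y)
```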
Protein-driven inference of miRNA–disease associations
Mørk, Søren; Pletscher-Frankild, Sune; Palleja Caro, Albert; Gorodkin, Jan; Jensen, Lars Juhl
2014-01-01
Motivation: MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA–disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer. Results: Here we present miRPD in which miRNA–Protein–Disease associations are explicitly inferred. Besides linking miRNAs to diseases, it directly suggests the underlying proteins involved, which can be used to form hypotheses that can be experimentally tested. miRNA–disease associations are inferred by coupling known and predicted miRNA–protein associations with protein–disease associations text mined from the literature. We present scoring schemes that allow us to rank miRNA–disease associations inferred from both curated and predicted miRNA targets by reliability and thereby to create high- and medium-confidence sets of associations. Analyzing these, we find statistically significant enrichment for proteins involved in pathways related to cancer and type I diabetes mellitus, suggesting either a literature bias or a genuine biological trend. We show by example how the associations can be used to extract proteins for disease hypotheses. Availability and implementation: All datasets, software and a searchable Web site are available at http://mirpd.jensenlab.org. Contact: lars.juhl.jensen@cpr.ku.dk or gorodkin@rth.dk PMID:24273243
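The coupling step (joining miRNA–protein associations with protein–disease associations through the shared protein) can be sketched as below. The scoring rule used here (product of the two scores, keeping the best-supported protein per miRNA–disease pair) is an assumption for illustration, not miRPD's published scoring scheme, and all labels and scores are hypothetical.

```python
from collections import defaultdict

def infer_mirna_disease(mirna_protein, protein_disease):
    """Map (miRNA, disease) -> (score, protein) via shared proteins, keeping the
    protein that gives the highest product of the two association scores."""
    by_protein = defaultdict(list)
    for (protein, disease), score in protein_disease.items():
        by_protein[protein].append((disease, score))
    result = {}
    for (mirna, protein), s_mp in mirna_protein.items():
        for disease, s_pd in by_protein[protein]:
            combined = s_mp * s_pd
            key = (mirna, disease)
            if key not in result or combined > result[key][0]:
                result[key] = (combined, protein)
    return result

# Hypothetical association scores in [0, 1].
mirna_protein = {("miR-1", "P1"): 0.9, ("miR-1", "P2"): 0.5}
protein_disease = {("P1", "D1"): 0.5, ("P2", "D1"): 0.8, ("P2", "D2"): 1.0}
assoc = infer_mirna_disease(mirna_protein, protein_disease)
```

Keeping the supporting protein alongside the score is what makes each inferred association directly usable as a testable hypothesis.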
Inference of gene regulatory networks from time series by Tsallis entropy
2011-01-01
Background The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems in systems biology today. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly because the time series are short relative to the high complexity of the networks and because of the intrinsic noise of the expression measurements. In order to improve the accuracy of GRN inference methods based on entropy (mutual information), a new criterion function is proposed here. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transfer function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. In contrast, the DREAM time series vary in network size, and their topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions The experimental results show a marked improvement in accuracy, as the non-Shannon entropy reduced the number of false connections in the inferred topology.
The best value of the Tsallis free parameter q was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/. PMID:21545720
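The Tsallis entropy of a distribution p is S_q = (1 − Σ p_i^q) / (q − 1), which recovers the Shannon entropy (in nats) as q → 1. Used as a feature-selection criterion, a candidate predictor gene is scored by the conditional entropy of the target given the predictor. The sketch below assumes the conditional form Σ_x P(x) S_q(Y | X = x) and discretized expression values; DimReduction's exact definitions may differ. The caller is assumed to align predictor values at time t with target values at time t + 1, and the gene data and the default q = 2.5 (taken from the reported range) are illustrative.

```python
import math
from collections import Counter, defaultdict

def tsallis_entropy(probs, q):
    """S_q = (1 - sum p^q) / (q - 1); Shannon entropy in nats when q == 1."""
    probs = [p for p in probs if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def conditional_tsallis(x, y, q):
    """Mean conditional entropy sum_x P(x) * S_q(Y | X = x) for discrete data."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    total = 0.0
    for ys in groups.values():
        probs = [c / len(ys) for c in Counter(ys).values()]
        total += (len(ys) / len(y)) * tsallis_entropy(probs, q)
    return total

def best_predictor(candidates, target_next, q=2.5):
    """Candidate gene whose values at t minimize the entropy of the target at t + 1."""
    return min(candidates, key=lambda g: conditional_tsallis(candidates[g], target_next, q))

# Hypothetical binarized expression: target at t + 1, candidates at t.
target_next = [0, 1, 0, 1, 1, 0]
candidates = {"g1": [0, 1, 0, 1, 1, 0], "g2": [0, 0, 0, 1, 1, 1]}
chosen = best_predictor(candidates, target_next)
```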
da Silva Lima, Fabiana; Rogero, Marcelo Macedo; Ramos, Mayara Caldas; Borelli, Primavera; Fock, Ricardo Ambrósio
2013-06-01
Protein malnutrition affects resistance to infection by impairing the inflammatory response and modifying the function of effector cells, such as macrophages. Recent studies have revealed that glutamine (a non-essential amino acid that can become conditionally essential in situations such as trauma, infection, post-surgery and sepsis) is able to modulate the synthesis of cytokines. The aim of this study was to evaluate the effect of glutamine on the expression of proteins involved in the nuclear factor-kappa B (NF-κB) signalling pathway of peritoneal macrophages from malnourished mice. Two-month-old male Balb/c mice were submitted to protein-energy malnutrition (n = 10) with a low-protein diet containing 2 % protein, whereas control mice (n = 10) were fed a 12 % protein-containing diet. Haemograms and plasma glutamine and corticosterone levels were evaluated. Peritoneal macrophages were pre-treated in vitro with glutamine (0, 0.6, 2 and 10 mmol/L) for 24 h and then stimulated with 1.25 μg LPS for 30 min, and the synthesis of TNF-α and IL-1α and the expression of proteins related to the NF-κB pathway were evaluated. Malnourished animals had anaemia, leucopoenia, lower plasma glutamine and increased corticosterone levels. TNF-α production by LPS-stimulated macrophages was significantly lower in cells from malnourished animals when cultivated in supraphysiological (2 and 10 mmol/L) concentrations of glutamine. Further, glutamine had a dose-dependent effect on the activation of LPS-stimulated macrophages in both groups, decreasing TNF-α and IL-1α production and negatively modulating the NF-κB signalling pathway. These data lead us to infer that protein malnutrition interferes with the activation of macrophages and that higher glutamine concentrations, in vitro, can act negatively on the NF-κB signalling pathway.
An atlas of gene expression and gene co-regulation in the human retina.
Pinelli, Michele; Carissimo, Annamaria; Cutillo, Luisa; Lai, Ching-Hung; Mutarelli, Margherita; Moretti, Maria Nicoletta; Singh, Marwah Veer; Karali, Marianthi; Carrella, Diego; Pizzo, Mariateresa; Russo, Francesco; Ferrari, Stefano; Ponzin, Diego; Angelini, Claudia; Banfi, Sandro; di Bernardo, Diego
2016-07-08
The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptor localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
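Inferring a co-expression network from inter-individual variability amounts to correlating genes across donors and keeping strongly correlated pairs as edges. A minimal sketch, assuming plain Pearson correlation with a hard cutoff `r_min`; the study's actual network construction is not described in this abstract, and the expression matrix and gene labels are hypothetical.

```python
import numpy as np

def coexpression_edges(expr, genes, r_min=0.9):
    """Edges between genes whose expression across samples (columns) has
    |Pearson r| >= r_min. Rows of `expr` are genes."""
    r = np.corrcoef(np.asarray(expr, dtype=float))
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= r_min:
                edges.append((genes[i], genes[j], float(r[i, j])))
    return edges

# Hypothetical expression of three genes in four donors.
expr = [[1, 2, 3, 4],
        [2, 4, 6, 8],
        [4, 1, 3, 2]]
edges = coexpression_edges(expr, ["a", "b", "c"])
```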
Stewart, Suzanne L K; Schepman, Astrid; Haigh, Matthew; McHugh, Rhian; Stewart, Andrew J
2018-03-14
The recognition of emotional facial expressions is often subject to contextual influence, particularly when the face and the context convey similar emotions. We investigated whether spontaneous, incidental affective theory of mind inferences made while reading vignettes describing social situations would produce context effects on the identification of same-valenced emotions (Experiment 1) as well as differently-valenced emotions (Experiment 2) conveyed by subsequently presented faces. Crucially, we found an effect of context on reaction times in both experiments while, in line with previous work, we found evidence for a context effect on accuracy only in Experiment 1. This demonstrates that affective theory of mind inferences made at the pragmatic level of a text can automatically, contextually influence the perceptual processing of emotional facial expressions in a separate task even when those emotions are of a distinctive valence. Thus, our novel findings suggest that language acts as a contextual influence on the recognition of emotional facial expressions for both same and different valences.
Shariff, Azim F; Tracy, Jessica L; Markusoff, Jeffrey L
2012-09-01
How do we decide who merits social status? According to functionalist theories of emotion, the nonverbal expressions of pride and shame play a key role, functioning as automatically perceived status signals. In this view, observers automatically make status inferences about expressers on the basis of these expressions, even when contradictory contextual information about the expressers' status is available. In four studies, the authors tested whether implicit and explicit status perceptions are influenced by pride and shame expressions even when these expressions' status-related messages are contradicted by contextual information. Results indicate that emotion expressions powerfully influence implicit and explicit status inferences, at times neutralizing or even overriding situational knowledge. These findings demonstrate the irrepressible communicative power of emotion displays and indicate that status judgments can be informed as much (and often more) by automatic responses to nonverbal expressions of emotion as by rational, contextually bound knowledge.
Inferring Time-Varying Network Topologies from Gene Expression Data
Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas
2007-01-01
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis that has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying, fashion, in order to account for different cellular states affecting the interactions among genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster, to infer a network adjacency matrix. Finally, we report results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence. PMID:18309363
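The per-regime identification step can be sketched under two simplifying assumptions: the clustering into regimes is already done (labels are given), and each regime is described by a fully observed linear model x_{t+1} = A_r x_t fitted by least squares. This vector-autoregressive stand-in replaces the hidden-state model of regime-SSM and is not the authors' algorithm; the matrices, tolerance and helper names are hypothetical.

```python
import numpy as np

def fit_regime_transitions(X_now, X_next, regimes):
    """Per-regime least-squares estimate of A_r in x_{t+1} = A_r x_t."""
    A = {}
    for r in np.unique(regimes):
        mask = regimes == r
        # Solve X_next ~= X_now @ A_r.T for this regime's observations
        sol, *_ = np.linalg.lstsq(X_now[mask], X_next[mask], rcond=None)
        A[int(r)] = sol.T
    return A

def adjacency(A_r, tol=1e-6):
    """Directed edge j -> i whenever |A_r[i, j]| > tol, ignoring self-loops."""
    adj = np.abs(A_r) > tol
    np.fill_diagonal(adj, False)
    return adj

# Hypothetical two-gene system with two regimes and noise-free transitions.
rng = np.random.default_rng(0)
A0 = np.array([[0.9, 0.0], [0.5, 0.8]])
A1 = np.array([[0.7, 0.3], [0.0, 0.9]])
regimes = np.array([0] * 20 + [1] * 20)
X_now = rng.normal(size=(40, 2))
X_next = np.where(regimes[:, None] == 0, X_now @ A0.T, X_now @ A1.T)
A_est = fit_regime_transitions(X_now, X_next, regimes)
```

In this toy case the inferred topology changes between regimes (gene 0 drives gene 1 in regime 0, the reverse in regime 1), which is the kind of time-varying structure the abstract argues steady-state methods miss.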
Inferring subunit stoichiometry from single molecule photobleaching
2013-01-01
Single molecule photobleaching is a powerful tool for determining the stoichiometry of protein complexes. By attaching fluorophores to proteins of interest, the number of associated subunits in a complex can be deduced by imaging single molecules and counting fluorophore photobleaching steps. Because some bleaching steps might be unobserved, the ensemble of steps will be binomially distributed. In this work, it is shown that inferring the true composition of a complex from such data is nontrivial because binomially distributed observations present an ill-posed inference problem. That is, a unique and optimal estimate of the relevant parameters cannot be extracted from the observations. Because of this, a method has not been firmly established to quantify confidence when using this technique. This paper presents a general inference model for interpreting such data and provides methods for accurately estimating parameter confidence. The formalization and methods presented here provide a rigorous analytical basis for this pervasive experimental tool. PMID:23712552
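The ill-posedness can be made concrete. If each of n subunits is detected with probability p, the observed step count is k ~ Binomial(n, p), and for a fixed n the maximum-likelihood p is simply the mean count divided by n. Profiling the likelihood over n therefore trades n against p, and neighbouring stoichiometries can fit almost equally well. The sketch computes that profile; it is an illustration of the problem, not the paper's inference model or its confidence estimates, and the step counts are hypothetical.

```python
import math

def binom_logpmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), handling p = 0 and p = 1."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def profile_likelihood(step_counts, n_max):
    """For each candidate stoichiometry n, the MLE of p and the log-likelihood."""
    mean_k = sum(step_counts) / len(step_counts)
    profile = {}
    for n in range(max(1, max(step_counts)), n_max + 1):
        p_hat = mean_k / n                       # MLE of p with n held fixed
        ll = sum(binom_logpmf(k, n, p_hat) for k in step_counts)
        profile[n] = (p_hat, ll)
    return profile

# Hypothetical bleaching-step counts from eight molecules.
counts = [2, 3, 3, 4, 2, 3, 4, 3]
profile = profile_likelihood(counts, n_max=8)
```

Comparing the log-likelihoods in `profile` shows how weakly a given data set discriminates between candidate values of n, which is why confidence must be quantified explicitly.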
Tian, Xia; Liu, Meng; Zhu, Qingxi; Tan, Jie; Liu, Weijie; Wang, Yanfen; Chen, Wei; Zou, Yanli; Cai, Yishan; Han, Zheng; Huang, Xiaodong
2017-09-01
The aim of the present study was to explore the signaling pathway through which noscapine induces apoptosis in colon cancer SW480 cells when the liver-intestine cadherin (CDH17) gene is blocked. Human colon cancer SW480 cells were transfected with a CDH17 interference vector and treated with 10 µmol/L noscapine. Proliferation and apoptosis of SW480 cells were detected by MTT assay and an AnnexinV-FITC/PI flow cytometry kit (BD), respectively. Cell invasion was assessed by transwell assays. Levels of apoptosis-related proteins (Cyt-c, Bax, Bcl-2 and Bcl-xL) were evaluated by western blot. Compared with the noscapine group, proliferation was significantly decreased and apoptosis significantly increased in SW480 cells of the siCDH17+noscapine group. Cyt-c and Bax protein levels in the siCDH17+noscapine group were higher than those of the noscapine group, whereas Bcl-2 and Bcl-xL protein levels in the siCDH17+noscapine group were lower than those of the noscapine group. Moreover, overexpression of CDH17 inhibited noscapine-induced apoptosis in SW480 cells. We inferred that downregulation of the CDH17 gene markedly promotes the apoptosis-inducing effect of noscapine on human colon cancer SW480 cells, suggesting a novel strategy to improve chemotherapeutic effects on colon cancer.
Malmström, Erik; Kilsgård, Ola; Hauri, Simon; Smeds, Emanuel; Herwald, Heiko; Malmström, Lars; Malmström, Johan
2016-01-01
The plasma proteome is highly dynamic and variable, composed of proteins derived from surrounding tissues and cells. To investigate the complex processes that control the composition of the plasma proteome, we developed a mass spectrometry-based proteomics strategy to infer the origin of proteins detected in murine plasma. The strategy relies on the construction of a comprehensive protein tissue atlas from cells and highly vascularized organs using shotgun mass spectrometry. The protein tissue atlas was transformed to a spectral library for highly reproducible quantification of tissue-specific proteins directly in plasma using SWATH-like data-independent mass spectrometry analysis. We show that the method can detect drastic changes of tissue-specific protein profiles in blood plasma from mouse models of sepsis. The strategy can be extended to several other species, advancing our understanding of the complex processes that contribute to the plasma proteome dynamics. PMID:26732734
Zheng, Tingting; Ni, Yueqiong; Li, Jun; Chow, Billy K. C.; Panagiotou, Gianni
2017-01-01
Background: A range of computational methods that rely on the analysis of genome-wide expression datasets have been developed and successfully used for drug repositioning. The success of these methods is based on the hypothesis that introducing a factor (in this case, a drug molecule) that could reverse the disease gene expression signature will lead to a therapeutic effect. However, it has also been shown that globally reversing the disease expression signature is not a prerequisite for drug activity. On the other hand, the basic idea of significant anti-correlation in expression profiles could have great value for establishing diet-disease associations and could provide new insights into the role of dietary interventions in disease. Methods: We performed an integrated analysis of publicly available gene expression profiles for foods, diseases and drugs, by calculating pairwise similarity scores for diet and disease gene expression signatures and characterizing their topological features in protein-protein interaction networks. Results: We identified 485 diet-disease pairs where diet could positively influence disease development and 472 pairs where specific diets should be avoided in a disease state. Multiple evidence suggests that orange, whey and coconut fat could be beneficial for psoriasis, lung adenocarcinoma and macular degeneration, respectively. On the other hand, fructose-rich diet should be restricted in patients with chronic intermittent hypoxia and ovarian cancer. Since humans normally do not consume foods in isolation, we also applied different algorithms to predict synergism; as a result, 58 food pairs were predicted. Interestingly, the diets identified as anti-correlated with diseases showed a topological proximity to the disease proteins similar to that of the corresponding drugs. 
Conclusions: We provide a computational framework for establishing diet-disease associations and additional information on the role of diet in disease development. Given the complexity of analyzing the food composition and eating patterns of individuals, our in silico analysis, using large-scale gene expression datasets and network-based topological features, may serve as a proof of concept in nutritional systems biology for identifying diet-disease relationships and subsequently designing dietary recommendations. PMID:29033850
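The core idea above, that a diet whose expression signature is anti-correlated with a disease signature might counteract that disease, can be illustrated with a rank correlation. This is a minimal sketch under stated assumptions, not the similarity score used in the study: each signature is assumed to be a mapping from gene identifier to log fold change, only genes present in both signatures are compared, and the gene names (GENE_A and so on) and values are hypothetical.

```python
# Hypothetical sketch: Spearman correlation between a diet signature and a
# disease signature (gene -> log fold change). A strongly negative value
# means the diet tends to reverse the disease signature. Illustration only,
# not the scoring scheme used in the study.

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a signature is constant over the shared genes")
    return cov / (vx * vy) ** 0.5


def signature_similarity(diet, disease):
    """Spearman correlation computed over genes measured in both signatures."""
    genes = sorted(set(diet) & set(disease))
    if len(genes) < 3:
        raise ValueError("need at least three shared genes")
    x = average_ranks([diet[g] for g in genes])
    y = average_ranks([disease[g] for g in genes])
    return pearson(x, y)


disease = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": -2.0}
diet = {"GENE_A": -1.5, "GENE_B": -0.5, "GENE_C": 0.7, "GENE_D": 1.9}
print(signature_similarity(diet, disease))  # -1.0: a perfect reversal
```

Rank correlation is used here because it is insensitive to differences in scale between platforms; a study may well use a different score.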
Plaschke, Jens; Krüger, Stefan; Jeske, Birgit; Theissig, Franz; Kreuz, Friedmar R; Pistorius, Steffen; Saeger, Hans D; Iaccarino, Ingram; Marra, Giancarlo; Schackert, Hans K
2004-02-01
Mononucleotide repeat sequences are particularly prone to frameshift mutations in tumors with biallelic inactivation of the mismatch repair (MMR) genes MLH1 or MSH2. In these tumors, several genes harboring mononucleotide repeats in their coding region have been proposed as targets involved in tumor progression, among which are also the MMR genes MSH3 and MSH6. We have analyzed the expression of the MSH3 and MSH6 proteins by immunohistochemistry in 31 colorectal carcinomas in which MLH1 was inactivated. Loss of MSH3 expression was identified in 15 tumors (48.4%), whereas all tumors expressed MSH6. Frameshift mutations at coding microsatellites were more frequent in MSH3 (16 of 31) than in MSH6 (3 of 31; Fisher's exact test, P < 0.001). Frameshift mutations and allelic losses of MSH3 were more frequent in MSH3-negative tumors compared with those with normal expression (22 mutations in 30 alleles versus 8 mutations in 28 alleles; χ² test, P = 0.001). Biallelic inactivation was evident or inferred for 60% of MSH3-negative tumors but none of the tumors with normal MSH3 expression. In contrast, we did not identify frameshift mutations in the (A)8 tract of MSH3 in a control group of 18 colorectal carcinomas in which the MMR deficiency was based on the inactivation of MSH2. As it has been suggested that mutations of MSH3 might play a role in tumor progression, we studied the association between MSH3 expression and disease stage assessed by lymph node and distant metastases status. Dukes stages C and D were more frequent in primary tumors with loss of MSH3 expression (9 of 13), compared with tumors with retained expression (1 of 14; Fisher's exact test, P = 0.001), suggesting that MSH3 abrogation may be a predictor of metastatic disease or even favor tumor cell spread in MLH1-deficient colorectal cancers.
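As an arithmetic check of the Dukes-stage comparison, the 2x2 table implied by "9 of 13" versus "1 of 14" can be run through a two-sided Fisher's exact test. This is a minimal sketch written from the hypergeometric definition (the function name is our own, not from any library), assuming the common convention that the two-sided p-value sums all tables with the same margins that are no more probable than the observed one. It gives p of roughly 0.0013, consistent with the reported P = 0.001.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row
    and column totals that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # P(top-left cell == k) given the fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are included
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Rows: MSH3 lost / MSH3 retained; columns: Dukes C or D / Dukes A or B
p = fisher_exact_two_sided(9, 4, 1, 13)
print(round(p, 4))  # 0.0013
```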
How to talk about protein‐level false discovery rates in shotgun proteomics
The, Matthew; Tasnim, Ayesha
2016-01-01
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein‐level FDRs a difficult and vague concept, as the measured entities, spectra, are manifestations of peptides, not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein‐level FDRs, starting from their basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, a fact that has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein‐level FDRs for both competing null hypotheses. PMID:27503675
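The decoy models mentioned above rest on a simple count: above a score threshold, the number of decoy proteins estimates the number of false target proteins. The sketch below is a minimal illustration under assumptions of our own (target and decoy databases of equal size, one score per protein, hypothetical scores); it does not distinguish the two competing null hypotheses the authors discuss, and the function names are illustrative.

```python
# Hypothetical sketch of target-decoy FDR estimation at the protein level.
# Each protein is (score, is_decoy). Assumes equally sized target and decoy
# databases, so decoys above a threshold estimate false targets above it.

def decoy_fdr(scored, threshold):
    """Estimated FDR among target proteins scoring at or above `threshold`."""
    targets = sum(1 for score, is_decoy in scored if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in scored if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def threshold_at_fdr(scored, alpha):
    """Lowest observed score whose estimated FDR does not exceed `alpha`."""
    ok = [score for score, _ in scored if decoy_fdr(scored, score) <= alpha]
    return min(ok) if ok else None


proteins = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(decoy_fdr(proteins, 6))            # 0.25
print(threshold_at_fdr(proteins, 0.25))  # 6
```

Because the estimated FDR is not monotone in the threshold, practical tools usually convert it to q-values before thresholding; this sketch skips that step for brevity.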
Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba
2008-01-01
Background: Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings: In this study, we used EST2Secretome, a novel, high-throughput, computational workflow system, to predict 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes, respectively. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.2%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using the Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains, including the nematode-specific protein families “transthyretin-like” and “chromadorea ALT,” which are considered vaccine candidates against filariasis in humans. Conclusions: We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. 
This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748
We and others have shown that the transition between and maintenance of biological states are controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet, some tissues may lack the molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.
QuASAR: quantitative allele-specific analysis of reads
Harvey, Chris T.; Moyerbrailean, Gregory A.; Davis, Gordon O.; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger
2015-01-01
Motivation: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while considering uncertainty in the genotype calls. Results: We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability and implementation: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu Supplementary information: Supplementary Material is available at Bioinformatics online. PMID:25480375
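The typical ASE test described above, counting reads for each allele at a heterozygous site and testing a 1:1 ratio, can be sketched as an exact binomial test. This is only that baseline test, written from the binomial definition with a function name of our own; it is not QuASAR, which additionally accounts for genotype uncertainty, base-call errors and allelic over-dispersion.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial test of H0: ref:alt = 1:1 at a heterozygous site."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    p_obs = pmf[ref_reads]
    # sum the probabilities of all read splits no more likely than the observed one
    return min(1.0, sum(p for p in pmf if p <= p_obs * (1 + 1e-9)))


print(allelic_imbalance_pvalue(10, 0))  # 0.001953125 (= 2 / 1024)
print(allelic_imbalance_pvalue(5, 5))   # 1.0
```

With real RNA-seq data, over-dispersion usually makes this binomial test anti-conservative, which is one motivation for the extra parameters the abstract describes.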
van Doorn, Evert A.; van Kleef, Gerben A.; van der Pligt, Joop
2015-01-01
Emotional expressions constitute a rich source of information. Integrating theorizing on attribution, appraisal processes, and the use of emotions as social information, we examined how emotional expressions influence attributions of agency and responsibility under conditions of ambiguity. Three vignette studies involving different scenarios indicate that participants used information about others’ emotional expressions to make sense of ambiguous social situations. Expressions of regret fueled inferences that the expresser was responsible for an adverse situation, whereas expressions of anger fueled inferences that someone else was responsible. Also, expressions of anger were interpreted as a sign of injustice, and expressions of disappointment increased prosocial intentions (i.e., to help the expresser). The results show that emotional expressions can help people understand ambiguous social situations by informing attributions that correspond with each emotion’s associated appraisal structures. The findings advance understanding of the ways in which emotional expressions help individuals understand and coordinate social life. PMID:26284001
Functional networks inference from rule-based machine learning models.
Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume
2016-01-01
Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes that are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables that often differ from, or complement, those captured by similarity-based methods. We propose a protocol, called FuNeL, to infer functional networks from machine learning models. It assumes that genes used together within a rule-based machine learning model to classify the samples might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in the context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods, and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. 
The implementation of our network inference protocol is available at: http://ico2s.org/software/funel.html.
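The FuNeL assumption, that genes used together in a rule may be functionally related, suggests building edges from co-occurrence within rules. The sketch below is a minimal illustration of that idea only, not the FuNeL protocol: it assumes the rules have already been extracted from a classifier as lists of gene names, weights each edge by the number of rules sharing both genes, and uses hypothetical rules and gene names.

```python
from collections import Counter
from itertools import combinations


def rule_cooccurrence_network(rules, min_count=1):
    """Weighted gene-gene edges from genes that appear together in a rule.

    `rules` is a list of rules, each given as the gene names used in its
    conditions. Edge weight = number of rules that use both genes.
    """
    counts = Counter()
    for rule in rules:
        for u, v in combinations(sorted(set(rule)), 2):
            counts[(u, v)] += 1
    return {edge: w for edge, w in counts.items() if w >= min_count}


# Hypothetical rules extracted from a rule-based classifier
rules = [["TP53", "MYC"], ["MYC", "TP53", "AR"], ["AR", "KLK3"]]
print(rule_cooccurrence_network(rules, min_count=2))  # {('MYC', 'TP53'): 2}
```

Raising `min_count` keeps only pairs that recur across many rules, a simple way to trade network size for robustness.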
De novo inference of protein function from coarse-grained dynamics.
Bhadra, Pratiti; Pal, Debnath
2014-10-01
Inferring the molecular function of proteins is a fundamental task in the quest to understand cellular processes. The task is becoming increasingly difficult, with thousands of new proteins discovered each day. The difficulty arises primarily from the lack of a high-throughput experimental technique for assessing protein molecular function, a gap that computational approaches are trying hard to fill. Computational approaches, in turn, face a major bottleneck in the absence of clear evidence based on evolutionary information. Here we propose a de novo approach to annotate protein molecular function by matching the structural dynamics of a pair of segments from two dissimilar proteins, which may share even less than 10% sequence identity. To screen these matches, 1 µs coarse-grained (CG) molecular dynamics trajectories of the corresponding proteins were used to compute normalized root-mean-square-fluctuation graphs and select mobile segments, which were then matched for all pairs using unweighted three-dimensional autocorrelation vectors. Our in-house custom-built forcefield (FF), extensively validated against dynamics information obtained from experimental nuclear magnetic resonance data, was specifically used to generate the CG dynamics trajectories. The test for correspondence between the dynamics signatures of protein segments and function revealed an 87% true positive rate and a 93.5% true negative rate on a dataset of 60 experimentally validated proteins, including moonlighting proteins and those with novel functional motifs. A random negative test against 315 proteins with unique folds/functions gave >99% true recall. A blind prediction on a novel protein appears consistent with additional evidence retrieved for it. This is the first proof of principle for the generalized use of structural dynamics to infer protein molecular function, leveraging our custom-made CG FF, which should be broadly useful. © 2014 Wiley Periodicals, Inc.
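The segment-selection step above (RMSF from a CG trajectory, normalization, then picking mobile segments) can be sketched as follows. The choices here are assumptions, since the abstract does not specify them: min-max normalization, a cutoff of 0.5, a minimum run length of 3 beads, and frames already superposed on a common reference. The autocorrelation-vector matching step is not shown.

```python
from math import sqrt


def rmsf(trajectory):
    """Per-bead RMSF. `trajectory` is a list of frames; each frame is a list of
    (x, y, z) tuples, one per CG bead, already superposed on a common reference."""
    n_frames, n_beads = len(trajectory), len(trajectory[0])
    values = []
    for i in range(n_beads):
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        msd = sum(sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        values.append(sqrt(msd))
    return values


def min_max_normalize(values):
    """Rescale to [0, 1]; a constant profile maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mobile_segments(normalized, cutoff=0.5, min_len=3):
    """Runs of consecutive beads with normalized RMSF >= cutoff, as (start, end) indices."""
    segments, start = [], None
    for i, v in enumerate(list(normalized) + [float("-inf")]):  # sentinel closes the last run
        if v >= cutoff and start is None:
            start = i
        elif v < cutoff and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments


print(mobile_segments([0.1, 0.6, 0.7, 0.8, 0.2, 0.9]))  # [(1, 3)]
```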
Kislinger, Thomas; Gramolini, Anthony O; MacLennan, David H; Emili, Andrew
2005-08-01
An optimized analytical expression profiling strategy based on gel-free multidimensional protein identification technology (MudPIT) is reported for the systematic investigation of biochemical (mal)-adaptations associated with healthy and diseased heart tissue. Enhanced shotgun proteomic detection coverage and improved biological inference is achieved by pre-fractionation of excised mouse cardiac muscle into subcellular components, with each organellar fraction investigated exhaustively using multiple repeat MudPIT analyses. Functional-enrichment, high-confidence identification, and relative quantification of hundreds of organelle- and tissue-specific proteins are achieved readily, including detection of low abundance transcriptional regulators, signaling factors, and proteins linked to cardiac disease. Important technical issues relating to data validation, including minimization of artifacts stemming from biased under-sampling and spurious false discovery, together with suggestions for further fine-tuning of sample preparation, are discussed. A framework for follow-up bioinformatic examination, pattern recognition, and data mining is also presented in the context of a stringent application of MudPIT for probing fundamental aspects of heart muscle physiology as well as the discovery of perturbations associated with heart failure.
Murata, Aiko; Saito, Hisamichi; Schug, Joanna; Ogawa, Kenji; Kameda, Tatsuya
2016-01-01
A number of studies have shown that individuals often spontaneously mimic the facial expressions of others, a tendency known as facial mimicry. This tendency has generally been considered a reflex-like “automatic” response, but several recent studies have shown that the degree of mimicry may be moderated by contextual information. However, the cognitive and motivational factors underlying the contextual moderation of facial mimicry require further empirical investigation. In this study, we present evidence that the degree to which participants spontaneously mimic a target’s facial expressions depends on whether participants are motivated to infer the target’s emotional state. In the first study we show that facial mimicry, assessed by facial electromyography, occurs more frequently when participants are specifically instructed to infer a target’s emotional state than when given no instruction. In the second study, we replicate this effect using the Facial Action Coding System to show that participants are more likely to mimic facial expressions of emotion when they are asked to infer the target’s emotional state, rather than make inferences about a physical trait unrelated to emotion. These results provide convergent evidence that the explicit goal of understanding a target’s emotional state affects the degree of facial mimicry shown by the perceiver, suggesting moderation of reflex-like motor activities by higher cognitive processes. PMID:27055206
Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.
Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou
2017-05-17
Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI networks from AP-MS data sets. Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce obtains more stable and accurate inference results than existing algorithms. The source code of Reinforce and the data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.
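Rank aggregation, one ingredient of Reinforce, can be illustrated with the simplest scheme: average each candidate interaction's rank across scoring methods. This is a minimal sketch of plain mean-rank aggregation, not the three-step Reinforce procedure and without its FDR control; the handling of items missing from a ranking and the bait-prey pair names are assumptions for illustration.

```python
def mean_rank_aggregate(rankings):
    """Merge several best-first rankings of candidate interactions by mean rank.

    An item missing from a ranking is given rank len(ranking) + 1 there (an
    assumption). Ties in mean rank are broken alphabetically.
    """
    positions = [{item: i + 1 for i, item in enumerate(r)} for r in rankings]
    items = set().union(*rankings)
    mean_rank = {
        item: sum(pos.get(item, len(pos) + 1) for pos in positions) / len(positions)
        for item in items
    }
    return sorted(items, key=lambda item: (mean_rank[item], item))


# Hypothetical bait-prey pairs ranked by three scoring methods
rankings = [
    ["A-B", "C-D", "E-F"],
    ["C-D", "A-B", "E-F"],
    ["A-B", "E-F", "C-D"],
]
print(mean_rank_aggregate(rankings))  # ['A-B', 'C-D', 'E-F']
```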
Golan-Lavi, Roni; Giacomelli, Chiara; Fuks, Garold; Zeisel, Amit; Sonntag, Johanna; Sinha, Sanchari; Köstler, Wolfgang; Wiemann, Stefan; Korf, Ulrike; Yarden, Yosef; Domany, Eytan
2017-03-28
Protein responses to extracellular cues are governed by gene transcription, mRNA degradation and translation, and protein degradation. In order to understand how these time-dependent processes cooperate to generate dynamic responses, we analyzed the response of human mammary cells to the epidermal growth factor (EGF). Integrating time-dependent transcript and protein data into a mathematical model, we inferred for several proteins their pre- and post-stimulus translation and degradation coefficients and found that they exhibit complex, time-dependent variation. Specifically, we identified strategies of protein production and degradation acting in concert to generate rapid, transient protein bursts in response to EGF. Remarkably, for some proteins whose response requires a rapid decrease in abundance, cells exhibit a transient increase in the corresponding degradation coefficient. Our model and analysis allow inference of the kinetics of mRNA translation and protein degradation without perturbing cells, and open a way to understanding the fundamental processes governing time-dependent protein abundance profiles. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
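The interplay between translation and degradation described above can be written, as an assumed formulation for illustration, as one ordinary differential equation for protein abundance: dP/dt = k_tl(t) * m(t) - k_deg(t) * P(t), where m(t) is mRNA level and k_tl, k_deg are time-dependent translation and degradation coefficients. The study's exact model may differ. The sketch integrates it with forward Euler; the function name and parameter values are hypothetical.

```python
def simulate_protein(mrna, k_tl, k_deg, p0, dt, steps):
    """Forward-Euler integration of dP/dt = k_tl(t) * m(t) - k_deg(t) * P(t).

    `mrna`, `k_tl` and `k_deg` are functions of time. Returns P at
    t = 0, dt, 2*dt, ..., steps*dt.
    """
    p = p0
    traj = [p]
    for i in range(steps):
        t = i * dt
        p += dt * (k_tl(t) * mrna(t) - k_deg(t) * p)
        traj.append(p)
    return traj


# Constant inputs: P relaxes toward the steady state k_tl * m / k_deg = 40
traj = simulate_protein(lambda t: 10.0, lambda t: 2.0, lambda t: 0.5, 0.0, 0.01, 5000)
print(traj[-1])
```

Making `k_deg` a step function of time, for example raised briefly after a simulated stimulus, is one way to explore how a transient increase in the degradation coefficient produces a rapid drop in abundance.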
Evol and ProDy for bridging protein sequence evolution and structural dynamics
Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R.; Bahar, Ivet
2014-01-01
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. Availability and implementation: ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. Contact: bahar@pitt.edu PMID:24849577
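Conservation and coevolution profiles from a multiple sequence alignment are commonly quantified with information-theoretic measures: Shannon entropy per column for conservation and mutual information between column pairs for coevolution. The sketch below computes both from their textbook definitions; it does not use or reproduce the Evol/ProDy API, treats gaps as ordinary symbols (a simplifying assumption), omits corrections such as sequence weighting, and uses a hypothetical toy alignment.

```python
from collections import Counter
from math import log


def column_entropy(msa):
    """Shannon entropy (nats) of each alignment column; 0 means fully conserved.

    `msa` is a list of equal-length aligned sequences. Gaps count as an
    ordinary symbol, which is a simplifying assumption.
    """
    n, length = len(msa), len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("sequences must be aligned to equal length")
    entropies = []
    for j in range(length):
        counts = Counter(seq[j] for seq in msa)
        entropies.append(sum(-(c / n) * log(c / n) for c in counts.values()))
    return entropies


def mutual_information(msa, i, j):
    """Mutual information (nats) between columns i and j, a simple coevolution score."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    return sum((c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())


msa = ["ACK", "ADR", "ACK", "ADR"]
print(column_entropy(msa))            # column 0 conserved; columns 1 and 2 each log(2)
print(mutual_information(msa, 1, 2))  # columns 1 and 2 covary perfectly: log(2)
```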
The mechanisms of temporal inference
NASA Technical Reports Server (NTRS)
Fox, B. R.; Green, S. R.
1987-01-01
The properties of a temporal language are determined by its constituent elements: the temporal objects which it can represent, the attributes of those objects, the relationships between them, the axioms which define the default relationships, and the rules which define the statements that can be formulated. The methods of inference which can be applied to a temporal language are derived in part from a small number of axioms which define the meaning of equality and order and how those relationships can be propagated. More complex inferences involve detailed analysis of the stated relationships. Perhaps the most challenging area of temporal inference is reasoning over disjunctive temporal constraints. Simple forms of disjunction do not sufficiently increase the expressive power of a language, while unrestricted use of disjunction makes the analysis NP-hard. In many cases a set of disjunctive constraints can be converted to disjunctive normal form and familiar methods of inference can be applied to the conjunctive sub-expressions. This process itself is NP-hard, but it is made more tractable by careful expansion of a tree-structured search space.
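The strategy above, picking one disjunct per constraint so that a conjunctive sub-expression remains and expanding the choices as a tree, can be sketched for one concrete representation. We assume, for illustration only, that each constraint is a difference bound between time points, x[v] - x[u] <= w (a simple temporal network), whose conjunctions are checked for consistency by looking for negative cycles with Floyd-Warshall. The search prunes a branch as soon as a partial choice is inconsistent rather than expanding the full disjunctive normal form; the source's temporal language may differ.

```python
INF = float("inf")


def consistent(n, constraints):
    """Check a conjunction of difference constraints x[v] - x[u] <= w.

    Builds the distance graph (edge u -> v with weight w) and runs
    Floyd-Warshall; the constraints are satisfiable iff there is no negative cycle.
    """
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in constraints:
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))


def solve_disjunctive(n, disjunctions):
    """Pick one disjunct from each disjunction so the conjunction is consistent.

    Depth-first search over the tree of choices; a branch is pruned as soon as
    the partial conjunction is inconsistent. Returns the chosen constraints,
    or None if every branch fails (exponential in the worst case).
    """
    def search(i, chosen):
        if not consistent(n, chosen):
            return None
        if i == len(disjunctions):
            return chosen
        for option in disjunctions[i]:
            result = search(i + 1, chosen + [option])
            if result is not None:
                return result
        return None

    return search(0, [])


# Time points: 0 = start of task A, 1 = start of task B.
# A starts at least 1 unit before B, and either B starts at least 1 unit
# before A (impossible here) or B starts at most 3 units after A.
problem = [[(1, 0, -1)], [(0, 1, -1), (0, 1, 3)]]
print(solve_disjunctive(2, problem))  # [(1, 0, -1), (0, 1, 3)]
```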
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: collecting statistics from biological data, building a computational model, solving a computational modeling problem, and testing and evaluating a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as the construction and analysis of large-scale networks.
The systematic annotation of the three main GPCR families in Reactome.
Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter
2010-07-29
Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.
Banyuls, N; Hernández-Rodríguez, C S; Van Rie, J; Ferré, J
2018-05-15
Vip3 vegetative insecticidal proteins from Bacillus thuringiensis are an important tool for crop protection against caterpillar pests in IPM strategies. While there is wide consensus on their general mode of action, the details of their mode of action are not completely elucidated and their structure remains unknown. In this work the alanine scanning technique was performed on 558 out of the total of 788 amino acids of the Vip3Af1 protein. Of the 558 residue substitutions, 19 impaired protein expression and another 19 severely compromised the insecticidal activity against Spodoptera frugiperda. The latter 19 substitutions mainly clustered in two regions of the protein sequence (amino acids 167-272 and amino acids 689-741). Most of these substitutions also decreased the activity against Agrotis segetum. The characterisation of the sensitivity to proteases of the mutant proteins displaying decreased insecticidal activity revealed 6 different band patterns as evaluated by SDS-PAGE. The study of the intrinsic fluorescence of most selected mutants revealed only slight shifts in the emission peak, likely indicating only minor changes in the tertiary structure. An in silico modelled 3D structure of Vip3Af1 is proposed for the first time.
Yao, Shaolun; Jiang, Chuan; Huang, Ziyue; Torres-Jerez, Ivone; Chang, Junil; Zhang, Heng; Udvardi, Michael; Liu, Renyi; Verdier, Jerome
2016-10-01
Legume research and cultivar development are important for sustainable food production, especially of high-protein seed. Thanks to the development of deep-sequencing technologies, crop species have moved to the forefront of research, even before their genome sequences are complete. Black-eyed pea (Vigna unguiculata) is a legume species widely grown in semi-arid regions, which has high potential to provide stable seed protein production in a broad range of environments, including drought conditions. The black-eyed pea reference genotype has been used to generate a gene expression atlas of the major plant tissues (i.e. leaf, root, stem, flower, pod and seed), with a developmental time series for pods and seeds. From these various organs, 27 cDNA libraries were generated and sequenced, resulting in more than one billion reads. Following filtering, these reads were de novo assembled into 36 529 transcript sequences that were annotated and quantified across the different tissues. A set of 24 866 unique transcript sequences, called Unigenes, was identified. All the information related to transcript identification, annotation and quantification was stored in a gene expression atlas webserver (http://vugea.noble.org), providing a user-friendly interface and the necessary tools to analyse transcript expression in black-eyed pea organs and to compare data with other legume species. Using this gene expression atlas, we inferred details of molecular processes that are active during seed development, and identified key putative regulators of seed maturation. Additionally, we found evidence for conservation of regulatory mechanisms involving miRNA in plant tissues subjected to drought and seeds undergoing desiccation. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
Roles of factorial noise in inducing bimodal gene expression
NASA Astrophysics Data System (ADS)
Liu, Peijiang; Yuan, Zhanjiang; Huang, Lifang; Zhou, Tianshou
2015-06-01
Some gene regulatory systems can exhibit bimodal distributions of mRNA or protein although the deterministic counterparts are monostable. This noise-induced bimodality is an interesting phenomenon and has important biological implications, but it is unclear how different sources of expression noise (each source creates so-called factorial noise that is defined as a component of the total noise) contribute separately to this stochastic bimodality. Here we consider a minimal model of gene regulation, which is monostable in the deterministic case. Although simple, this system contains factorial noise of two main kinds: promoter noise due to switching between gene states and transcriptional (or translational) noise due to synthesis and degradation of mRNA (or protein). To better trace the roles of factorial noise in inducing bimodality, we also analyze two limit models, continuous and adiabatic approximations, apart from the exact model. We show that in the case of slow gene switching, the continuous model where only promoter noise is considered can exhibit bimodality; in the case of fast switching, the adiabatic model where only transcriptional or translational noise is considered can also exhibit bimodality but the exact model cannot; and in other cases, both promoter noise and transcriptional or translational noise can cooperatively induce bimodality. Since slow gene switching and large protein copy numbers are characteristics of eukaryotic cells, whereas fast gene switching and small protein copy numbers are characteristics of prokaryotic cells, we infer that eukaryotic stochastic bimodality is induced mainly by promoter noise, whereas prokaryotic stochastic bimodality is induced primarily by transcriptional or translational noise.
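The slow-switching regime described above can be explored with a stochastic simulation. The sketch is a Gillespie (stochastic simulation algorithm) run of a simplified two-state gene in which protein is made directly while the gene is ON; collapsing mRNA and protein into one species is our assumption, so it captures promoter noise and synthesis/degradation noise but not the paper's full or limit models. Rate values are illustrative only.

```python
import random


def telegraph_ssa(k_on, k_off, k_syn, k_deg, t_end, seed=None):
    """Gillespie simulation of a two-state gene; returns the protein count at t_end.

    Reactions (gene starts OFF with no protein):
      OFF -> ON       rate k_on
      ON  -> OFF      rate k_off
      ON  -> ON + P   rate k_syn   (production only while ON)
      P   -> 0        rate k_deg * P
    """
    rng = random.Random(seed)
    t, on, p = 0.0, 0, 0
    while True:
        rates = (k_on * (1 - on), k_off * on, k_syn * on, k_deg * p)
        total = sum(rates)
        if total == 0:
            return p  # no reaction can fire any more
        t += rng.expovariate(total)
        if t > t_end:
            return p
        r = rng.random() * total
        if r < rates[0]:
            on = 1
        elif r < rates[0] + rates[1]:
            on = 0
        elif r < rates[0] + rates[1] + rates[2]:
            p += 1
        elif p > 0:
            p -= 1


# Slow switching relative to protein turnover (k_deg = 1): cells tend to sit
# either near 0 or near k_syn / k_deg = 20 at the sampling time.
samples = [telegraph_ssa(0.01, 0.01, 20.0, 1.0, 100.0, seed=s) for s in range(300)]
low = sum(1 for x in samples if x < 5)
high = sum(1 for x in samples if x > 15)
print(low, high, len(samples) - low - high)
```

Raising `k_on` and `k_off` well above `k_deg` in this simplified model tends to merge the two peaks, in line with the abstract's contrast between slow and fast switching.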
Gene network analysis: from heart development to cardiac therapy.
Ferrazzi, Fulvia; Bellazzi, Riccardo; Engel, Felix B
2015-03-01
Networks offer a flexible framework to represent and analyse the complex interactions between components of cellular systems. In particular, gene networks inferred from expression data can support the identification of novel hypotheses on regulatory processes. In this review we focus on the use of gene network analysis in the study of heart development. Understanding heart development will promote the elucidation of the aetiology of congenital heart disease and thus possibly improve diagnostics. Moreover, it will help to establish cardiac therapies. For example, understanding cardiac differentiation during development will help to guide the stem cell differentiation required for cardiac tissue engineering or to enhance endogenous repair mechanisms. We introduce different methodological frameworks, such as Boolean and Bayesian networks, for inferring networks from expression data. We then present currently available temporal expression data in heart development and discuss the use of network-based approaches in published studies. Collectively, our literature-based analysis indicates that gene network analysis constitutes a promising opportunity to infer therapy-relevant regulatory processes in heart development. However, the use of network-based approaches has so far been limited by the small number of samples in available datasets. Thus, we propose to acquire high-resolution temporal expression data to improve the mathematical descriptions of regulatory processes obtained with gene network inference methodologies. In particular, probabilistic methods that accommodate the intrinsic variability of biological systems have the potential to contribute to a deeper understanding of heart development.
Acerbi, Enzo; Viganò, Elena; Poidinger, Michael; Mortellaro, Alessandra; Zelante, Teresa; Stella, Fabio
2016-01-01
T helper 17 (TH17) cells represent a pivotal adaptive cell subset involved in multiple immune disorders in mammalian species. Deciphering the molecular interactions regulating TH17 cell differentiation is particularly critical for novel drug target discovery designed to control maladaptive inflammatory conditions. Using continuous-time Bayesian networks over a time-course gene expression dataset, we inferred the global regulatory network controlling TH17 differentiation. From the network, we identified the Prdm1 gene, encoding B lymphocyte-induced maturation protein 1, as a crucial negative regulator of human TH17 cell differentiation. The results have been validated by perturbing Prdm1 expression in freshly isolated CD4+ naïve T cells: reduction of Prdm1 expression leads to augmentation of IL-17 release. These data unravel a possible novel target to control TH17 polarization in inflammatory disorders. Furthermore, this study represents the first in vitro validation of continuous-time Bayesian networks as a gene network reconstruction method and as a hypothesis generation tool for wet-lab biological experiments. PMID:26976045
Modrák, Martin; Vohradský, Jiří
2018-04-13
Identifying regulons of sigma factors is a vital subtask of gene network inference. Integrating multiple sources of data is essential for correct identification of regulons and complete gene regulatory networks. Time series of expression data measured with microarrays or RNA-seq, combined with static binding experiments (e.g., ChIP-seq) or literature mining, may be used for inference of sigma factor regulatory networks. We introduce Genexpi: a tool to identify sigma factor regulons by combining candidates obtained from ChIP experiments or literature mining with time-course gene expression data. While Genexpi can be used to infer other types of regulatory interactions, it was designed and validated on real biological data from bacterial regulons. In this paper, we put primary focus on CyGenexpi: a plugin integrating Genexpi with the Cytoscape software for ease of use. As a part of this effort, a plugin for handling time series data in Cytoscape, called CyDataseries, has been developed and made available. Genexpi is also available as a standalone command line tool and an R package. Genexpi is a useful part of the gene network inference toolbox. It provides meaningful information about the composition of regulons and delivers biologically interpretable results.
Kudapa, Himabindu; Garg, Vanika; Chitikineni, Annapurna; Varshney, Rajeev K
2018-04-10
Chickpea is one of the world's largest cultivated food legumes and an excellent source of high-quality protein in the human diet. Plant growth and development are controlled by programmed expression of a suite of genes at a given time, stage, and tissue. To understand how the underlying genome sequence translates into specific plant phenotypes at key developmental stages, information on gene expression patterns is crucial. Here, we present a comprehensive Cicer arietinum Gene Expression Atlas (CaGEA) across different plant developmental stages and organs covering the entire life cycle of chickpea. ICC 4958, one of the widely used drought-tolerant cultivars, was used to generate RNA-Seq data from 27 samples at 5 major developmental stages of the plant. A total of 816 million raw reads were generated, and of these, 794 million reads that passed quality control (QC) filtering were subjected to downstream analysis. A total of 15,947 unique differentially expressed genes were identified across different pairwise tissue combinations. Significant differences in gene expression patterns contributing to flowering, nodulation, and seed and root development were inferred in this study. Furthermore, differentially expressed candidate genes from the "QTL-hotspot" region associated with drought stress response in chickpea were validated. © 2018 The Authors. Plant, Cell & Environment Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Weigt, Martin
Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNAs show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in their sequences. We have developed a statistical-mechanics-inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations.
References:
[1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211.
[2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug and M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932.
[3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa and M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
Enoyl-[acyl carrier protein]-reductases from sunflower.
González-Thuillier, Irene; Venegas-Calerón, Mónica; Garcés, Rafael; von Wettstein-Knowles, Penny; Martínez-Force, Enrique
2015-01-01
The first steps of fatty acid synthesis are a major factor contributing to the amount of fatty acids in plant oils. The intraplastidic fatty acid biosynthetic pathway in plants is catalysed by type II fatty acid synthase (FAS). The last step in each elongation cycle is carried out by the enoyl-[ACP]-reductase, which reduces the dehydrated product of β-hydroxyacyl-[ACP] dehydrase using NADPH or NADH. To determine the mechanisms involved in the biosynthesis of fatty acids in sunflower (Helianthus annuus) seeds, two enoyl-[ACP]-reductase genes sharing 75% identity have been identified and cloned from developing seeds: HaENR1 (GenBank HM021137) and HaENR2 (HM021138). The two genes belong to the ENRA and ENRB families in dicotyledons, respectively. The gene duplication most likely originated after the separation of di- and monocotyledons. RT-qPCR revealed distinct tissue-specific expression patterns. Expression of HaENR1 was highest in roots, stems and developing cotyledons, whereas that of HaENR2 was highest in leaves and early stages of seed development. Genomic DNA gel blot analyses suggest that both are single-copy genes. In vivo activity of the ENR enzymes was tested by complementation experiments with the JP1111 fabI(ts) E. coli strain. Both enzymes were functional, demonstrating that they interacted with the bacterial FAS components. The different fatty acid profiles that resulted imply that the two Helianthus proteins have different structures, substrate specificities and/or reaction rates. The last possibility was confirmed by in vitro analysis with affinity-purified, heterologously expressed enzymes, which reduced the crotonyl-CoA substrate using NADH with different Vmax values.
The evolution and regulation of the mucosal immune complexity in the basal chordate amphioxus.
Huang, Shengfeng; Wang, Xin; Yan, Qingyu; Guo, Lei; Yuan, Shaochun; Huang, Guangrui; Huang, Huiqing; Li, Jun; Dong, Meiling; Chen, Shangwu; Xu, Anlong
2011-02-15
Both amphioxus and the sea urchin encode a complex innate immune gene repertoire in their genomes, but the composition and mechanisms of their innate immune systems, as well as the fundamental differences between the two systems, remain largely unexplored. In this study, we dissect the mucosal immune complexity of amphioxus into different evolutionary-functional modes and regulatory patterns by integrating information from phylogenetic inferences, genome-wide digital expression profiles, time course expression dynamics, and functional analyses. With these rich data, we reconstruct several major immune subsystems in amphioxus and analyze their regulation during mucosal infection. These include the TNF/IL-1R network, TLR and NLR networks, complement system, apoptosis network, oxidative pathways, and other effector genes (e.g., peptidoglycan recognition proteins, Gram-negative binding proteins, and chitin-binding proteins). We show that beneath its superficial similarity to that of the sea urchin, the amphioxus innate system, despite preserving critical invertebrate components, is more similar to that of the vertebrates in terms of composition, expression regulation, and functional strategies. For example, the major effectors in amphioxus gut mucous tissue are the well-developed complement and oxidative-burst systems, and the signaling network in amphioxus seems to emphasize signal transduction/modulation more than initiation. In conclusion, we suggest that the innate immune systems of amphioxus and the sea urchin are strategically different, possibly representing two successful cases among many expanded immune systems that arose at the time of the Cambrian explosion. We further suggest that the vertebrate innate immune system was derived from one of these expanded systems, most likely from the same one that was shared by amphioxus.
Liu, Bei; Staron, Matthew; Li, Zihai
2012-01-01
Basophils have been implicated in anti-parasite defense, allergy and the polarization of T(H)2 responses. Mouse models have been commonly used to study basophil function, although the differences between human and mouse basophils are underappreciated. As an essential chaperone for multiple Toll-like receptors and integrins in the endoplasmic reticulum (ER), gp96 also participates in general protein homeostasis and in the ER unfolded protein response to ensure cell survival during stress. The roles of gp96 in basophil development are unknown. We genetically deleted gp96 in mice and examined the expression of gp96 in basophils by Western blot and flow cytometry. We compared the expression pattern of gp96 between human and mouse basophils. We found that gp96 was dispensable for murine basophil development. Moreover, gp96 was cleaved by serine protease(s) in murine but not human basophils, leading to accumulation of a non-functional N-terminal ∼50 kDa fragment and striking induction of the unfolded protein response. The alteration of gp96 was unique to basophils and was not observed in any other cell types, including mast cells. We also demonstrated that the ectopic expression of a mouse-specific tryptase, mMCP11, does not lead to gp96 cleavage in human basophils. Our study revealed a remarkable biochemical event of gp96 silencing in murine but not human basophils, highlighting the need for caution in using mouse models to infer the function of basophils in the human immune response. Our study also reveals a novel mechanism of shutting down gp96 post-translationally to regulate its function.
Molecular analysis of urothelial cancer cell lines for modeling tumor biology and drug response.
Nickerson, M L; Witte, N; Im, K M; Turan, S; Owens, C; Misner, K; Tsang, S X; Cai, Z; Wu, S; Dean, M; Costello, J C; Theodorescu, D
2017-01-05
The utility of tumor-derived cell lines is dependent on their ability to recapitulate underlying genomic aberrations and primary tumor biology. Here, we sequenced the exomes of 25 bladder cancer (BCa) cell lines and compared mutations, copy number alterations (CNAs), gene expression and drug response to BCa patient profiles in The Cancer Genome Atlas (TCGA). We observed a mutation pattern associated with altered CpGs and APOBEC-family cytosine deaminases, similar to mutation signatures derived from somatic alterations in muscle-invasive (MI) primary tumors, highlighting one or more major mechanisms contributing to cancer-associated alterations in the BCa cell line exomes. Non-silent sequence alterations were confirmed in 76 cancer-associated genes, including mutations that likely activate the oncogenes TERT and PIK3CA, and alter chromatin-associated proteins (MLL3, ARID1A, CHD6 and KDM6A) and established BCa genes (TP53, RB1, CDKN2A and TSC1). We identified alterations in signaling pathways and proteins with related functions, including the PI3K/mTOR pathway, altered in 60% of lines; BRCA DNA repair, 44%; and SYNE1-SYNE2, 60%. Homozygous deletions of chromosome 9p21 are known to target the cell cycle regulators CDKN2A and CDKN2B. This locus was commonly lost in BCa cell lines, and we show that the deletions extended to the polyamine enzyme methylthioadenosine (MTA) phosphorylase (MTAP) in 36% of lines, the transcription factor DMRTA1 (27%) and antiviral interferon epsilon (IFNE, 19%). Overall, the BCa cell line genomic aberrations were concordant with those found in BCa patient tumors. We used gene expression and copy number data to infer pathway activities for cell lines, then used the inferred pathway activities to build a predictive model of cisplatin response. When applied to platinum-treated patients gathered from TCGA, the model predicted treatment-specific response. Together, these data and analysis represent a valuable community resource to model basic tumor biology and to study the pharmacogenomics of BCa.
Ge, Lin-Quan; Jiang, Yi-Ping; Xia, Ting; Song, Qi-Sheng; Stanley, David; Kuai, Peng; Lu, Xiu-Li; Yang, Guo-Qing; Wu, Jin-Cai
2015-07-17
The brown planthopper (BPH), Nilaparvata lugens, sugar transporter gene 6 (Nlst6) is a facilitative glucose/fructose transporter (often called a passive carrier) expressed in the midgut that mediates sugar transport from the midgut lumen to the hemolymph. The influence of downregulating expression of sugar transporter genes on insect growth, development, and fecundity is unknown. Nonetheless, it is reasonable to suspect that transporter-mediated uptake of dietary sugar is essential to the biology of phloem-feeding insects. Based on this reasoning, we posed the hypothesis that silencing, or reducing expression of, a BPH sugar transporter gene would be deleterious to the insects. To test our hypothesis, we examined the effects of Nlst6 knockdown on BPH biology. Reducing expression of Nlst6 led to profound effects on BPHs. Compared to controls, it significantly prolonged the pre-oviposition period, shortened the oviposition period, decreased the number of eggs deposited and reduced body weight. Nlst6 knockdown also significantly decreased fat body and ovarian (particularly vitellogenin) protein content as well as vitellogenin gene expression. Experimental BPHs accumulated less fat body glucose than controls. We infer that Nlst6 acts in BPH growth and fecundity, and has potential as a novel target gene for control of phloem-feeding pest insects.
QuASAR: quantitative allele-specific analysis of reads.
Harvey, Chris T; Moyerbrailean, Gregory A; Davis, Gordon O; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger
2015-04-15
Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while considering uncertainty in the genotype calls. We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu. Supplementary Material is available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
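The classical ASE test mentioned above, counting reads per allele at a heterozygous site and testing a 1:1 ratio, can be sketched as an exact two-sided binomial test. This is the simple baseline, not QuASAR: it assumes the genotype is known to be heterozygous and ignores the base-call error, over-dispersion and genotype uncertainty that QuASAR models. The function name `binom_two_sided_p` is hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of H0: p = 0.5 (a 1:1 allelic ratio).

    k: reads supporting the reference allele; n: total reads at the heterozygous site.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # comb(n, i) / 2**n uses exact integer arithmetic before the final division,
    # so it does not underflow for large read depths.
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    # Two-sided p-value: total probability of outcomes no more likely than the observed one
    # (the small relative tolerance guards against floating-point ties).
    p = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p)
```

For example, 30 reference and 10 alternative reads (`binom_two_sided_p(30, 40)`) give a p-value of roughly 0.002, whereas a balanced 20/20 split gives 1.0. With over-dispersed real data, a beta-binomial model would typically replace the plain binomial.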
2010-01-01
Background: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, as well as with sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data. PMID:21062443
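The Indian Buffet Process lets the number of factors grow with the data instead of being fixed in advance. The sketch below draws a binary factor-usage matrix from the standard IBP generative process: row i reuses an existing factor k with probability m_k / i and adds Poisson(alpha / i) new factors. It only illustrates the prior, not the authors' Gibbs or variational posterior inference. The function name `sample_ibp` and the parameter values are hypothetical.

```python
import numpy as np


def sample_ibp(n_rows: int, alpha: float, seed: int = 0) -> np.ndarray:
    """Draw a binary matrix Z (rows x factors) from the Indian Buffet Process prior."""
    rng = np.random.default_rng(seed)
    counts: list[int] = []       # m_k: number of earlier rows that use factor k
    rows: list[list[int]] = []   # factor indices used by each row
    for i in range(1, n_rows + 1):
        # Reuse each existing factor with probability proportional to its popularity.
        chosen = [k for k, m in enumerate(counts) if rng.random() < m / i]
        # Open a Poisson-distributed number of brand-new factors.
        n_new = int(rng.poisson(alpha / i))
        chosen += list(range(len(counts), len(counts) + n_new))
        counts.extend([0] * n_new)
        for k in chosen:
            counts[k] += 1
        rows.append(chosen)
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, chosen in enumerate(rows):
        Z[i, chosen] = 1
    return Z
```

The number of columns of `sample_ibp(50, alpha=3.0)` is random. Under the IBP its expectation grows only logarithmically with the number of rows, which is what allows a model built on this prior to infer a finite number of factors.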
Gene expression inference with deep learning.
Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui
2016-06-15
Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between the expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
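The linear-regression baseline described above maps landmark-gene expression to target-gene expression and is scored by mean absolute error. The sketch below shows that baseline and the relative-improvement arithmetic. It is a generic least-squares fit under our own assumptions, not the LINCS or D-GEX code, and all function names are hypothetical.

```python
import numpy as np


def _with_intercept(X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear_map(X_landmark: np.ndarray, Y_target: np.ndarray) -> np.ndarray:
    """Least-squares weights mapping landmark-gene expression to every target gene at once."""
    W, *_ = np.linalg.lstsq(_with_intercept(X_landmark), Y_target, rcond=None)
    return W


def predict(W: np.ndarray, X_landmark: np.ndarray) -> np.ndarray:
    return _with_intercept(X_landmark) @ W


def mean_absolute_error(Y_true: np.ndarray, Y_pred: np.ndarray) -> float:
    """MAE over all samples and target genes (equivalently, per-gene MAE averaged across genes)."""
    return float(np.mean(np.abs(Y_true - Y_pred)))


def relative_improvement(mae_baseline: float, mae_new: float) -> float:
    """Fractional error reduction relative to the baseline, e.g. 0.1533 for a 15.33% improvement."""
    return (mae_baseline - mae_new) / mae_baseline
```

On synthetic data that really are linear, this baseline is nearly exact. The abstract's point is that real gene-gene relationships include nonlinear structure that such a fit cannot capture.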
Gene expression inference with deep learning
Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui
2016-01-01
Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between the expressions of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability and implementation: D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26873929
Relative evolutionary rate inference in HyPhy with LEISR.
Spielman, Stephanie J; Kosakovsky Pond, Sergei L
2018-01-01
We introduce LEISR (Likelihood Estimation of Individual Site Rates, pronounced "laser"), a tool to infer relative evolutionary rates from protein and nucleotide data, implemented in HyPhy. LEISR is based on the popular Rate4Site (Pupko et al., 2002) approach for inferring relative site-wise evolutionary rates, primarily from protein data. We extend the original method for more general use in several key ways: (i) we increase the support for nucleotide data with additional models, (ii) we allow for datasets of arbitrary size, (iii) we support analysis of site-partitioned datasets to correct for the presence of recombination breakpoints, (iv) we produce rate estimates at all sites rather than at just a subset of sites, and (v) we implemented LEISR with MPI support to enable rapid, high-throughput analysis. LEISR is available in HyPhy starting with version 2.3.8, and it is accessible as an option in the HyPhy analysis menu ("Relative evolutionary rate inference"), which calls the HyPhy batchfile LEISR.bf.
NASA Astrophysics Data System (ADS)
Shekhar, Karthik; Ruberman, Claire F.; Ferguson, Andrew L.; Barton, John P.; Kardar, Mehran; Chakraborty, Arup K.
2013-12-01
Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus' fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
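Once a spin model has been inferred, the abstract's claim concerns the rank order of strains by fitness. The sketch below shows how such a ranking would be read off an Ising-type model. It makes several assumptions that the abstract does not specify: sequences are encoded as binary vectors (1 = mutant at a site), the energy is E(s) = Σ h_i s_i + Σ_{i<j} J_ij s_i s_j, and lower energy is taken to mean higher fitness. The fields `h` and couplings `J` here are made-up numbers, not inferred values, and both function names are hypothetical.

```python
import numpy as np


def ising_energy(s, h: np.ndarray, J: np.ndarray) -> float:
    """E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j for a binary sequence s (1 = mutant site).

    J is assumed symmetric with a zero diagonal, so 0.5 * s^T J s counts each pair once.
    """
    s = np.asarray(s, dtype=float)
    return float(h @ s + 0.5 * s @ J @ s)


def rank_by_inferred_fitness(seqs, h: np.ndarray, J: np.ndarray) -> list:
    """Indices of seqs ordered from highest to lowest inferred fitness (lowest energy first)."""
    energies = [ising_energy(s, h, J) for s in seqs]
    return sorted(range(len(seqs)), key=lambda k: energies[k])
```

A negative coupling such as `J[0, 1] = -1.5` makes the double mutant at sites 0 and 1 less costly than the sum of its single mutations. Pairwise compensation of this kind is what a field-only (independent-site) model would miss.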
Distribution and Evolution of Yersinia Leucine-Rich Repeat Proteins
Hu, Yueming; Huang, He; Hui, Xinjie; Cheng, Xi; White, Aaron P.
2016-01-01
Leucine-rich repeat (LRR) proteins are widely distributed in bacteria, playing important roles in various protein-protein interaction processes. In Yersinia, the well-characterized type III secreted effector YopM also belongs to the LRR protein family and is encoded by virulence plasmids. However, little is known about other LRR members encoded by Yersinia genomes or their evolution. In this study, the Yersinia LRR proteins were comprehensively screened, categorized, and compared. The LRR proteins encoded by chromosomes (LRR1 proteins) appeared to be more similar to each other and different from those encoded by plasmids (LRR2 proteins) with regard to repeat-unit length, amino acid composition profile, and gene expression regulation circuits. LRR1 proteins also differed from LRR2 proteins in that they contained an E3 ligase domain (NEL domain) in the C-terminal region or an NEL domain-encoding nucleotide relic in flanking genomic sequences. The LRR1 protein-encoding genes (LRR1 genes) varied dramatically and were categorized into 4 subgroups (a to d), with the LRR1a to -c genes evolving from one common ancestor and the LRR1d genes evolving from another. The consensus and ancestor repeat-unit sequences were inferred for different LRR1 protein subgroups by use of a maximum parsimony modeling strategy. Structural modeling disclosed very similar repeat-unit structures between LRR1 and LRR2 proteins despite the different unit lengths and amino acid compositions. Structural constraints may serve as the driving force to explain the observed mutations in the LRR regions. This study suggests possible functional variation among these proteins and lays the foundation for future experiments investigating the functions of the chromosomally encoded LRR proteins of Yersinia. PMID:27217422
Forming Facial Expressions Influences Assessment of Others' Dominance but Not Trustworthiness.
Ueda, Yoshiyuki; Nagoya, Kie; Yoshikawa, Sakiko; Nomura, Michio
2017-01-01
Forming specific facial expressions influences emotions and perception. Bearing this in mind, studies in which observers expressing neutral emotions inferred personal traits from the facial expressions of others should be reconsidered. In the present study, participants were asked to make happy, neutral, and disgusted facial expressions: for "happy," they held a wooden chopstick in their molars to form a smile; for "neutral," they clasped the chopstick between their lips, making no expression; for "disgusted," they put the chopstick between their upper lip and nose and knit their brows in a scowl. However, they were not asked to intentionally change their emotional state. Observers judged happy expression images as more trustworthy, competent, warm, friendly, and distinctive than disgusted expression images, regardless of the observers' own facial expression. Observers judged disgusted expression images as more dominant than happy expression images. However, observers expressing disgust overestimated dominance in observed disgusted expression images and underestimated dominance in happy expression images. In contrast, observers with happy facial forms attenuated dominance for disgusted expression images. These results suggest that dominance inferred from facial expressions is unstable and is influenced not only by the observed facial expression but also by the observers' own physiological states.
Trébulle, Pauline; Nicaud, Jean-Marc; Leplat, Christophe; Elati, Mohamed
2017-01-01
Complex phenotypes, such as lipid accumulation, result from cooperativity between regulators and the integration of multiscale information. However, the elucidation of such regulatory programs by experimental approaches may be challenging, particularly in context-specific conditions. In particular, we know very little about the regulators of lipid accumulation in the oleaginous yeast of industrial interest Yarrowia lipolytica. This lack of knowledge limits the development of this yeast as an industrial platform, due to the time-consuming and costly laboratory efforts required to design strains with the desired phenotypes. In this study, we aimed to identify context-specific regulators and mechanisms, to guide explorations of the regulation of lipid accumulation in Y. lipolytica. Using gene regulatory network inference, and considering the expression of 6539 genes over 26 time points from GSE35447 for biolipid production and a list of 151 transcription factors, we reconstructed a gene regulatory network comprising 111 transcription factors, 4451 target genes and 17048 regulatory interactions (YL-GRN-1) supported by evidence of protein-protein interactions. This study, based on network interrogation and wet laboratory validation, (a) highlights the relevance of our proposed measure, the transcription factors influence, for identifying phases corresponding to changes in physiological state without prior knowledge, (b) suggests new potential regulators and drivers of lipid accumulation, and (c) experimentally validates the impact of six of the nine regulators identified on lipid accumulation, with variations in lipid content from +43.2% to -31.2% on glucose or glycerol.
Genome-Scale Analysis of Translation Elongation with a Ribosome Flow Model
Meilijson, Isaac; Kupiec, Martin; Ruppin, Eytan
2011-01-01
We describe the first large-scale analysis of gene translation based on a model that takes into account the physical and dynamical nature of this process. The Ribosome Flow Model (RFM) predicts fundamental features of the translation process, including translation rates, protein abundance levels, ribosomal densities, and the relations among all these variables, better than alternative (‘non-physical’) approaches. In addition, we show that the RFM can be used for accurate inference of various other quantities, including genes' initiation rates and translation costs. These quantities could not be inferred by previous predictors. We find that increasing the number of available ribosomes (or, equivalently, the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point maximizes the predictive power of the model with respect to experimental data. This result suggests that in all organisms analyzed (from bacteria to human), the global initiation rate is optimized to attain the pre-saturation point. The fact that similar results were not observed for heterologous genes indicates that this feature is under selection. Remarkably, the gap between the performance of the RFM and alternative predictors is particularly large for heterologous genes, testifying to the model's promising biotechnological value in predicting the abundance of heterologous proteins before expressing them in the desired host. PMID:21909250
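The saturation effect described above can be illustrated with a small simulation. This is a minimal sketch, assuming the standard RFM mean-field equations: p_i is the ribosome occupancy (0 to 1) of site i, lam0 is the initiation rate, lam[i] is the rate of leaving site i, the flow from site i to i+1 is lam[i] * p[i] * (1 - p[i+1]), and the translation rate is the exit flow lam[-1] * p[-1]. The function names, step size and rates are illustrative choices, not values or code from the study.

```python
import numpy as np


def rfm_rhs(p, lam0, lam):
    """Right-hand side of the ribosome flow model (RFM) ODEs.

    p    : ribosome occupancy (0..1) of each of the n sites
    lam0 : initiation rate
    lam  : lam[i] is the rate of leaving site i (lam[-1] is the exit rate)
    """
    # Flow between consecutive sites i -> i+1 is lam[i] * p[i] * (1 - p[i+1]).
    flow = lam[:-1] * p[:-1] * (1.0 - p[1:])
    inflow = np.concatenate(([lam0 * (1.0 - p[0])], flow))
    outflow = np.concatenate((flow, [lam[-1] * p[-1]]))
    return inflow - outflow


def rfm_steady_state(lam0, lam, dt=0.01, max_steps=500_000, tol=1e-10):
    """Integrate with forward Euler until the occupancies stop changing.

    Returns (translation rate, mean ribosome density) at steady state.
    """
    lam = np.asarray(lam, dtype=float)
    p = np.zeros(len(lam))
    for _ in range(max_steps):
        dp = rfm_rhs(p, lam0, lam)
        p += dt * dp
        if np.max(np.abs(dp)) < tol:
            break
    translation_rate = lam[-1] * p[-1]  # ribosomes leaving the last site per unit time
    return translation_rate, p.mean()
```

For example, `for lam0 in (0.1, 1.0, 10.0, 40.0): print(rfm_steady_state(lam0, np.ones(10)))` with hypothetical homogeneous elongation rates shows the rate rising quickly at first and then flattening. Forward Euler is used only for transparency; keep `dt` small relative to the largest rate, or swap in a stiff ODE solver for realistic codon-level rates.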
Li, Xueling; Zhu, Min; Brasier, Allan R; Kudlicki, Andrzej S
2015-04-01
How different pathways lead to the activation of a specific transcription factor (TF) with specific effects is not fully understood. We model context-specific transcriptional regulation as a modulatory network: triplets composed of a TF, a target gene, and a modulator. Modulators usually affect the activity of a specific TF at the posttranscriptional level, in a target gene-specific action mode. This action may be classified as enhancement, attenuation, or inversion of either activation or inhibition. As a case study, we inferred all potential modulations of NF-κB/RelA from a large collection of expression profiles. The predicted modulators include many proteins not previously reported to bind RelA physically but with relevant functions, such as RNA processing, cell cycle, mitochondrion, ubiquitin-dependent proteolysis, and chromatin modification. Modulators from different processes exert specific prevalent action modes on distinct pathways. Modulators from noncoding RNAs, RNA-binding proteins, TFs, and kinases modulate NF-κB/RelA activity with specific action modes consistent with their molecular functions and modulation level. The modulatory networks of NF-κB/RelA in the contexts of epithelial-mesenchymal transition (EMT) and burn injury have different modulators: for EMT, these include genes involved in the extracellular matrix (FBN1), cytoskeletal regulation (ACTN1), and tumor suppression (FOXP1), as well as metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), a long intergenic nonprotein coding RNA; for burn injury, they include TXNIP, GAPDH, PKM2, IFIT5, LDHA, NID1, and TPP1.
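The abstract does not state the statistic used to score a TF/target/modulator triplet. One common approach, assumed here for illustration only, is conditional correlation: split samples into low- and high-modulator groups and compare the TF-target correlation between them. The function name `modulation_mode`, the tertile split (`frac`) and the `min_delta` threshold are hypothetical choices, not the authors' method.

```python
import numpy as np


def modulation_mode(tf, target, modulator, frac=1 / 3, min_delta=0.2):
    """Classify how a modulator changes a TF-target correlation.

    Hypothetical rule: compare Pearson r between TF and target in the samples
    with the lowest vs highest modulator expression.
    Returns (mode, r_low, r_high).
    """
    tf, target, modulator = (np.asarray(a, dtype=float) for a in (tf, target, modulator))
    order = np.argsort(modulator)
    k = max(3, int(len(order) * frac))
    low, high = order[:k], order[-k:]  # lowest / highest modulator samples
    r_low = np.corrcoef(tf[low], target[low])[0, 1]
    r_high = np.corrcoef(tf[high], target[high])[0, 1]
    if abs(r_high - r_low) < min_delta:
        return "none", r_low, r_high
    if np.sign(r_low) != np.sign(r_high):
        return "inversion", r_low, r_high  # activation flips to inhibition or vice versa
    if abs(r_high) > abs(r_low):
        return "enhancement", r_low, r_high  # same sign, stronger with more modulator
    return "attenuation", r_low, r_high  # same sign, weaker with more modulator
```

The sign of r_low indicates whether the baseline relation looks like activation (positive) or inhibition (negative); a real analysis would also need a significance test over many triplets, which this sketch omits.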
Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu
2016-12-01
The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree, inferred by the Bayesian method of MrBayes (version 3.2.4) ("Bayesian phylogenetic inference under mixed models" [2]), showing a dichotomy between two clusters of SODs; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate 3D structures of the proteins and the template accessions via the SWISS-MODEL server ("SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]); and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database ("Pfam: the protein families database" [4]).
2013-01-01
Background: Several β-galactosidases of the Glycosyl Hydrolase 35 (GH35) family have been characterized, and many of these modify cell wall components, including pectins, xyloglucans, and arabinogalactan proteins. The phloem fibres of flax (Linum usitatissimum) have gelatinous-type cell walls that are rich in crystalline cellulose and depend on β-galactosidase activity for their normal development. In this study, we investigate the transcript expression patterns and inferred evolutionary relationships of the complete set of flax GH35 genes, to better understand the functions of these genes in flax and other species. Results: Using the recently published flax genome assembly, we identified 43 β-galactosidase-like (BGAL) genes, based on the presence of a GH35 domain. Phylogenetic analyses of their protein sequences clustered them into eight sub-families. Sub-family B, whose members in other species were known to be expressed in developing flowers and pollen, was greatly underrepresented in flax (p-value < 0.01). Sub-family A5, whose sole member from Arabidopsis has been described as its primary xyloglucan BGAL, was greatly expanded in flax (p-value < 0.01). A number of flax BGALs were also observed to contain non-consensus GH35 active sites. Expression patterns of the flax BGALs were investigated using qRT-PCR and publicly available microarray data. All predicted flax BGALs showed evidence of expression in at least one tissue. Conclusion: Flax has a large number of BGAL genes, which display a distinct distribution among the BGAL sub-families in comparison with other closely related species with available whole genome assemblies. Almost every flax BGAL was expressed in fibres, the majority of them predominantly in fibres as compared with other tissues, suggesting an important role for the expansion of this gene family in the development of this species as a fibre crop. 
Variations displayed in the canonical GH35 active site suggest a variety of roles unique to flax, which will require further characterization. PMID:23701735
Hobson, Neil; Deyholos, Michael K
2013-05-23
Several β-galactosidases of the Glycosyl Hydrolase 35 (GH35) family have been characterized, and many of these modify cell wall components, including pectins, xyloglucans, and arabinogalactan proteins. The phloem fibres of flax (Linum usitatissimum) have gelatinous-type cell walls that are rich in crystalline cellulose and depend on β-galactosidase activity for their normal development. In this study, we investigate the transcript expression patterns and inferred evolutionary relationships of the complete set of flax GH35 genes, to better understand the functions of these genes in flax and other species. Using the recently published flax genome assembly, we identified 43 β-galactosidase-like (BGAL) genes, based on the presence of a GH35 domain. Phylogenetic analyses of their protein sequences clustered them into eight sub-families. Sub-family B, whose members in other species were known to be expressed in developing flowers and pollen, was greatly underrepresented in flax (p-value < 0.01). Sub-family A5, whose sole member from Arabidopsis has been described as its primary xyloglucan BGAL, was greatly expanded in flax (p-value < 0.01). A number of flax BGALs were also observed to contain non-consensus GH35 active sites. Expression patterns of the flax BGALs were investigated using qRT-PCR and publicly available microarray data. All predicted flax BGALs showed evidence of expression in at least one tissue. Flax has a large number of BGAL genes, which display a distinct distribution among the BGAL sub-families in comparison with other closely related species with available whole genome assemblies. Almost every flax BGAL was expressed in fibres, the majority of them predominantly in fibres as compared with other tissues, suggesting an important role for the expansion of this gene family in the development of this species as a fibre crop. 
Variations displayed in the canonical GH35 active site suggest a variety of roles unique to flax, which will require further characterization.
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara
2000-01-01
We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time courses into a minimal set of clusters of co-expressed genes, (2) modeling the various conditions under which the time courses are measured with a continuous-time analog recurrent neural network for the cluster-mean time courses, (3) fitting such a regulatory model to the cluster-mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets, including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
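Steps (2) and (3) above can be sketched in a few dozen lines. The abstract does not give the network equation, so this sketch assumes one common gene-circuit form, dv/dt = R * g(T v + h) - lam * v with a logistic g, where v holds the cluster-mean expression levels and T is the connection matrix. The function names, cooling schedule, step size and decay weight are hypothetical illustration choices, not the authors' settings.

```python
import numpy as np


def logistic(u):
    # Numerically safe logistic: 1 / (1 + exp(-u)) == 0.5 * (1 + tanh(u / 2))
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def simulate(T, h, R, lam, v0, n_steps, dt=0.1):
    """Forward-Euler integration of dv/dt = R * g(T @ v + h) - lam * v.

    T[i, j] is the (assumed) influence of cluster j on cluster i.
    """
    v = np.asarray(v0, dtype=float)
    traj = [v]
    for _ in range(n_steps - 1):
        v = v + dt * (R * logistic(T @ v + h) - lam * v)
        traj.append(v)
    return np.array(traj)  # shape (n_steps, n_clusters)


def loss(T, data, h, R, lam, decay=1e-3):
    """Squared error to the observed time course plus a weight-decay penalty on T."""
    sim = simulate(T, h, R, lam, data[0], len(data))
    return np.sum((sim - data) ** 2) + decay * np.sum(T ** 2)


def anneal(data, h, R, lam, n_iter=2000, temp0=1.0, step=0.2, seed=0):
    """Fit T by simulated annealing with a linear cooling schedule."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    T = np.zeros((n, n))
    cur = loss(T, data, h, R, lam)
    best_T, best = T.copy(), cur
    for k in range(n_iter):
        temp = temp0 * (1.0 - k / n_iter) + 1e-9
        cand = T + step * rng.normal(size=T.shape)
        c = loss(cand, data, h, R, lam)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c < cur or rng.random() < np.exp((cur - c) / temp):
            T, cur = cand, c
            if c < best:
                best_T, best = cand.copy(), c
    return best_T, best
```

Running `anneal` several times with different seeds and comparing the returned connection matrices corresponds roughly to step (4); here only T is fitted, while h, R and lam are held fixed for simplicity.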
RAIN: RNA–protein Association and Interaction Networks
Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan
2017-01-01
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569
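The abstract does not spell out RAIN's integrative scoring scheme. Since RAIN is integrated with STRING, this sketch assumes a STRING-style combination, in which per-channel scores are treated as independent evidence: the prior probability of a random association is removed from each channel, the channels are combined as 1 - product(1 - s), and the prior is added back once. The prior value 0.041 and the function name are assumptions for illustration.

```python
def combine_scores(channel_scores, prior=0.041):
    """Combine per-evidence-channel confidence scores (0..1) into one score.

    Assumes STRING-style integration; `prior` is an assumed background
    probability that two random entities are associated.
    """
    # Remove the prior from each channel so it is not counted repeatedly.
    no_prior = [max(0.0, (s - prior) / (1.0 - prior)) for s in channel_scores]
    p_all_wrong = 1.0
    for s in no_prior:
        p_all_wrong *= 1.0 - s  # probability that every channel is a false positive
    combined = 1.0 - p_all_wrong
    # Add the prior back exactly once.
    return combined + prior * (1.0 - combined)
```

A single channel passes through unchanged, and two moderately confident channels (for example curated evidence plus a target prediction) yield a higher combined score than either alone, which is the intended behaviour of an integrative score.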
Xu, Yungang; Guo, Maozu; Zou, Quan; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang
2014-01-01
The cellular interactome, in which genes and/or their products interact on several levels to form transcriptional regulatory, protein interaction, metabolic, and signal transduction networks, has been a research focus for decades. However, no single type of network can fully explain the various interactions among genes. These networks characterize different interaction relationships, each with its own intrinsic properties and defects, and cover different slices of biological information. Functional gene networks (FGNs), consolidated interaction networks that model a fuzzier and more general notion of gene-gene relations, have been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are as yet no successful precedents of FGNs for sparsely studied non-model organisms such as soybean (Glycine max), owing to the lack of sufficient heterogeneous interaction data. In a pioneering study of the soybean interactome, we present an alternative solution for inferring soybean FGNs (SoyFGNs) that is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: a scale-free, small-world architecture and modularity. Verified against co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided discovery of disease-resistance genes indicates that SoyFGNs can support system-level studies of gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant is feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis. 
This study forms the basis of our further comprehensive studies of the soybean functional interactome at the genome and microRNome levels. Additionally, a web tool for information retrieval and analysis of SoyFGNs is available at SoyFN: http://nclab.hit.edu.cn/SoyFN. PMID:25423109
Zhou, Wei; Zhang, Yan; Li, Yue-Hua; Wang, Shuang; Zhang, Jing-Jing; Zhang, Cui-Xia; Zhang, Zhi-Sheng
2017-02-01
This work aimed to identify dysregulated pathways in Staphylococcus aureus (SA)-exposed macrophages based on a pathway interaction network (PIN). The inference of dysregulated pathways comprised four steps: preparing gene expression data, protein-protein interaction (PPI) data, and pathway data; constructing a PIN from these data and the Pearson correlation coefficient (PCC); selecting a seed pathway from the PIN by computing an activity score for each pathway with principal component analysis (PCA); and identifying dysregulated pathways in a minimum set of pathways (MSP) using the seed pathway and the area under the receiver operating characteristic curve (AUC) computed with a support vector machine (SVM) model. A total of 20,545 genes, 449,833 interactions, and 1189 pathways were obtained from the gene expression data, PPI data, and pathway data, respectively. The PIN consisted of 8388 interactions and 1189 nodes, and "Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins" was identified as the seed pathway. Finally, 15 dysregulated pathways in the MSP (AUC = 0.999), such as Respiratory electron transport and DNA Replication, were obtained for SA-infected samples. We have identified 15 dysregulated pathways for SA-infected macrophages based on the PIN. The findings might provide potential biomarkers for early detection and therapy of SA infection and give insight into the molecular mechanisms underlying SA infection. However, how these dysregulated pathways work together still needs to be studied. Copyright © 2016 Elsevier Ltd. All rights reserved.
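The activity-score and AUC steps above can be sketched as follows. This assumes that a pathway's activity score for each sample is its projection onto the first principal component of the pathway genes' expression, which is one common reading of "activity score according to PCA"; the abstract does not give the exact formula. The AUC here is computed directly on the activity score via the Mann-Whitney statistic rather than from an SVM, a deliberate simplification. Function names are hypothetical.

```python
import numpy as np


def pathway_activity(expr):
    """Per-sample activity score for one pathway (assumed PC1 projection).

    expr: genes x samples expression matrix for the genes of one pathway.
    """
    D = expr.T - expr.T.mean(axis=0)  # samples x genes, each gene centered
    U, S, Vt = np.linalg.svd(D, full_matrices=False)
    score = U[:, 0] * S[0]  # PC1 score of each sample
    # PCA signs are arbitrary: orient so a higher score means higher mean expression.
    if np.corrcoef(score, D.mean(axis=1))[0, 1] < 0:
        score = -score
    return score


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Ranking all pathways by `auc(pathway_activity(expr_p), infected)` would give a crude stand-in for choosing a seed pathway; the study's PIN construction and SVM-based minimum-set search are not reproduced here.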
Multiple hot-deck imputation for network inference from RNA sequencing data.
Imbert, Alyssa; Valsesia, Armand; Le Gall, Caroline; Armenise, Claudia; Lefebvre, Gregory; Gourraud, Pierre-Antoine; Viguerie, Nathalie; Villa-Vialaneix, Nathalie
2018-05-15
Network inference provides a global view of the relations between gene expression levels in a given transcriptomic experiment (often only for a restricted list of chosen genes). However, it remains a challenging problem: even though the cost of sequencing has decreased in recent years, the number of samples in a given experiment is still (very) small compared with the number of genes. We propose a method to increase the reliability of the inference when RNA-seq expression data have been measured together with an auxiliary dataset that can provide external information on the similarity of gene expression between samples. Our statistical approach, hd-MI, is based on imputation: samples without RNA-seq data are treated as missing data but are observed in the secondary dataset. hd-MI can improve the reliability of the inference for missing rates of up to 30% and provides more stable networks with fewer false positive edges. From a biological point of view, hd-MI was also found to be relevant for inferring networks from RNA-seq data acquired in adipose tissue during a nutritional intervention in obese individuals. In these networks, novel links between genes were highlighted, and comparability between the two steps of the nutritional intervention improved. Software and sample data are available as an R package, RNAseqNet, which can be downloaded from the Comprehensive R Archive Network (CRAN). Contact: alyssa.imbert@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.
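The idea behind hd-MI can be sketched in Python, although the authors' software is the R package RNAseqNet and this is not its API. The sketch assumes that each sample lacking RNA-seq receives the full RNA-seq profile of a donor drawn at random from its k nearest observed samples in the auxiliary space, that this is repeated several times, and that edges are kept according to how often they recur across imputed datasets. Simple correlation thresholding stands in for the network inference method; all names, `k` and the thresholds are hypothetical.

```python
import numpy as np


def hot_deck_impute(rnaseq, aux, missing, k=3, rng=None):
    """One hot-deck imputation.

    rnaseq : samples x genes RNA-seq matrix (rows flagged in `missing` are ignored)
    aux    : samples x features auxiliary matrix, observed for all samples
    missing: boolean mask of samples lacking RNA-seq
    """
    rng = np.random.default_rng() if rng is None else rng
    filled = rnaseq.copy()
    donors = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(aux[donors] - aux[i], axis=1)
        nearest = donors[np.argsort(dist)[:k]]  # k closest observed samples
        filled[i] = rnaseq[rng.choice(nearest)]  # copy a random donor's profile
    return filled


def multiple_hot_deck_network(rnaseq, aux, missing, n_imputations=20,
                              threshold=0.8, k=3, seed=0):
    """Edge selection frequency across imputed datasets.

    Correlation thresholding is a stand-in for a real network inference method.
    """
    rng = np.random.default_rng(seed)
    n_genes = rnaseq.shape[1]
    counts = np.zeros((n_genes, n_genes))
    for _ in range(n_imputations):
        data = hot_deck_impute(rnaseq, aux, missing, k, rng)
        corr = np.corrcoef(data, rowvar=False)  # genes x genes
        counts += np.abs(corr) > threshold
    np.fill_diagonal(counts, 0)
    return counts / n_imputations
```

Keeping only edges with a high selection frequency (for example above 0.9) is what makes the final network more stable than one built from a single imputation.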
Positive selection sites in tertiary structure of Leguminosae chalcone isomerase 1.
Wang, R K; Zhan, S F; Zhao, T J; Zhou, X L; Wang, C E
2015-03-20
Isoflavonoids and their synthesis enzyme chalcone isomerase 1 (CHI1) are unique to the Leguminosae and have diverse biological functions. Among the Leguminosae, soybean is an important oil and protein crop and a model plant. In this study, we aimed to determine the pattern by which Leguminosae CHI1 arose. A genome-wide sequence analysis of CHI in 3 Leguminosae species and 3 other closely related model plants was performed, and the expression levels of soybean chalcone isomerases were also analyzed. By comparing positively selected sites and their protein structures, we reconstructed the evolutionary pattern of Leguminosae CHI1. A total of 28 CHI and 7 FAP3 (CHI4) genes were identified and separated into 4 clades: CHI1, CHI2, CHI3, and FAP3. Soybean genes belonging to the same chalcone isomerase subfamily had similar expression patterns. CHI1, the chalcone isomerase subfamily unique to the Leguminosae, showed signs of significant positive selection as well as distinctive expression characteristics, indicating accelerated evolution throughout its divergence. Eight sites were identified as undergoing positive selection with high confidence. When mapped onto the tertiary structure of CHI1, these 8 sites were found only around the enzyme substrate; some of them connect to the catalytic core of CHI. Thus, we infer that the emergence of Leguminosae CHI1 depended on positively selected amino acids surrounding its catalytic substrate. In other words, the evolution of CHI1 was driven by specific selection or processing conditions within the substrate.
Aye, Tin Tin; Shim, Jae-Kyoung; Rhee, In-Koo; Lee, Kyeong-Yeoll
2008-08-01
Expression of hemolin, an immune protein, was up-regulated in wandering fifth-instar larvae of Plodia interpunctella. The mRNA level peaked in the middle of the wandering stage. Expression occurred mainly in the epidermis rather than in the fat body or gut. To test a possible ecdysteroid effect on hemolin induction, we treated feeding and wandering fifth-instar larvae with RH-5992, an ecdysteroid agonist, and with KK-42, which inhibits ecdysteroid biosynthesis. When feeding larvae were treated with RH-5992, the hemolin mRNA level increased. When wandering larvae were treated with KK-42, the level was reduced. In addition, when KK-42-treated larvae were subsequently treated with RH-5992, the hemolin mRNA level recovered. These results strongly suggest that ecdysteroid up-regulates the expression of hemolin mRNA. Hormonal and bacterial effects on hemolin induction were further analyzed at the tissue level. Major induction of hemolin mRNA was detected in the epidermis of both feeding and wandering larvae following both RH-5992 treatment and bacterial injection. Minor induction of hemolin was detected in the fat body following bacterial injection, but not following RH-5992 treatment. We infer that in P. interpunctella larvae the epidermis is the major tissue for hemolin induction, both in naïve insects and in insects subjected to bacterial or hormonal treatments.
Furchtgott, Leon A; Melton, Samuel; Menon, Vilas; Ramanathan, Sharad
2017-01-01
Using computational analysis of gene expression both to determine the sequence of lineage choices made by multipotent cells and to identify the genes influencing these decisions is challenging. Here we discover a pattern in the expression levels of a sparse subset of genes among cell types in B- and T-cell developmental lineages that correlates with developmental topologies. We develop a statistical framework using this pattern to simultaneously infer lineage transitions and the genes that determine these relationships. We use this technique to reconstruct the early hematopoietic and intestinal developmental trees. We extend this framework to analyze single-cell RNA-seq data from early human cortical development, inferring a neocortical-hindbrain split in early progenitor cells and the key genes that could control this lineage decision. Our work allows us to simultaneously infer both the identity and lineage of cell types as well as a small set of key genes whose expression patterns reflect these relationships. DOI: http://dx.doi.org/10.7554/eLife.20488.001 PMID:28296636
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
NASA Technical Reports Server (NTRS)
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data-mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering genetic networks, from discrete Boolean networks to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain deeper insight into living organisms, therapeutic targeting, and bioengineering.
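As a concrete instance of co-expression clustering, the sketch below uses the correlation distance 1 - Pearson r (so genes with the same expression shape are close regardless of scale) and average-linkage agglomerative clustering. Both the distance measure and the linkage rule are one illustrative combination among those a review like this covers, and the function names are hypothetical. The naive loop is written out with NumPy only, for clarity.

```python
import numpy as np


def correlation_distance(profiles):
    """Pairwise 1 - Pearson r between gene expression profiles (rows = genes)."""
    return 1.0 - np.corrcoef(profiles)


def average_linkage_clusters(profiles, n_clusters):
    """Naive average-linkage agglomerative clustering; returns one label per gene."""
    D = correlation_distance(profiles)
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all members of the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = np.empty(len(profiles), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

For realistic gene counts, `scipy.cluster.hierarchy.linkage(profiles, method="average", metric="correlation")` followed by `fcluster(..., criterion="maxclust")` performs the same kind of clustering far more efficiently.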
Is pride a prosocial emotion? Interpersonal effects of authentic and hubristic pride.
Wubben, Maarten J J; De Cremer, David; van Dijk, Eric
2012-01-01
Pride is associated with both prosocial and antisocial behaviour. Do others also infer such behaviours when pride is expressed, and does this affect their own prosocial behaviour? We expected that authentic pride (i.e., confidence, accomplishment) would signal and elicit more prosocial behaviour than hubristic pride (i.e., arrogance, conceit). In a first laboratory experiment, a target in a public-good dilemma was inferred to have acted less prosocially when displaying a nonverbal expression of pride than when displaying no emotion. As predicted, inferences of hubristic pride, but not of authentic pride, mediated this effect. Participants themselves also responded less prosocially. A second laboratory experiment, in which a target verbally expressed authentic pride, hubristic pride, or no emotion, replicated the effects of hubristic pride and showed that authentically proud targets were assumed to have acted prosocially, especially by perceivers with a dispositional tendency to take the perspective of others. We conclude that authentic pride is generally perceived as a more prosocial emotion than hubristic pride.
Bickel, David R.; Montazeri, Zahra; Hsieh, Pei-Chun; Beatty, Mary; Lawit, Shai J.; Bate, Nicholas J.
2009-01-01
Motivation: Measurements of gene expression over time enable the reconstruction of transcriptional networks. However, Bayesian networks and many other current reconstruction methods rely on assumptions that conflict with the differential equations that describe transcriptional kinetics. Practical approximations of kinetic models would enable inference of causal relationships between genes from expression data from microarray, tag-based, and conventional platforms, but conclusions are sensitive to the assumptions made. Results: Representing a sufficiently large portion of the genome enables computation of an upper bound on how much confidence one may place in influences between genes on the basis of expression data. Information about which genes encode transcription factors is not necessary but may be incorporated if available. The methodology is generalized to cover cases in which expression measurements are missing for many of the genes that might control the transcription of the genes of interest. The assumption that the gene expression level is roughly proportional to the rate of translation led to better empirical performance than either the assumption that the gene expression level is roughly proportional to the protein level or the Bayesian model average of both assumptions. Availability: http://www.oisb.ca points to R code implementing the methods (R Development Core Team 2004). Contact: dbickel@uottawa.ca Supplementary information: http://www.davidbickel.com PMID:19218351
Inferring Nonlinear Gene Regulatory Networks from Gene Expression Data Based on Distance Correlation
Guo, Xiaobo; Zhang, Ye; Hu, Wenhao; Tan, Haizhu; Wang, Xueqin
2014-01-01
Nonlinear dependence is common in the regulatory mechanisms of gene regulatory networks (GRNs). Properly measuring or testing nonlinear dependence in real data is vital for reconstructing GRNs and understanding the complex regulatory mechanisms within the cell. A recently developed measure, distance correlation (DC), has been shown to be powerful and computationally efficient for detecting nonlinear dependence in many situations. In this work, we incorporate DC into the inference of GRNs from gene expression data without any underlying distributional assumptions. We propose three DC-based GRN inference algorithms, CLR-DC, MRNET-DC, and REL-DC, and compare them with mutual information (MI)-based algorithms on two simulated datasets (benchmark GRNs from the DREAM challenge and GRNs generated by the SynTReN network generator) and on an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRN inference. PMID:24551058
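Distance correlation itself is simple to compute, which helps explain why it can replace mutual information inside CLR- or MRNET-style pipelines. The sketch below implements the standard sample (V-statistic) distance correlation for two expression vectors: pairwise distance matrices are double-centered, and dCor^2 = dCov^2 / sqrt(dVar_x * dVar_y). How the authors plug it into CLR, MRNET and REL is not reproduced; the function names are hypothetical.

```python
import numpy as np


def _double_centered(x):
    """Pairwise |x_i - x_j| matrix with row, column and grand means removed."""
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D expression vectors (0..1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A, B = _double_centered(x), _double_centered(y)
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative rounding error
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    if dvar_x * dvar_y == 0:
        return 0.0  # a constant vector carries no dependence information
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))
```

For x = [-2, -1, 0, 1, 2] and y = x**2, the Pearson correlation is exactly zero, yet the distance correlation is clearly positive, which is the kind of nonlinear regulatory dependence the abstract targets.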
Zhou, Haotian; Majka, Elizabeth A; Epley, Nicholas
2017-04-01
People use at least two strategies to solve the challenge of understanding another person's mind: inferring that person's perspective by reading his or her behavior (theorization) and getting that person's perspective by experiencing his or her situation (simulation). The five experiments reported here demonstrate a strong tendency for people to underestimate the value of simulation. Predictors estimated a stranger's emotional reactions toward 50 pictures. They could either infer the stranger's perspective by reading his or her facial expressions or simulate the stranger's perspective by watching the pictures he or she viewed. Predictors were substantially more accurate when they got perspective through simulation, but overestimated the accuracy they had achieved by inferring perspective. Predictors' miscalibrated confidence stemmed from overestimating the information revealed through facial expressions and underestimating the similarity in people's reactions to a given situation. People seem to underappreciate a useful strategy for understanding the minds of others, even after they gain firsthand experience with both strategies.
Chen, Y-F; Chiu, H-H; Wu, C-H; Wang, J-Y; Chen, F-M; Tzou, W-H; Shin, S-J; Lin, S-R
2003-10-01
Our previous studies have shown that the cell proliferation rate, the mRNA levels of p450scc, p450c17, and 3betaHSD, and cortisol secretion were significantly increased, in a time-dependent manner, in human adrenocortical cells stably transfected with the mutated K-ras expression plasmid "pK568MRSV" after induction with IPTG. However, the levels of p450, p450scc, p450c17, 3betaHSD, and cortisol, as well as the cell proliferation rate, were inhibited by PD098059, an inhibitor of MEK phosphorylation. These results show that the mutated K-ras oncogene can regulate tumorigenesis and steroidogenesis through the Ras-RAF-MEK-MAPK signal transduction pathway. The aim of this study was to investigate the regulatory factors in this pathway and to examine whether other signal transduction pathways or other molecules are involved in tumorigenesis or steroidogenesis. In the first year, we analyzed the gene profiles of mutant K-ras-transfected adrenocortical cells by DNA microarray to determine the expression of genes related to the cell cycle, signal transduction, apoptosis, tumorigenesis, and steroidogenesis, as well as other expressed sequence tags. The K-ras mutant significantly increased the expression of several genes: human zinc-finger protein 22 increased 28.5-fold, osteopontin 5.8-fold, LIM domain kinase 2 (LIMK2) 3.3-fold, Homo sapiens dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 2 (DYRK2) 2.2-fold, and human syntaxin 3 twofold. Significant decreases in expression were also observed in several downregulated genes. 
Retinoblastoma binding protein 1 (RBBP1) decreased fourfold, Homo sapiens craniofacial development protein 1 (CFDP1) 2.4-fold, DAP kinase-related apoptosis-inducing protein kinase 1 (DRAK1) 2.3-fold, SKI-interacting protein (SKIP) 2.2-fold, and human poly(A)-binding protein (PABP) 2.1-fold. Among all significantly differentially expressed genes, preliminary bioinformatic analysis revealed that, after induction of mutant K-ras expression by isopropyl thiogalactoside (IPTG), the downregulation of RBBP1 was most strongly correlated with cell proliferation. RBBP1 can bind RB/E2F to form an mSIN3-HDAC complex, which induces cell cycle arrest at the G1/G0 stage by repressing transcription of E2F-regulated genes. Northern blotting showed that RBBP1 expression was inhibited after 36 h of IPTG induction. A second Northern blot analysis showed that cyclin D1 and c-myc mRNA levels increased in proportion to K-ras expression. Finally, Western blotting showed that phosphorylated pRB also increased. Taken together, we infer that the mutant K-ras oncogene promoted progression to the G1/S stage by inhibiting the RB/RBBP1-dependent repressor complex from binding the SIN3-HDAC complex, which resulted in histone acetylation and activation of transcription of E2F-regulated genes. However, the roles of the other differentially expressed genes in cell proliferation, cell morphologic change, tumorigenesis, or steroidogenesis require further investigation.
Behdani, Elham; Bakhtiarizadeh, Mohammad Reza
2017-10-01
The immune system is an important biological system that is negatively affected by stress. This study constructed an integrated regulatory network to improve our understanding of the gene regulatory network of the stress-related immune system. Module inference was used to construct modules of co-expressed genes from bovine leukocyte RNA-Seq data. Transcription factors (TFs) were then assigned to these modules using the Lemon-Tree algorithms, and the TFs assigned to each module were confirmed using promoter analysis and protein-protein interaction data. This integrated method identified three TFs: one previously known to be involved in immune response (MYBL2) and two (E2F8 and FOXS1) that had not been recognized previously and were identified for the first time in this study as novel regulatory candidates in immune response. This study provides valuable insights into the regulatory programs of genes involved in the stress-related immune system.
The merged basins of signal transduction pathways in spatiotemporal cell biology.
Hou, Yingchun; Hou, Yang; He, Siyu; Ma, Caixia; Sun, Mengyao; He, Huimin; Gao, Ning
2014-03-01
Numerous lines of evidence indicate that a signaling system is composed of signal pathways, each pathway is composed of sub-pathways, and each sub-pathway is composed of original signal terminals initiated by a protein/gene. We term the signal transduction system formed by merging these terminal signals a "signal basin". In this article, we discuss the composition and regulation of signal basins and the relationship between signal basin control and the triple W of spatiotemporal cell biology. Finally, we evaluate the importance of the systemic regulation of gene expression by signal basins under triple W. We hope this discussion will draw the attention of life scientists to this area. © 2013 Wiley Periodicals, Inc.
USDA-ARS's Scientific Manuscript database
The role of PROTEIN ISOASPARTYL-METHYLTRANSFERASE (PIMT) in repairing a wide assortment of damaged proteins in a host of organisms has been inferred from the affinity of the enzyme for isoaspartyl residues in a plethora of amino acid contexts. The identification of specific PIMT target proteins in p...
Nordström, Henrik; Laukka, Petri; Thingujam, Nutankumar S; Schubert, Emery; Elfenbein, Hillary Anger
2017-11-01
This study explored the perception of emotion appraisal dimensions on the basis of speech prosody in a cross-cultural setting. Professional actors from Australia and India vocally portrayed different emotions (anger, fear, happiness, pride, relief, sadness, serenity and shame) by enacting emotion-eliciting situations. In a balanced design, participants from Australia and India then inferred aspects of the emotion-eliciting situation from the vocal expressions, described in terms of appraisal dimensions (novelty, intrinsic pleasantness, goal conduciveness, urgency, power and norm compatibility). Bayesian analyses showed that the perceived appraisal profiles for the vocally expressed emotions were generally consistent with predictions based on appraisal theories. Few group differences emerged, which suggests that the perceived appraisal profiles are largely universal. However, some differences between Australian and Indian participants were also evident, mainly for ratings of norm compatibility. The appraisal ratings were further correlated with a variety of acoustic measures in exploratory analyses, and inspection of the acoustic profiles suggested similarity across groups. In summary, results showed that listeners may infer several aspects of emotion-eliciting situations from the non-verbal aspects of a speaker's voice. These appraisal inferences also seem to be relatively independent of the cultural background of the listener and the speaker. PMID:29291085
DMirNet: Inferring direct microRNA-mRNA association networks.
Lee, Minsu; Lee, HyungJune
2016-12-05
MicroRNAs (miRNAs) play important regulatory roles in a wide range of biological processes by inducing target mRNA degradation or translational repression. Based on the correlation between the expression profiles of a miRNA and its target mRNA, various computational methods have been proposed to identify miRNA-mRNA association networks from matched miRNA and mRNA expression profiles. However, three major issues remain to be resolved in conventional computational approaches for inferring miRNA-mRNA association networks from expression profiles. 1) Correlations inferred from the observed expression profiles using conventional correlation-based methods include numerous erroneous links or overestimated edge weights due to transitive information flow among direct associations. 2) Because of the high-dimension, low-sample-size problem in microarray datasets, it is difficult to obtain accurate and reliable estimates of the empirical correlations between all pairs of expression profiles. 3) Because previously proposed computational methods usually show varying performance across datasets, a more reliable model that guarantees optimal or near-optimal performance across different datasets is needed. In this paper, we present DMirNet, a new framework for identifying direct miRNA-mRNA association networks. To tackle these issues, DMirNet incorporates 1) three direct correlation estimation methods (Corpcor, SPACE, and network deconvolution) to infer direct miRNA-mRNA association networks, 2) bootstrapping to make full use of the limited training expression profiles, and 3) rank-based ensemble aggregation to build a reliable and robust model across different datasets. Our empirical experiments on three datasets demonstrate the combined contribution of the necessary components in DMirNet.
Additional performance comparisons show that DMirNet outperforms the state-of-the-art ensemble-based model [1], which had shown the best performance on the same three datasets, by a factor of up to 1.29. Further, we identify 43 putative novel multi-cancer-related miRNA-mRNA association relationships from an inferred top-1000 direct miRNA-mRNA association network. We believe that DMirNet is a promising method for identifying novel direct miRNA-mRNA relations and elucidating direct miRNA-mRNA association networks. Because DMirNet infers direct relationships from the observed data, it can contribute to reconstructing various direct regulatory pathways, including, but not limited to, direct miRNA-mRNA association networks.
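DMirNet's three ingredients (a direct-correlation estimate, bootstrapping, and rank-based aggregation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single estimator, partial correlation from a ridge-regularized inverse correlation matrix, as a stand-in for the shrinkage approach of Corpcor, omits SPACE and network deconvolution, and averages edge ranks across bootstrap resamples. The function names `partial_correlation`, `ranks_descending`, and `bootstrap_rank_ensemble` and the default `ridge` value are hypothetical.

```python
import numpy as np


def partial_correlation(X, ridge=0.1):
    """Partial correlations from a ridge-regularized precision matrix.

    X has shape (n_samples, n_genes). The ridge term is an assumed stand-in
    for a proper shrinkage estimator such as the one used by Corpcor.
    """
    C = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(C + ridge * np.eye(C.shape[0]))
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)          # standard precision-to-partial-correlation map
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ranks_descending(values):
    """Rank 0 is the strongest association, rank 1 the next, and so on."""
    order = np.argsort(-values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def bootstrap_rank_ensemble(mirna, mrna, n_boot=50, seed=0):
    """Average rank of every miRNA-mRNA edge over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, m = mirna.shape
    X = np.hstack([mirna, mrna])
    total = np.zeros(m * mrna.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample samples with replacement
        pcor = partial_correlation(X[idx])
        block = np.abs(pcor[:m, m:]).ravel()    # keep only the miRNA x mRNA block
        total += ranks_descending(block)
    return (total / n_boot).reshape(m, mrna.shape[1])
```

In a full ensemble the averaged ranks from several estimators would themselves be combined; an edge with a consistently low average rank is the most stable candidate direct association.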
Towards Inferring Protein Interactions: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Zhang, Ya; Zha, Hongyuan; Chu, Chao-Hsien; Ji, Xiang
2006-12-01
Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques uncover only a small portion of any interactome, and the resulting data often have a very high error rate. By conceptualizing interactions at the domain level, we provide a more abstract representation of the interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume, for the convenience of computational modeling, that domain interactions are independent of each other. This paper proposes a new framework for predicting protein interactions that makes no such assumption about domain interactions: a protein interaction may result from multiple domain interactions that depend on each other. A conjunctive normal form representation is used to capture the relationships between protein interactions and domain interactions. The interaction inference problem is then modeled as a constraint satisfaction problem and solved via linear programming. Experimental results on a combined yeast data set demonstrate the robustness and accuracy of the proposed algorithm. Moreover, we map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
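The conjunctive normal form idea can be illustrated with a small linear-programming sketch. Each interacting protein pair contributes a clause stating that at least one of its domain pairs interacts, and each non-interacting pair forces all of its domain pairs to zero; relaxing the Boolean variables to [0, 1] gives a linear program, solved here with SciPy's `linprog`. The objective (explain the data with as few domain interactions as possible), the handling of non-interacting pairs, and the name `infer_domain_interactions` are assumptions for illustration, not the paper's exact formulation.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog


def infer_domain_interactions(protein_domains, interacting, non_interacting):
    """LP relaxation of CNF constraints linking protein and domain interactions.

    protein_domains: dict mapping protein -> list of domain names
    interacting, non_interacting: lists of (protein, protein) pairs
    Returns a dict mapping a sorted (domain, domain) pair -> value in [0, 1].
    """
    def clause(p, q):
        # Domain pairs that could explain an interaction between p and q.
        return {tuple(sorted(dd)) for dd in product(protein_domains[p], protein_domains[q])}

    observed = list(interacting) + list(non_interacting)
    pairs = sorted(set().union(*(clause(p, q) for p, q in observed)))
    index = {pair: k for k, pair in enumerate(pairs)}
    n = len(pairs)

    # Interacting proteins: sum of their domain-pair variables >= 1 (one CNF clause),
    # rewritten as -sum(x) <= -1 because linprog expects A_ub @ x <= b_ub.
    A_ub = np.zeros((len(interacting), n))
    for row, (p, q) in enumerate(interacting):
        for pair in clause(p, q):
            A_ub[row, index[pair]] = -1.0
    b_ub = -np.ones(len(interacting))

    # Non-interacting proteins: none of their domain pairs may interact.
    bounds = [(0.0, 1.0)] * n
    for p, q in non_interacting:
        for pair in clause(p, q):
            bounds[index[pair]] = (0.0, 0.0)

    has_rows = len(interacting) > 0
    res = linprog(c=np.ones(n),
                  A_ub=A_ub if has_rows else None,
                  b_ub=b_ub if has_rows else None,
                  bounds=bounds, method="highs")
    if not res.success:
        raise ValueError("constraints are inconsistent: " + res.message)
    return dict(zip(pairs, res.x))
```

Because this is a relaxation, fractional values can appear when several domain pairs explain an interaction equally well.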
Expressing pride: Effects on perceived agency, communality, and stereotype-based gender disparities.
Brosi, Prisca; Spörrle, Matthias; Welpe, Isabell M; Heilman, Madeline E
2016-09-01
Two experimental studies were conducted to investigate how the expression of pride shapes agency-related and communality-related judgments, and how those judgments differ when the pride expresser is a man or a woman. Results indicated that the expression of pride (as compared to the expression of happiness) had positive effects on perceptions of agency and inferences about task-oriented leadership competence, and negative effects on perceptions of communality and inferences about people-oriented leadership competence. Pride expression also elevated ascriptions of interpersonal hostility. For agency-related judgments and ascriptions of interpersonal hostility, these effects were consistently stronger when the pride expresser was a woman than a man. Moreover, the expression of pride was found to affect disparities in judgments about men and women, eliminating the stereotype-consistent differences that were evident when happiness was expressed. With a display of pride, women were not seen as any more deficient in agency-related attributes and competencies, nor were they seen as any more exceptional in communality-related attributes and competencies, than were men. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Godsey, Brian; Heiser, Diane; Civin, Curt
2012-01-01
MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that combining these two approaches gives more reliable results than either approach alone. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, as in time series or studies of developmental processes, stages, or types (e.g., cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm that specifically addresses (partially) ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework that specifies posterior distributions, and therefore statistical significance, for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.
Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun
2014-01-01
As an abstract mapping of gene regulation in the cell, a gene regulatory network is important for both biological research and practical applications. Reverse engineering gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With advances in biological technologies, multiple time-course gene expression datasets may be collected for a specific gene network under different circumstances, and integrating these datasets can improve network inference. Gene expression data are also known to be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer a shared underlying network topology from multiple time-course gene expression datasets while remaining robust to large errors or outliers. To solve the resulting optimization problem, an efficient algorithm combining auxiliary function minimization and block descent is developed. A stability selection method is adapted to find a network topology consisting of scored edges. The proposed method is applied to both simulated and real experimental datasets. The results show that Huber group LASSO outperforms group LASSO in terms of both the area under the receiver operating characteristic curve and the area under the precision-recall curve. A convergence analysis shows theoretically that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of Huber group LASSO in integrating multiple time-course gene expression datasets and improving resistance to large errors or outliers.
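The shape of the Huber group LASSO objective, a Huber loss summed over datasets plus a penalty that ties each candidate regulator's coefficients together across datasets, can be sketched for a single target gene. This minimal version uses plain proximal gradient descent in place of the paper's auxiliary-function and block-descent algorithm; the grouping (one group per regulator, spanning all datasets), the step-size rule, and the function names are assumptions.

```python
import numpy as np


def huber_grad(r, delta):
    """Derivative of the Huber loss with respect to the residual r.

    Huber loss: r**2 / 2 if |r| <= delta, else delta * (|r| - delta / 2).
    """
    return np.clip(r, -delta, delta)


def group_soft_threshold(B, t):
    """Shrink each row of B (one regulator across all datasets) toward zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale


def huber_group_lasso(Xs, ys, lam, delta=1.0, n_iter=1000):
    """Minimize sum_k sum_i huber(y_k - X_k b_k) + lam * sum_j ||B[j, :]||_2.

    Xs[k] has shape (n_k, p); column k of the returned B holds dataset k's coefficients.
    """
    p, K = Xs[0].shape[1], len(Xs)
    B = np.zeros((p, K))
    # The Huber loss has curvature at most 1, so max_k ||X_k||_2^2 bounds the
    # Lipschitz constant of the smooth part and 1/L is a safe step size.
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    step = 1.0 / L
    for _ in range(n_iter):
        G = np.zeros_like(B)
        for k, (X, y) in enumerate(zip(Xs, ys)):
            r = y - X @ B[:, k]
            G[:, k] = -X.T @ huber_grad(r, delta)
        B = group_soft_threshold(B - step * G, step * lam)
    return B
```

Rows of `B` that end up exactly zero correspond to regulators dropped from every dataset at once, which is how the group penalty enforces a shared topology; the clipped gradient keeps gross outliers from dominating the fit.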
Prophetic Granger Causality to infer gene regulatory networks.
Carlin, Daniel E; Paull, Evan O; Graim, Kiley; Wong, Christopher K; Bivol, Adrian; Ryabinin, Peter; Ellrott, Kyle; Sokolov, Artem; Stuart, Joshua M
2017-01-01
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides information complementary to other approaches, raising the performance of ensemble learners, while achieving moderate performance on its own. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring. PMID:29211761
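The core of an L1-penalized Granger regression can be sketched with NumPy: each protein's level at time t is regressed on every protein's level at time t - lag, and nonzero coefficients become candidate directed edges. This sketch is an ordinary lagged variant that does not reproduce the "prophetic" element of PGC, uses no stimulus covariates or network prior, and solves the lasso by simple coordinate descent; `lasso_cd`, `granger_l1_network`, and the choice of `lam` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                              # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]               # remove feature j from the current fit
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]               # add it back with the updated coefficient
    return b


def granger_l1_network(series, lam, lag=1):
    """series: array (T, n_proteins).

    Returns W with W[i, j] = estimated effect of protein j at time t - lag
    on protein i at time t.
    """
    past, future = series[:-lag], series[lag:]
    past = past - past.mean(axis=0)           # center so no intercept is needed
    future = future - future.mean(axis=0)
    n_prot = series.shape[1]
    W = np.zeros((n_prot, n_prot))
    for i in range(n_prot):
        W[i] = lasso_cd(past, future[:, i], lam)
    return W
```

A nonzero `W[i, j]` with `i != j` reads as "past levels of protein j help predict protein i", which is the Granger sense of influence; the L1 penalty keeps the inferred wiring sparse.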
Protein-based forensic identification using genetically variant peptides in human bone.
Mason, Katelyn Elizabeth; Anex, Deon; Grey, Todd; Hart, Bradley; Parker, Glendon
2018-04-22
Bone tissue contains organic material that is useful for forensic investigations and may contain preserved endogenous protein that can persist in the environment for extended periods of time over a range of conditions. Single amino acid polymorphisms in these proteins reflect genetic information because they result from non-synonymous single nucleotide polymorphisms (SNPs) in DNA. Detection of genetically variant peptides (GVPs), those peptides that contain amino acid polymorphisms, in digests of bone proteins allows the corresponding SNP alleles to be inferred. The resulting genetic profiles can be used to calculate statistical measures of association between a bone sample and an individual. In this study, proteomic analysis of rib cortical bone samples from 10 recently deceased individuals demonstrates this concept. A straightforward acidic demineralization protocol yielded proteins that were digested with trypsin. Tryptic digests were analyzed by liquid chromatography mass spectrometry. A total of 1736 different proteins were identified across all resulting datasets. On average, individual samples contained 454 ± 121 (x̄ ± σ) proteins. Thirty-five genetically variant peptides were identified from 15 observed proteins. Overall, 134 SNP inferences were made based on proteomically detected GVPs, which were confirmed by sequencing of subject DNA. Inferred individual SNP genetic profiles ranged in random match probability (RMP) from 1/6 to 1/42,472 when calculated with European population frequencies in the 1000 Genomes Project, Phase 3. Similarly, RMPs based on African population frequencies were calculated for each SNP genetic profile, and likelihood ratios (LR) were obtained by dividing each European RMP by the corresponding African RMP. The resulting LR values ranged from 1.4 to 825, with a median value of 16. GVP markers offer a basis for the identification of compromised skeletal remains independent of the presence of DNA template. Published by Elsevier B.V.
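The random match probability and likelihood ratio arithmetic described above can be made concrete. This sketch assumes Hardy-Weinberg genotype frequencies (p² for a homozygote, 2pq for a heterozygote) and independence across SNPs (the product rule); the allele frequencies in the usage example are invented for illustration and are not values from the study or from the 1000 Genomes Project. The function names are hypothetical.

```python
from math import prod


def genotype_frequency(p, q, homozygous):
    """Hardy-Weinberg genotype frequency (assumed model).

    p, q: frequencies of the two observed alleles (q is ignored for homozygotes).
    """
    return p * p if homozygous else 2.0 * p * q


def random_match_probability(profile):
    """Product rule over SNPs, assuming they are independent (unlinked).

    profile: iterable of (p, q, homozygous) tuples, one per SNP.
    """
    return prod(genotype_frequency(p, q, hom) for p, q, hom in profile)


def likelihood_ratio(rmp_european, rmp_african):
    """European RMP divided by the corresponding African RMP."""
    return rmp_european / rmp_african


# Illustrative two-SNP profile with made-up frequencies.
eur = random_match_probability([(0.1, 0.9, False), (0.5, 0.5, True)])   # 0.18 * 0.25
afr = random_match_probability([(0.3, 0.7, False), (0.2, 0.2, True)])   # 0.42 * 0.04
print(eur, afr, likelihood_ratio(eur, afr))
```

An RMP of 0.045 would be reported as roughly 1/22; adding SNPs multiplies in further factors below one, which is why profiles with more inferred SNPs reach much smaller RMPs.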
Svensson, Katrin J; Christianson, Helena C; Wittrup, Anders; Bourseau-Guilmain, Erika; Lindqvist, Eva; Svensson, Lena M; Mörgelin, Matthias; Belting, Mattias
2013-06-14
The role of exosomes in cancer can be inferred from the observation that they transfer tumor cell-derived genetic material and signaling proteins, resulting in, for example, increased tumor angiogenesis and metastasis. However, the membrane transport mechanisms and the signaling events involved in the uptake of these virus-like particles remain ill-defined. We now report that internalization of exosomes derived from glioblastoma (GBM) cells involves nonclassical, lipid raft-dependent endocytosis. Importantly, we show that the lipid raft-associated protein caveolin-1 (CAV1), in analogy with its previously described role in virus uptake, negatively regulates the uptake of exosomes. We find that exosomes induce the phosphorylation of several downstream targets known to associate with lipid rafts as signaling and sorting platforms, such as extracellular signal-regulated kinase-1/2 (ERK1/2) and heat shock protein 27 (HSP27). Interestingly, exosome uptake appears dependent on unperturbed ERK1/2-HSP27 signaling, and ERK1/2 phosphorylation is under negative influence by CAV1 during internalization of exosomes. These findings significantly advance our general understanding of exosome-mediated uptake and offer potential strategies for how this pathway may be targeted through modulation of CAV1 expression and ERK1/2 signaling.
cDREM: inferring dynamic combinatorial gene regulation.
Wise, Aaron; Bar-Joseph, Ziv
2015-04-01
Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinatorial regulation, these have often ignored the combinatorial logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast, we show that the predicted combinatorial sets agree with other high-throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study the human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.
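The sparse group Lasso used by cDREM combines an L1 penalty, which zeroes individual coefficients, with a group penalty, which zeroes whole groups at once. A minimal sketch of its proximal operator, the building block of proximal-gradient solvers for this penalty, is shown below: for non-overlapping groups it is elementwise soft-thresholding followed by groupwise shrinkage. It covers only the penalty, not cDREM's hidden Markov model, and the idea that each group collects the coefficients of one candidate TF set is our assumption, as are the function names.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_group_lasso_prox(beta, groups, lam_l1, lam_group, step=1.0):
    """Proximal operator of step * (lam_l1 * ||b||_1 + lam_group * sum_g ||b_g||_2).

    groups: list of index lists; they must not overlap.
    """
    out = soft_threshold(np.asarray(beta, dtype=float), step * lam_l1)
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        # Shrink the whole group toward zero; drop it entirely if its norm is small.
        scale = max(0.0, 1.0 - step * lam_group / norm) if norm > 0 else 0.0
        out[idx] = out[idx] * scale
    return out


out = sparse_group_lasso_prox(np.array([3.0, -0.5, 0.2, 0.1]),
                              groups=[[0, 1], [2, 3]], lam_l1=0.3, lam_group=1.0)
print(out)   # the second group is removed entirely
```

The two-stage shrinkage is what yields "small subsets": weak groups vanish as a block, and within surviving groups weak members are still zeroed individually.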
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method
Zhang, Tingting; Kou, S. C.
2010-01-01
Doubly stochastic Poisson processes, also known as Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. PMID:21258615
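A Gaussian kernel estimate of a point process's intensity, evaluated from event (for example, photon arrival) times, is the basic object behind kernel-based inference of this kind. The sketch below uses a fixed, hand-chosen bandwidth rather than the regression-based bandwidth selection the authors propose, and applies no boundary correction; `kernel_intensity` is a hypothetical name.

```python
import numpy as np


def kernel_intensity(arrivals, t_grid, bandwidth):
    """Gaussian-kernel estimate of event intensity (events per unit time).

    Each arrival contributes a normalized Gaussian bump of width `bandwidth`,
    so the estimate integrates to the number of events (away from edges).
    """
    arrivals = np.asarray(arrivals, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    u = (t_grid[:, None] - arrivals[None, :]) / bandwidth
    kern = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kern.sum(axis=1)


# Evenly spaced arrivals at 10 events per unit time: the estimate near the
# middle of the record should be close to 10.
arrivals = np.arange(0.05, 100.0, 0.1)
print(kernel_intensity(arrivals, [50.0], bandwidth=2.0))
```

For a Cox process the estimate tracks the realized, randomly fluctuating intensity path; the bandwidth trades variance against the ability to resolve fast fluctuations, which is why its selection matters for conformational dynamics spanning many time scales.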
Evol and ProDy for bridging protein sequence evolution and structural dynamics.
Bakan, Ahmet; Dutta, Anindita; Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R; Bahar, Ivet
2014-09-15
Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
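Information-theoretic conservation and coevolution scores can be illustrated with the Shannon entropy of alignment columns and the mutual information between column pairs. This plain-Python sketch does not use the ProDy or Evol API; it ignores gaps, pseudocounts, and sequence weighting, which a real analysis would need to handle, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Entropy (bits) of one alignment column; 0 means fully conserved."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """MI (bits) between two columns: H(a) + H(b) - H(a, b), a simple coevolution score."""
    joint = list(zip(col_a, col_b))
    return shannon_entropy(col_a) + shannon_entropy(col_b) - shannon_entropy(joint)


def columns(msa):
    """Turn a list of equal-length aligned sequences into a list of column strings."""
    return ["".join(seq[i] for seq in msa) for i in range(len(msa[0]))]


msa = ["AKLV", "AKLV", "ARIV", "ARIV"]
cols = columns(msa)
print([shannon_entropy(c) for c in cols])        # columns 0 and 3 are conserved
print(mutual_information(cols[1], cols[2]))      # K/R and L/I vary together
```

Low entropy flags conserved positions, while high mutual information between two variable positions flags residues that change together, the kind of signal that can then be compared with predicted dynamics.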
Reconstructing directed gene regulatory network by only gene expression data.
Zhang, Lu; Feng, Xi Kang; Ng, Yen Kaow; Li, Shuai Cheng
2016-08-18
Accurately identifying gene regulatory networks is an important task in understanding in vivo biological activities. Such networks are often inferred from gene expression data. Many methods have been developed to evaluate gene expression dependencies between transcription factors and their target genes, and some methods also eliminate transitive interactions. However, the regulatory (or edge) direction is undetermined if the target gene is itself a transcription factor. Some methods predict regulatory directions by locating eQTL single nucleotide polymorphisms or by observing gene expression changes after knocking out or knocking down candidate transcription factors; unfortunately, such additional data are usually unavailable, especially for samples derived from human tissues. In this study, we propose the Context Based Dependency Network (CBDN), a method that infers gene regulatory networks, including regulatory directions, from gene expression data alone. To determine the regulatory direction, CBDN computes the influence of a source on a target by evaluating how much the expression dependencies between the target gene and the other genes change when conditioned on the source gene. CBDN extends the data processing inequality by incorporating dependency direction to distinguish between direct and transitive relationships between genes. We also define two types of important regulators, which can influence a majority of the genes in the network directly or indirectly. CBDN can detect both types by averaging the influence functions of a candidate regulator on the other genes. In our experiments with simulated and real data, even with the regulatory direction taken into account, CBDN outperforms state-of-the-art approaches for inferring gene regulatory networks. CBDN identifies important regulators in the predicted networks: (1) TYROBP influences a batch of genes related to Alzheimer's disease; (2) ZNF329 and RB1 significantly regulate the 'mesenchymal' gene expression signature genes in brain tumors.
By leveraging gene expression data alone, CBDN can efficiently infer both the existence of gene-gene interactions and their regulatory directions. The constructed networks are helpful in identifying important regulators of complex diseases.
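CBDN's direction test, how much conditioning on a source gene weakens the dependencies between a target and the remaining genes, can be illustrated with a simplified proxy. The sketch assumes linear-Gaussian data and uses absolute Pearson and first-order partial correlation as the dependency measure, which may differ from CBDN's own measure, and it does not implement CBDN's extension of the data processing inequality. `partial_corr` and `influence` are hypothetical names.

```python
import numpy as np


def partial_corr(R, x, y, z):
    """Correlation of genes x and y conditioned on gene z, from a correlation matrix R."""
    num = R[x, y] - R[x, z] * R[y, z]
    return num / np.sqrt((1.0 - R[x, z] ** 2) * (1.0 - R[y, z] ** 2))


def influence(X, source, target):
    """Mean drop in |dependency| between target and the other genes after conditioning on source.

    X: expression matrix of shape (n_samples, n_genes).
    """
    R = np.corrcoef(X, rowvar=False)
    others = [g for g in range(X.shape[1]) if g not in (source, target)]
    drops = [abs(R[target, g]) - abs(partial_corr(R, target, g, source)) for g in others]
    return float(np.mean(drops))


# Toy data: gene 0 (s) drives gene 1 (t) and genes 2-4.
rng = np.random.default_rng(0)
n = 2000
s = rng.normal(size=n)
X = np.column_stack([s] + [s + rng.normal(size=n) for _ in range(4)])
forward = influence(X, source=0, target=1)
backward = influence(X, source=1, target=0)
print(forward, backward)
```

Here conditioning on s removes almost all of the correlation between t and the other genes (forward score near 0.5), whereas conditioning on t removes only part of the correlation between s and the others (backward score near 0.13), so the asymmetry favors the direction s to t.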
Population-expression models of immune response
NASA Astrophysics Data System (ADS)
Stromberg, Sean P.; Antia, Rustom; Nemenman, Ilya
2013-06-01
The immune response to a pathogen has two basic features. The first is the expansion of a few pathogen-specific cells to form a population large enough to control the pathogen. The second is the differentiation of cells from an initial naive phenotype to an effector phenotype that controls the pathogen, and subsequently to a memory phenotype that is maintained and responsible for long-term protection. Expansion and differentiation have been considered largely independently. Changes in cell populations are typically described using ecologically based ordinary differential equation models. In contrast, differentiation of single cells is studied within systems biology and is frequently modeled by considering changes in gene and protein expression in individual cells. Recent advances in experimental systems biology have, for the first time, made available data that allow population data and high-dimensional expression data of immune cells to be coupled during infection. Here we describe and develop population-expression models that integrate these two processes into systems biology at the multicellular level. When translated into mathematical equations, these models result in non-conservative, non-local advection-diffusion equations. We describe situations in which the population-expression approach can make correct inferences from data while previous modeling approaches based on common simplifying assumptions would fail. We also explore how model reduction techniques can be used to build population-expression models that minimize model complexity while keeping the essential features of the system. While we consider problems in immunology in this paper, we expect population-expression models to be more broadly applicable.
Pinto, Diana; Pinto, Carla; Guerra, Joana; Pinheiro, Manuela; Santos, Rui; Vedeld, Hege Marie; Yohannes, Zeremariam; Peixoto, Ana; Santos, Catarina; Pinto, Pedro; Lopes, Paula; Lothe, Ragnhild; Lind, Guro Elisabeth; Henrique, Rui; Teixeira, Manuel R
2018-02-01
Constitutional epimutation of the two major mismatch repair genes, MLH1 and MSH2, has been identified as an alternative mechanism that predisposes to the development of Lynch syndrome. In the present work, we aimed to investigate the prevalence of MLH1 constitutional methylation in colorectal cancer (CRC) patients with abnormal expression of the MLH1 protein in their tumors. In a series of 38 patients who met clinical criteria for Lynch syndrome genetic testing, with loss of MLH1 expression in the tumor and with no germline mutations in the MLH1 gene (35/38) or with tumors presenting the BRAF p.Val600Glu mutation (3/38), we screened for constitutional methylation of the MLH1 gene promoter using methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) in various biological samples. We found four (4/38; 10.5%) patients with constitutional methylation in the MLH1 gene promoter. RNA studies demonstrated decreased MLH1 expression in the cases with constitutional methylation when compared with controls. We could infer the mosaic nature of MLH1 constitutional hypermethylation in tissues originating from different embryonic germ layers, and in one family we could show that it occurred de novo. We conclude that constitutional MLH1 methylation occurs in a significant proportion of patients who have loss of MLH1 protein expression in their tumors and no MLH1 pathogenic germline mutation. Furthermore, we provide evidence that MLH1 constitutional hypermethylation is the molecular mechanism behind about 3% of Lynch syndrome families diagnosed in our institution, especially in patients with early onset or multiple primary tumors without significant family history. © 2018 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Functional rescue of mutant ABCA1 proteins by sodium 4-phenylbutyrate.
Sorrenson, Brie; Suetani, Rachel J; Williams, Michael J A; Bickley, Vivienne M; George, Peter M; Jones, Gregory T; McCormick, Sally P A
2013-01-01
Mutations in the ATP-binding cassette transporter A1 (ABCA1) are a major cause of decreased HDL cholesterol (HDL-C), which confers an increased risk of cardiovascular disease (CVD). Many ABCA1 mutants show impaired localization to the plasma membrane. The aim of this study was to investigate whether the chemical chaperone sodium 4-phenylbutyrate (4-PBA) could improve the cellular localization and function of ABCA1 mutants. Nine different ABCA1 mutants (p.A594T, p.I659V, p.R1068H, p.T1512M, p.Y1767D, p.N1800H, p.R2004K, p.A2028V, p.Q2239N) expressed in HEK293 cells, displaying different degrees of mislocalization to the plasma membrane and discrete impacts on cholesterol efflux, were subjected to treatment with 4-PBA. Treatment restored localization to the plasma membrane and increased cholesterol efflux function for the majority of mutants. Treatment with 4-PBA also increased ABCA1 protein expression in all transfected cell lines. In fibroblast cells obtained from low HDL-C subjects expressing two of the ABCA1 mutants (p.R1068H and p.N1800H), 4-PBA increased cholesterol efflux without any increase in ABCA1 expression. Our study is the first to investigate the effect of the chemical chaperone 4-PBA on ABCA1 and shows that it is capable of restoring plasma membrane localization and enhancing the cholesterol efflux function of mutant ABCA1s both in vitro and ex vivo. These results suggest 4-PBA may warrant further investigation as a potential therapy for increasing cholesterol efflux and HDL-C levels.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karpinets, Tatiana V; Pelletier, Dale A; Pan, Chongle
Understanding the cellular processes involved in the anaerobic degradation of complex organic compounds by microorganisms is crucial for the development of innovative biotechnologies for bioethanol production and for efficient degradation of toxic organic compounds. In natural environments the degradation is usually accomplished by syntrophic consortia composed of different bacterial species. Here we show that the metabolically versatile phototrophic bacterium Rhodopseudomonas palustris may form its own syntrophic consortia when it grows anaerobically on p-coumarate or benzoate as a sole carbon source. In this study we reveal the consortia from a comparison of large-scale measurements of mRNA and protein expression under p-coumarate and benzoate degrading conditions, using a novel computational approach referred to as phenotype fingerprinting. In this approach, marker genes for known R. palustris phenotypes are employed to calculate the expression of those phenotypes from the gene and protein expression in each studied condition. Subpopulations of the consortia are inferred from the expression of phenotypes and known metabolic modes of R. palustris growth. We find that the p-coumarate degrading condition leads to at least three R. palustris subpopulations utilizing p-coumarate, benzoate, and CO2 and H2. The benzoate degrading condition also produces at least three subpopulations, utilizing benzoate, CO2 and H2, and N2 and formate. Communication among syntrophs and inter-syntrophic dynamics in each consortium are indicated by up-regulation of transporters and of genes involved in curli formation and chemotaxis. The photoautotrophic subpopulation found in both consortia is characterized by activation of two cbb operons and the uptake hydrogenase system. A specific feature of the N2-fixing subpopulation in the benzoate degrading consortium is the preferential activation of the vanadium nitrogenase over the molybdenum nitrogenase.
The N2-fixing subpopulation in the consortium is confirmed by consumption of dissolved nitrogen gas under the benzoate degrading conditions.
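Phenotype fingerprinting, as summarized above, scores each known phenotype from the expression of its marker genes in a given condition. The abstract does not state how marker values are aggregated, so the sketch below assumes a simple mean over the markers that were actually measured; the phenotype-to-marker table and gene IDs are hypothetical placeholders, not the authors' marker sets.

```python
from statistics import mean

def phenotype_scores(expression, markers):
    """Score each phenotype as the mean expression of its measured marker genes.

    expression: dict mapping gene ID -> expression value in one condition
    markers: dict mapping phenotype name -> list of marker gene IDs
    Phenotypes with no measured marker genes are skipped.
    (Mean aggregation is an assumption made for illustration.)
    """
    scores = {}
    for phenotype, genes in markers.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            scores[phenotype] = mean(values)
    return scores

# Hypothetical marker sets and expression values for one growth condition
markers = {
    "photoautotrophic": ["g1", "g2"],
    "N2-fixing": ["g3", "g4"],
}
expression = {"g1": 2.0, "g2": 4.0, "g3": 1.5}  # g4 was not measured
print(phenotype_scores(expression, markers))
# {'photoautotrophic': 3.0, 'N2-fixing': 1.5}
```

Comparing such scores across conditions is one plausible way to see which metabolic modes, and hence which subpopulations, are active; the published method may weight or normalize markers differently.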
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
Effects of seawater acidification on gene expression: resolving broader-scale trends in sea urchins.
Evans, Tyler G; Watson-Wynn, Priscilla
2014-06-01
Sea urchins are ecologically and economically important calcifying organisms threatened by acidification of the global ocean caused by anthropogenic CO2 emissions. Propelled by the sequencing of the purple sea urchin (Strongylocentrotus purpuratus) genome, profiling changes in gene expression during exposure to high pCO2 seawater has emerged as a powerful and increasingly common method to infer the response of urchins to ocean change. However, analyses of gene expression are sensitive to experimental methodology, and comparisons between studies of genes regulated by ocean acidification are most often made in the context of major caveats. Here we perform meta-analyses as a means of minimizing experimental discrepancies and resolving broader-scale trends regarding the effects of ocean acidification on gene expression in urchins. Analyses across eight studies and four urchin species largely support prevailing hypotheses about the impact of ocean acidification on marine calcifiers. The predominant expression pattern involved the down-regulation of genes within energy-producing pathways, a clear indication of metabolic depression. Genes with functions in ion transport were significantly over-represented and are most plausibly contributing to intracellular pH regulation. Expression profiles provided extensive evidence for an impact on biomineralization, epitomized by the down-regulation of seven spicule matrix proteins. In contrast, expression profiles provided limited evidence for CO2-mediated developmental delay or induction of a cellular stress response. Congruence between studies of gene expression and the ocean acidification literature in general validates the accuracy of gene expression in predicting the consequences of ocean change and justifies its continued use in future studies. © 2014 Marine Biological Laboratory.
Wang, Juan; Peng, Yuan-De; He, Chao; Wei, Bao-Yang; Liang, Yun-Shan; Yang, Hui-Lin; Wang, Zhi; Stanley, David; Song, Qi-Sheng
2016-10-30
The impact of Bacillus thuringiensis (Bt) toxin proteins on non-target predatory arthropods is not well understood at the cellular and molecular levels. Here, we investigated the potential effects of Cry1Ab-expressing rice on the fecundity of the wolf spider, Pardosa pseudoannulata, and some of the underlying molecular mechanisms. The results indicated that brown planthoppers (BPHs) reared on Cry1Ab-expressing rice accumulated the Cry toxin, and that reproductive parameters (pre-oviposition period, post-oviposition stage, number of eggs, and egg hatching rate) of the spiders that consumed BPHs reared on Bt rice were not different from those of spiders that consumed BPHs reared on the non-Bt control rice. The accumulated Cry1Ab did not influence several vitellin (Vt) parameters, including stored energy and amino acid composition, during one generation. We considered the possibility that the Cry toxins exert their influence on beneficial predators via more subtle effects detectable at the molecular level in terms of gene expression. This led us to transcriptome analysis to detect differentially expressed genes in the ovaries of spiders exposed to dietary Cry1Ab and their counterpart control spiders. Eight genes associated with vitellogenesis, vitellogenin receptor activity, and vitellin membrane formation were not differentially expressed between ovaries from the treated and control spiders, as confirmed by qPCR analysis. We infer that dietary Cry1Ab-expressing rice does not influence the fecundity or the expression levels of Vt-associated genes in P. pseudoannulata. Copyright © 2016. Published by Elsevier B.V.
Javadi Khederi, Saeid; Khanjani, Mohammad; Gholami, Mansur; Bruno, Giovanni Luigi
2018-05-01
Real-time quantitative polymerase chain reaction was used to study the expression of some marker genes involved in the interaction between grape (Vitis vinifera L.) and the erineum mite Colomerus vitis Pagenstecher (Acari: Eriophyidae). Potted vines of cultivars Atabaki (resistant to C. vitis), Ghalati (susceptible to C. vitis) and Muscat Gordo (moderately resistant to C. vitis) were infested at the six-leaf stage. The expression of protease inhibitor (PIN), beta-1,3-glucanase (GLU), polygalacturonase inhibitor (PGIP), Vitis vinifera proline-rich protein 1 (PRP1), stilbene synthase (STS), and lipoxygenase (LOX) genes was assessed on young leaves collected 96, 120 and 144 h after mite infestation (hami). As a control, non-infested leaves collected 24 h before mite infestation were used. Differences were detected in the expression of the selected genes during the C. vitis-grapevine interaction. The resistant cultivar Atabaki increased the expression of LOX, STS, GLU, PGIP and PRP1 genes during the first 120 hami. In contrast, in the susceptible Ghalati, all selected genes showed an expression level similar to or lower than that of non-infested leaves. Muscat Gordo increased the expression of all selected genes in comparison with non-infested leaves, but to a lower level than Atabaki. Significant transcript accumulation of the PIN gene was detected for Muscat Gordo, whereas it was only slightly up-regulated in Ghalati and Atabaki. LOX, STS, PIN, GLU, PGIP and PRP1 genes were clearly expressed in response to C. vitis infestation. We therefore infer that expression of PGIP, PIN and PRP1 genes could represent a defense strategy against C. vitis infestations in grapevine leaves.
Chi-Wei Lan, John; Chang, Chih-Kai; Wu, Ho-Shing
2014-09-01
A mutant gene of rumen phytase (phyA-7) was cloned into the pET23b(+) vector and expressed in Escherichia coli BL21 under the control of the T7 promoter. The study of fermentation conditions covered the effect of temperature on mutant phytase expression, the effect of carbon supplements during the induction stage, the influence of acetic acid accumulation on enzyme expression, and a comparison of one-stage and two-stage operations in batch mode. Phytase activity reached a maximum of 107.0 U mL(-1) at an induction temperature of 30°C. Yeast extract supplementation significantly increased both protein concentration and phytase activity. Acetic acid (2 g L(-1)) present in the modified synthetic medium significantly decreased the expressed phytase activity. A two-stage batch operation enhanced phytase activity from 306 to 1204 U mL(-1) at the 20 L fermentation scale. An overall 3.7-fold improvement in phytase yield (from 35,375.72 to 131,617.50 U g(-1) DCW) was achieved in the two-stage operation. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
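The reported fold improvement in phytase yield can be checked directly from the two yields quoted in the abstract (35,375.72 and 131,617.50 U g(-1) DCW). A quick arithmetic check, using only those published values:

```python
# Phytase yields per gram dry cell weight (DCW), as reported in the abstract
one_stage_yield = 35375.72    # U g(-1) DCW, one-stage batch
two_stage_yield = 131617.50   # U g(-1) DCW, two-stage batch

fold = two_stage_yield / one_stage_yield
print(f"{fold:.1f}-fold")  # 3.7-fold
```

The ratio is about 3.72, consistent with the stated 3.7-fold improvement.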
Byström, Sanna; Eklund, Martin; Hong, Mun-Gwan; Fredolini, Claudia; Eriksson, Mikael; Czene, Kamila; Hall, Per; Schwenk, Jochen M; Gabrielson, Marike
2018-02-14
Mammographic breast density is one of the strongest risk factors for breast cancer, but molecular understanding of how breast density relates to cancer risk is less complete. Studies of proteins in blood plasma, possibly associated with mammographic density, are well-suited as these allow large-scale analyses and might shed light on the association between breast cancer and breast density. Plasma samples from 1329 women in the Swedish KARMA project, without prior history of breast cancer, were profiled with antibody suspension bead array (SBA) assays. Two sample sets comprising 729 and 600 women were screened by two different SBAs targeting a total number of 357 proteins. Protein targets were selected through searching the literature, for either being related to breast cancer or for being linked to the extracellular matrix. Association between proteins and absolute area-based breast density (AD) was assessed by quantile regression, adjusting for age and body mass index (BMI). Plasma profiling revealed linear association between 20 proteins and AD, concordant in the two sets of samples (p < 0.05). Plasma levels of seven proteins were positively associated and 13 proteins negatively associated with AD. For eleven of these proteins evidence for gene expression in breast tissue existed. Among these, ABCC11, TNFRSF10D, F11R and ERRF were positively associated with AD, and SHC1, CFLAR, ACOX2, ITGB6, RASSF1, FANCD2 and IRX5 were negatively associated with AD. Screening proteins in plasma indicates associations between breast density and processes of tissue homeostasis, DNA repair, cancer development and/or progression in breast cancer. Further validation and follow-up studies of the shortlisted protein candidates in independent cohorts will be needed to infer their role in breast density and its progression in premenopausal and postmenopausal women.
Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia
Li, Yue; Liang, Minggao; Zhang, Zhaolei
2014-01-01
Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray and sequencing technologies, large amounts of data measuring the genome-wide signals of these factors have become available from the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, an integrative model that takes full advantage of these rich yet heterogeneous data has been lacking. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which models mRNA expression as the response, using as explanatory variables the TF data from ENCODE and the CNV, DM, and miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities of TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in generic cell lines with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof of concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathways, suggesting a novel role in AML pathogenesis.
Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provide a novel framework that successfully integrates the TCGA and ENCODE data in revealing the AML-specific regulatory program at a global level. PMID:25340776
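The two-stage idea described for RACER (first estimate sample-specific regulator activities, then use those activities to estimate regulator-gene interactions) can be illustrated with plain least squares on synthetic data. This is a minimal sketch under stated assumptions, not RACER itself: it uses ordinary least squares instead of whatever penalized regression the authors use, omits the CNV and DM covariates, and the matrices are random, noiseless placeholders so the recovery is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples, n_regs = 200, 30, 5

# Hypothetical generic (cell-line) binding signal: genes x regulators
binding = rng.random((n_genes, n_regs))
# Hypothetical true sample-specific regulator activities: regulators x samples
true_activity = rng.normal(size=(n_regs, n_samples))
# Synthetic expression, genes x samples (noiseless for illustration)
expression = binding @ true_activity

def stage1_activities(expr, bind):
    """Stage 1: for each sample, regress expression across genes on binding.

    Returns regulators x samples activity estimates.
    """
    act, *_ = np.linalg.lstsq(bind, expr, rcond=None)
    return act

def stage2_interactions(expr, act):
    """Stage 2: for each gene, regress expression across samples on activities.

    Returns genes x regulators interaction coefficients.
    """
    coef, *_ = np.linalg.lstsq(act.T, expr.T, rcond=None)
    return coef.T

activity = stage1_activities(expression, binding)
interactions = stage2_interactions(expression, activity)
print(np.allclose(activity, true_activity), np.allclose(interactions, binding))
```

The design point the sketch shows is that stage 1 uses the generic binding data only to produce per-sample activities, so stage 2 can work entirely in the sample-specific space.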
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
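The compression criterion follows the general two-part Minimum Message Length framing. As a hedged sketch of generic MML (not MMLigner's specific encoding scheme), with $A$ a candidate alignment and $D$ the coordinate data:

```latex
I(A, D) = I(A) + I(D \mid A), \qquad
\Delta(A) = I_{\mathrm{null}}(D) - I(A, D)
```

Here $I(\cdot)$ is a message length in bits, $I = -\log_2 \Pr(\cdot)$; $I(A)$ is the cost of stating the alignment, $I(D \mid A)$ the cost of the coordinates given it, and $I_{\mathrm{null}}(D)$ the cost of encoding the coordinates with no alignment. A larger compression $\Delta(A)$ means the alignment explains the coordinates better, and $\Delta(A) > 0$ indicates that stating the alignment pays for itself.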
Yang, Jianhua; Osman, Kim; Iqbal, Mudassar; Stekel, Dov J.; Luo, Zewei; Armstrong, Susan J.; Franklin, F. Chris H.
2013-01-01
Following successful completion of the Brassica rapa sequencing project, the next step is to investigate functions of individual genes/proteins. For Arabidopsis thaliana, large amounts of protein–protein interaction (PPI) data are available from the major PPI databases (DBs). It is known that Brassica crop species are closely related to A. thaliana. This provides an opportunity to infer the B. rapa interactome using PPI data available from A. thaliana. In this paper, we present an inferred B. rapa interactome that is based on the A. thaliana PPI data from two resources: (i) A. thaliana PPI data from three major DBs, BioGRID, IntAct, and TAIR. (ii) ortholog-based A. thaliana PPI predictions. Linking between B. rapa and A. thaliana was accomplished in three complementary ways: (i) ortholog predictions, (ii) identification of gene duplication based on synteny and collinearity, and (iii) BLAST sequence similarity search. A complementary approach was also applied, which used known/predicted domain–domain interaction data. Specifically, since the two species are closely related, we used PPI data from A. thaliana to predict interacting domains that might be conserved between the two species. The predicted interactome was investigated for the component that contains known A. thaliana meiotic proteins to demonstrate its usability. PMID:23293649
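The ortholog-based route described above transfers each A. thaliana interaction to every pair of B. rapa genes that map to the two interacting partners. A minimal sketch of that mapping step, assuming a simple one-to-many ortholog table; the gene IDs are hypothetical placeholders, and the synteny, BLAST, and domain-domain routes in the abstract are not modeled here.

```python
from itertools import product

def transfer_interactions(source_ppis, orthologs):
    """Map source-species PPIs onto a target species via an ortholog table.

    source_ppis: iterable of (gene_a, gene_b) interacting pairs in the source species
    orthologs: dict mapping source gene -> list of target-species genes
    Returns undirected target-species pairs as sorted tuples.
    Interactions with a partner lacking orthologs are dropped.
    """
    predicted = set()
    for a, b in source_ppis:
        for x, y in product(orthologs.get(a, []), orthologs.get(b, [])):
            predicted.add(tuple(sorted((x, y))))
    return predicted

# Hypothetical A. thaliana PPIs and A. thaliana -> B. rapa ortholog table
at_ppis = [("AT_g1", "AT_g2"), ("AT_g2", "AT_g3")]
at_to_br = {"AT_g1": ["Br_g1a", "Br_g1b"], "AT_g2": ["Br_g2"]}
print(sorted(transfer_interactions(at_ppis, at_to_br)))
# [('Br_g1a', 'Br_g2'), ('Br_g1b', 'Br_g2')]
```

Because B. rapa has undergone genome duplication, a single A. thaliana interaction can yield several predicted B. rapa interactions, as the duplicated `Br_g1a`/`Br_g1b` entries show.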
A standardized framing for reporting protein identifications in mzIdentML 1.2
Seymour, Sean L.; Farrah, Terry; Binz, Pierre-Alain; Chalkley, Robert J.; Cottrell, John S.; Searle, Brian C.; Tabb, David L.; Vizcaíno, Juan Antonio; Prieto, Gorka; Uszkoreit, Julian; Eisenacher, Martin; Martínez-Bartolomé, Salvador; Ghali, Fawaz; Jones, Andrew R.
2015-01-01
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories like the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software. PMID:25092112
Formation of compact myelin is required for maturation of the axonal cytoskeleton
NASA Technical Reports Server (NTRS)
Brady, S. T.; Witt, A. S.; Kirkpatrick, L. L.; de Waegh, S. M.; Readhead, C.; Tu, P. H.; Lee, V. M.
1999-01-01
Although traditional roles ascribed to myelinating glial cells are structural and supportive, the importance of compact myelin for proper functioning of the nervous system can be inferred from mutations in myelin proteins and neuropathologies associated with loss of myelin. Myelinating Schwann cells are known to affect local properties of peripheral axons (de Waegh et al., 1992), but little is known about effects of oligodendrocytes on CNS axons. The shiverer mutant mouse has a deletion in the myelin basic protein gene that eliminates compact myelin in the CNS. In shiverer mice, both local axonal features like phosphorylation of cytoskeletal proteins and neuronal perikaryon functions like cytoskeletal gene expression are altered. This leads to changes in the organization and composition of the axonal cytoskeleton in shiverer unmyelinated axons relative to age-matched wild-type myelinated fibers, although connectivity and patterns of neuronal activity are comparable. Remarkably, transgenic shiverer mice with thin myelin sheaths display an intermediate phenotype indicating that CNS neurons are sensitive to myelin sheath thickness. These results indicate that formation of a normal compact myelin sheath is required for normal maturation of the neuronal cytoskeleton in large CNS neurons.
Regulating the effects of GPR21, a novel target for type 2 diabetes
NASA Astrophysics Data System (ADS)
Leonard, Siobhán; Kinsella, Gemma K.; Benetti, Elisa; Findlay, John B. C.
2016-05-01
Type 2 diabetes is a chronic metabolic disorder primarily caused by insulin resistance, to which obesity is a major contributor. Expression levels of an orphan G protein-coupled receptor (GPCR), GPR21, demonstrated a trend towards a significant increase in the epididymal fat pads of wild type high fat high sugar (HFHS)-fed mice. To gain further insight into the potential role this novel target may play in the development of obesity-associated type 2 diabetes, the signalling capabilities of the receptor were investigated. Overexpression studies in HEK293T cells revealed GPR21 to be a constitutively active receptor, which couples to Gαq type G proteins leading to the activation of mitogen activated protein kinases (MAPKs). Overexpression of GPR21 in vitro also markedly attenuated insulin signalling. Interestingly, the effect of GPR21 on the MAPKs and insulin signalling was reduced in the presence of serum, suggesting the possibility of a native inhibitory ligand. Homology modelling and ligand docking studies led to the identification of a novel compound that inhibited GPR21 activity. Its effects offer potential as an anti-diabetic pharmacological strategy, as it was found to counteract the influence of GPR21 on the insulin signalling pathway.
Evolutionarily diverse SYP1 Qa-SNAREs jointly sustain pollen tube growth in Arabidopsis.
Slane, Daniel; Reichardt, Ilka; El Kasmi, Farid; Bayer, Martin; Jürgens, Gerd
2017-11-01
Intracellular membrane fusion is effected by SNARE proteins that reside on adjacent membranes and form bridging trans-SNARE complexes. Qa-SNARE members of the Arabidopsis SYP1 family are involved in membrane fusion at the plasma membrane or during cell plate formation. Three SYP1 family members have been classified as pollen-specific as inferred from gene expression profiling studies, and two of them, SYP124 and SYP125, are confined to angiosperms. The SYP124 gene appears genetically unstable, whereas its sister gene SYP125 shows essentially no variation among Arabidopsis accessions. The third pollen-specific member SYP131 is sister to SYP132, which appears evolutionarily conserved in the plant lineage. Although evolutionarily diverse, the three SYP1 proteins are functionally overlapping in that only the triple mutant syp124 syp125 syp131 shows a specific and severe male gametophytic defect. While pollen development and germination appear normal, pollen tube growth is arrested during passage through the style. Our results suggest that angiosperm pollen tubes employ a combination of ancient and modern Qa-SNARE proteins to sustain their growth-promoting membrane dynamics during the reproductive process. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Chen, Shuonan; Mar, Jessica C
2018-06-19
A fundamental fact in biology states that genes do not operate in isolation, and yet methods that infer regulatory networks from single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when applied to either experimental or simulated single cell data, indicating that they are poorly suited to this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and to in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of the assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell data.
Also, single cell methods, which usually depend on more elaborate algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance of developing more accurate, optimized network modeling methods that are compatible with single cell data. Newly developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when interpreting these results.
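The evaluation above scores predicted edges against a literature reference set and compares methods by how much their edge sets overlap. A minimal sketch of one point of such a comparison, assuming each method outputs a fixed set of directed edges (full ROC and Precision-Recall curves would instead sweep a threshold over ranked edge scores); the gene names and edge sets are hypothetical.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted edge set against a reference edge set."""
    tp = len(predicted & reference)  # true positives: edges in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

def jaccard(a, b):
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical directed edges (regulator, target)
reference = {("A", "B"), ("B", "C"), ("C", "D")}
method_1 = {("A", "B"), ("A", "C")}
method_2 = {("A", "B"), ("B", "C"), ("D", "A")}

print(precision_recall(method_1, reference))  # (0.5, 0.3333333333333333)
print(jaccard(method_1, method_2))            # 0.25
```

A low Jaccard value between two methods corresponds to the observation that different methods detect substantially different sets of edges.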
Hypersensitivity linked to exposure of broad bean protein(s) in allergic patients and BALB/c mice.
Kumar, Dinesh; Kumar, Sandeep; Verma, Alok K; Sharma, Akanksha; Tripathi, Anurag; Chaudhari, Bhushan P; Kant, Surya; Das, Mukul; Jain, Swatantra K; Dwivedi, Premendra D
2014-01-01
Broad bean (Vicia faba L.), a common vegetable, belongs to the family Fabaceae and is consumed worldwide. Few studies have examined the allergenicity of broad beans. The aim of this study was to determine whether broad bean proteins have the ability to elicit allergic responses due to the presence of clinically relevant allergenic proteins. Simulated gastric fluid (SGF) assay and immunoglobulin E (IgE) immunoblotting were carried out to identify pepsin-resistant and IgE-binding proteins. The allergenicity of broad beans was assessed in allergic patients, BALB/c mice, splenocytes, and RBL-2H3 cells. Eight broad bean proteins of approximate molecular weight 70, 60, 48, 32, 23, 19, 15, and 10 kDa that remained undigested in SGF also showed IgE-binding capacity. Of 127 allergic patients studied, broad bean allergy was evident in 16 (12%). Mice sensitized with broad bean showed increased levels of histamine and of total and specific IgE, and severe signs of systemic anaphylaxis compared with controls. Enhanced levels of histamine, prostaglandin D2, and cysteinyl leukotriene, and enhanced β-hexosaminidase release, were observed in the primed RBL-2H3 cells following broad bean exposure. The levels of interleukins IL-4, IL-5, and IL-13 and of regulated on activation, normal T-cell expressed and secreted (RANTES) were enhanced in broad bean-treated splenocyte culture supernatant compared with controls. This study indicates that broad bean proteins have the ability to elicit allergic responses due to the presence of clinically relevant allergenic proteins. Copyright © 2014 Elsevier Inc. All rights reserved.
Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks
Yamanaka, Ryota; Kitano, Hiroaki
2013-01-01
Elucidating gene regulatory networks (GRNs) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have emerged as a promising strategy for inferring accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy of combining many algorithms does not always yield a performance improvement that justifies the cost of consensus, and (ii) TopkNet integrating only high-performance algorithms provides a significant performance improvement compared to the best individual algorithms and to community prediction. These results suggest that a priori determination of high-performance algorithms is key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful for determining potentially optimal algorithms for reconstruction of unknown regulatory networks; i.e., if expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, the optimal algorithms determined for the known network can be repurposed to infer the unknown one. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if the similarity between two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study, provides a powerful strategy for harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007
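To make the idea of consensus inference concrete, the sketch below combines edge rankings from several algorithms by summing each edge's rank position and keeping the top k. This is a generic rank-sum (Borda-style) consensus chosen for illustration, not the TopkNet algorithm, whose exact aggregation the abstract does not describe; the edge lists are hypothetical.

```python
def consensus_top_k(rankings, k):
    """Combine best-first edge rankings by summed rank position; return the top k.

    rankings: list of lists of edges, one list per algorithm, ordered best-first.
    An edge missing from a ranking is assigned the position just past its end.
    Ties are broken by the edge itself so the output is deterministic.
    """
    all_edges = set().union(*rankings)
    scores = {edge: 0 for edge in all_edges}
    for ranking in rankings:
        position = {edge: i for i, edge in enumerate(ranking)}
        worst = len(ranking)  # rank assigned to edges this algorithm did not list
        for edge in all_edges:
            scores[edge] += position.get(edge, worst)
    return sorted(all_edges, key=lambda e: (scores[e], e))[:k]

# Hypothetical rankings from two high-performing algorithms
alg_1 = [("A", "B"), ("B", "C"), ("C", "D")]
alg_2 = [("B", "C"), ("A", "B"), ("D", "A")]
print(consensus_top_k([alg_1, alg_2], k=2))
# [('A', 'B'), ('B', 'C')]
```

In keeping with the benchmarking result above, the choice of which algorithms' rankings to pass in matters more than the aggregation rule: feeding in only high-performance algorithms is what the abstract reports as beneficial.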
Complete fold annotation of the human proteome using a novel structural feature space.
Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-01-01
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174
Bacillus anthracis genome organization in light of whole transcriptome sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.
2010-03-22
Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. the codon adaptation index and RBS score, related to gene expression? We used whole transcriptome shotgun sequencing of the bacterial pathogen Bacillus anthracis to assess the correlation of gene expression level with promoter, terminator and RBS scores and the codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessing the accuracy of the current annotation of protein-coding genes in the B. anthracis genome, we have shown that the transcriptome data indicate the existence of more than a hundred genes that are missing from the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.
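The codon adaptation index mentioned above is usually defined as the geometric mean, over a gene's codons, of each codon's relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of its most frequent synonym). The sketch below assumes that definition, a deliberately partial and hypothetical synonymous-codon table, and a 0.5 pseudocount for unseen codons; it is not the authors' pipeline, and the reference sequences are made up.

```python
import math
from collections import Counter

# Hypothetical, deliberately partial table of synonymous codons (DNA alphabet).
# A real analysis would cover every degenerate amino acid in the genetic code.
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
}
CODON_TO_AA = {codon: aa for aa, group in SYNONYMS.items() for codon in group}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs):
    """w(codon) = count(codon) / count(most used synonym) in a reference gene set."""
    counts = Counter(
        c for s in reference_seqs for c in split_codons(s) if c in CODON_TO_AA
    )
    weights = {}
    for group in SYNONYMS.values():
        top = max(counts[c] for c in group)
        for c in group:
            # Replace zero counts by 0.5 so that log(w) stays finite.
            weights[c] = max(counts[c], 0.5) / max(top, 0.5)
    return weights


def codon_adaptation_index(seq, weights):
    """Geometric mean of w over the scorable codons of seq (NaN if none)."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        return float("nan")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["AAAGAATTT", "AAAGAATTC"])
print(codon_adaptation_index("AAAGAATTT", w))  # 1.0: only preferred codons
print(codon_adaptation_index("AAGGAG", w))     # about 0.25: only rare codons
```

Codons outside the table (here, anything not Lys, Glu or Phe) are simply skipped, which is how single-codon amino acids such as Met and Trp are usually treated.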
The Disappointing Gift: Dispositional and Situational Moderators of Emotional Expressions
ERIC Educational Resources Information Center
Tobin, Renee M.; Graziano, William G.
2011-01-01
Inferences about emotions in children are limited by studies that rely on only one research method. Convergence across methods provides a stronger basis for inference by identifying method variance. This multimethod study of 116 children (mean age = 8.21 years) examined emotional displays during social exchange. Each child received a desirable…
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Construction of regulatory networks using expression time-series data of a genotyped population.
Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth; Mittler, John E; Zhu, Jun; Schadt, Eric E; Bumgarner, Roger E; Raftery, Adrian E
2011-11-29
The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.
Petri Nets with Fuzzy Logic (PNFL): Reverse Engineering and Parametrization
Küffner, Robert; Petri, Tobias; Windhager, Lukas; Zimmer, Ralf
2010-01-01
Background The recent DREAM4 blind assessment provided a particularly realistic and challenging setting for network reverse engineering methods. The in silico part of DREAM4 solicited the inference of cycle-rich gene regulatory networks from heterogeneous, noisy expression data including time courses as well as knockout, knockdown and multifactorial perturbations. Methodology and Principal Findings We inferred and parametrized simulation models based on Petri Nets with Fuzzy Logic (PNFL). This completely automated approach correctly reconstructed networks with cycles as well as oscillating network motifs. PNFL was evaluated as the best performer on DREAM4 in silico networks of size 10 with an area under the precision-recall curve (AUPR) of 81%. Besides topology, we inferred a range of additional mechanistic details with good reliability, e.g. distinguishing activation from inhibition as well as dependent from independent regulation. Our models also performed well on new experimental conditions such as double knockout mutations that were not included in the provided datasets. Conclusions The inference of biological networks substantially benefits from methods that are expressive enough to deal with diverse datasets in a unified way. At the same time, overly complex approaches could generate multiple different models that explain the data equally well. PNFL appears to strike the balance between expressive power and complexity. This also applies to the intuitive representation of PNFL models combining a straightforward graphical notation with colloquial fuzzy parameters. PMID:20862218
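Because PNFL is scored by the area under the precision-recall curve, a small sketch of one common AUPR estimator may help. The function below computes step-wise average precision for a ranked list of predicted edges; DREAM's official scoring may interpolate the curve differently, so this is an illustration under that assumption rather than a reimplementation, and the gene names are hypothetical.

```python
def average_precision(ranked_edges, true_edges):
    """Step-wise area under the precision-recall curve (average precision).

    ranked_edges: predicted edges, most confident first (assumed free of duplicates).
    true_edges: edges of the gold-standard network.
    Gold-standard edges never predicted contribute zero precision.
    """
    gold = set(true_edges)
    if not gold:
        raise ValueError("the gold standard must contain at least one edge")
    hits = 0
    total = 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            total += hits / k  # precision at the recall level reached at rank k
    return total / len(gold)


ranking = [("G1", "G2"), ("G2", "G5"), ("G3", "G4"), ("G4", "G1")]
gold = {("G1", "G2"), ("G3", "G4")}
print(round(average_precision(ranking, gold), 4))  # 0.8333
```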
The hippocampus and memory for orderly stimulus relations
Dusek, Jeffery A.; Eichenbaum, Howard
1997-01-01
Human declarative memory involves a systematic organization of information that supports generalizations and inferences from acquired knowledge. This kind of memory depends on the hippocampal region in humans, but the extent to which animals also have declarative memory, and whether inferential expression of memory depends on the hippocampus in animals, remains a major challenge in cognitive neuroscience. To examine these issues, we used a test of transitive inference pioneered by Piaget to assess capacities for systematic organization of knowledge and logical inference in children. In our adaptation of the test, rats were trained on a set of four overlapping odor discrimination problems that could be encoded either separately or as a single representation of orderly relations among the odor stimuli. Normal rats learned the problems and demonstrated the relational memory organization through appropriate transitive inferences about items not presented together during training. By contrast, after disconnection of the hippocampus from either its cortical or subcortical pathway, rats succeeded in acquiring the separate discrimination problems but did not demonstrate transitive inference, indicating that they had failed to develop or could not inferentially express the orderly organization of the stimulus elements. These findings strongly support the view that the hippocampus mediates a general declarative memory capacity in animals, as it does in humans. PMID:9192700
Inference of scale-free networks from gene expression time series.
Daisuke, Tominaga; Horton, Paul
2006-04-01
Quantitative time-series observation of gene expression is becoming possible, for example by cell array technology. However, there are no practical methods with which to infer network structures using only observed time-series data. As most computational models of biological networks for continuous time-series data have many degrees of freedom, it is almost impossible to infer the correct structures. On the other hand, it has been reported that some kinds of biological networks, such as gene networks and metabolic pathways, may have scale-free properties. We hypothesize that the architecture of inferred biological network models can be restricted to scale-free networks. We developed an inference algorithm for biological networks using only time-series data by introducing such a restriction. We adopt the S-system as the network model and use a distributed genetic algorithm to optimize the models so that their simulated results fit the observed time-series data. We tested our algorithm on a case study (simulated data). We compared optimization under no restriction, which allows for a fully connected network, and under the restriction that the total number of links must equal that expected from a scale-free network. The restriction reduced both false-positive and false-negative estimation of links, as well as the differences between model simulation and the given time-series data.
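For reference, an S-system models each gene's rate of change as a power-law production term minus a power-law degradation term, dX_i/dt = α_i ∏_j X_j^(g_ij) − β_i ∏_j X_j^(h_ij). The Python sketch below simulates that model with forward Euler and counts links, assuming a link j → i exists when g_ij or h_ij is nonzero for i ≠ j; that counting rule, the parameter values, and the helper names are assumptions, and the distributed genetic algorithm used for fitting is not shown.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        rates.append(production - degradation)
    return rates


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=1000):
    """Forward-Euler integration; dt must be small enough to keep every X_j positive."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        rates = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * ri for xi, ri in zip(x, rates)]
        trajectory.append(list(x))
    return trajectory


def count_links(g, h):
    """Count regulatory links j -> i (i != j) with a nonzero kinetic order in g or h."""
    n = len(g)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and (g[i][j] != 0 or h[i][j] != 0)
    )


# Hypothetical two-gene example: gene 1 activates gene 2; each gene degrades itself.
alpha, beta = [1.0, 1.0], [1.0, 1.0]
g = [[0.0, 0.0], [0.5, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
traj = simulate([0.5, 0.5], alpha, beta, g, h, dt=0.01, steps=2000)
print(count_links(g, h))  # 1
```

Under a scale-free restriction like the one described above, a fitting procedure would only consider parameter sets whose `count_links` value equals the target link count.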
Dynamics of cellular level function and regulation derived from murine expression array data.
de Bivort, Benjamin; Huang, Sui; Bar-Yam, Yaneer
2004-12-21
A major open question of systems biology is how genetic and molecular components interact to create phenotypes at the cellular level. Although much recent effort has been dedicated to inferring effective regulatory influences within small networks of genes, the power of microarray bioinformatics has yet to be used to determine functional influences at the cellular level. In all cases of data-driven parameter estimation, the number of model parameters estimable from a set of data is strictly limited by the size of that set. Rather than infer parameters describing the detailed interactions of just a few genes, we chose a larger-scale investigation so that the cumulative effects of all gene interactions could be analyzed to identify the dynamics of cellular-level function. By aggregating genes into large groups with related behaviors (megamodules), we were able to determine the effective aggregate regulatory influences among 12 major gene groups in murine B lymphocytes over a variety of time steps. Intriguing observations about the behavior of cells at this high level of abstraction include: (i) a medium-term critical global transcriptional dependence on ATP-generating genes in the mitochondria, (ii) a longer-term dependence on glycolytic genes, (iii) the dual role of chromatin-reorganizing genes in transcriptional activation and repression, (iv) homeostasis-favoring influences, (v) the indication that, as a group, G protein-mediated signals are not concentration-dependent in their influence on target gene expression, and (vi) short-term-activating/long-term-repressing behavior of the cell-cycle system that reflects its oscillatory behavior.
Meyer, Vera; Wanka, Franziska; van Gent, Janneke; Arentshorst, Mark; van den Hondel, Cees A. M. J. J.; Ram, Arthur F. J.
2011-01-01
Filamentous fungi are the cause of serious human and plant diseases but are also exploited in biotechnology as production platforms. Comparative genomics has documented their genetic diversity, and functional genomics and systems biology approaches are under way to understand the functions and interaction of fungal genes and proteins. In these approaches, gene functions are usually inferred from deletion or overexpression mutants. However, studies at these extreme points give only limited information. Moreover, many overexpression studies use metabolism-dependent promoters, often causing pleiotropic effects and thus limitations in their significance. We therefore established and systematically evaluated a tunable expression system for Aspergillus niger that is independent of carbon and nitrogen metabolism and silent under noninduced conditions. The system consists of two expression modules jointly targeted to a defined genomic locus. One module ensures constitutive expression of the tetracycline-dependent transactivator rtTA2S-M2, and one module harbors the rtTA2S-M2-dependent promoter that controls expression of the gene of interest (the Tet-on system). We show here that the system is tight, responds within minutes after inducer addition, and allows fine-tuning based on the inducer concentration or gene copy number up to expression levels higher than the expression levels of the gpdA promoter. We also validate the Tet-on system for the generation of conditional overexpression mutants and demonstrate its power when combined with a gene deletion approach. Finally, we show that the system is especially suitable when the functions of essential genes must be examined. PMID:21378046
Gene expression complex networks: synthesis, identification, and analysis.
Lopes, Fabrício M; Cesar, Roberto M; Costa, Luciano Da F
2011-10-01
Thanks to recent advances in molecular biology, allied to an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validating gene network modeling and identification, which comprises three main aspects: (1) generation of Artificial Gene Network (AGN) models through theoretical models of complex networks, which are used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, founded on a feature selection approach in which a target gene is fixed and the expression profiles of all other genes are examined in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to average degree
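Of the network models listed, the Barabási-Albert model is the simplest to illustrate. The sketch below grows an undirected BA graph by preferential attachment, which could serve as the topology of an artificial gene network; whether the original AGNs were directed, and how expression was then simulated on them, is not covered here and is left as an assumption.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Undirected Barabasi-Albert graph on nodes 0..n-1 as a set of (u, v) edges with u < v.

    Starts from a complete core of m + 1 nodes; each later node attaches to m
    distinct existing nodes chosen with probability proportional to degree.
    """
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = set()
    # Each node appears in this list once per incident edge, so a uniform
    # pick from it is a degree-proportional pick.
    endpoint_pool = []
    for v in range(m + 1):
        for u in range(v):
            edges.add((u, v))
            endpoint_pool.extend([u, v])
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoint_pool))
        for t in targets:
            edges.add((t, new))
            endpoint_pool.extend([t, new])
    return edges


agn = barabasi_albert(50, 2, seed=1)
print(len(agn))  # 97 = 3 core edges + 47 nodes * 2 new edges each
```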
Inference of epistatic effects in a key mitochondrial protein
NASA Astrophysics Data System (ADS)
Nelson, Erik D.; Grishin, Nick V.
2018-06-01
We use Potts model inference to predict pair epistatic effects in a key mitochondrial protein, cytochrome c oxidase subunit 2, for ray-finned fishes. We examine the effect of phylogenetic correlations on our predictions using a simple exact fitness model, and we find that, although epistatic effects are underpredicted, they maintain a roughly linear relationship to their true (model) values. After accounting for this correction, epistatic effects in the protein are still relatively weak, leading to fitness valleys of depth 2Ns ≃ −5 in compensatory double mutants. Interestingly, positive epistasis is more pronounced than negative epistasis, and the strongest positive effects capture nearly all sites subject to positive selection in fishes, similar to virus proteins evolving under selection pressure in the context of drug therapy.
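In a Potts model with site fields h_i and pairwise couplings J_ij, the epistasis between mutations a → a′ at site i and b → b′ at site j, measured as the double-mutant change minus the two single-mutant changes, reduces to J_ij(a′,b′) − J_ij(a′,b) − J_ij(a,b′) + J_ij(a,b), because all field terms and couplings to other sites cancel. The Python sketch below checks this on a hypothetical three-site model; it assumes a score in which larger values are favored, and it does not reproduce the paper's conversion to fitness units (2Ns) or its phylogenetic correction.

```python
def potts_score(seq, h, J):
    """Statistical score of a sequence: sum of site fields plus pairwise couplings.

    h[i][a] is the field for state a at site i; J[(i, j)][(a, b)] is the coupling
    between state a at site i and state b at site j (only pairs i < j are stored).
    """
    score = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), table in J.items():
        score += table[(seq[i], seq[j])]
    return score


def pair_epistasis(seq, i, a_new, j, b_new, h, J):
    """Double-mutant score change minus the two single-mutant changes."""
    def mutate(changes):
        s = list(seq)
        for pos, state in changes:
            s[pos] = state
        return s

    wt = potts_score(seq, h, J)
    single_i = potts_score(mutate([(i, a_new)]), h, J)
    single_j = potts_score(mutate([(j, b_new)]), h, J)
    double = potts_score(mutate([(i, a_new), (j, b_new)]), h, J)
    return (double - wt) - (single_i - wt) - (single_j - wt)


# Hypothetical three-site model with two states (0, 1) per site.
h = [[0.1, -0.3], [0.0, 0.4], [0.2, 0.2]]
J = {
    (0, 1): {(0, 0): 0.0, (0, 1): 0.2, (1, 0): -0.1, (1, 1): 0.7},
    (0, 2): {(0, 0): 0.3, (0, 1): -0.2, (1, 0): 0.1, (1, 1): 0.0},
    (1, 2): {(0, 0): -0.4, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.1},
}
eps = pair_epistasis([0, 0, 0], 0, 1, 1, 1, h, J)
# Only the (0, 1) coupling survives: 0.7 - (-0.1) - 0.2 + 0.0 = 0.6
print(round(eps, 6))  # 0.6
```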
Jørgensen, Mikkel G; Pandey, Deo P; Jaskolska, Milena; Gerdes, Kenn
2009-02-01
Toxin-antitoxin (TA) loci are common in free-living bacteria and archaea. TA loci encode a stable toxin that is neutralized by a metabolically unstable antitoxin. The antitoxin can be either a protein or an antisense RNA. So far, six different TA gene families, in which the antitoxins are proteins, have been identified. Recently, Makarova et al. (K. S. Makarova, N. V. Grishin, and E. V. Koonin, Bioinformatics 22:2581-2584, 2006) suggested that the hicAB loci constitute a novel TA gene family. Using the hicAB locus of Escherichia coli K-12 as a model system, we present evidence that supports this inference: expression of the small HicA protein (58 amino acids [aa]) induced cleavage in three model mRNAs and tmRNA. Concomitantly, the global rate of translation was severely reduced. Using tmRNA as a substrate, we show that HicA-induced cleavage does not require the target RNA to be translated. Expression of HicB (145 aa) prevented HicA-mediated inhibition of cell growth. These results suggest that HicB neutralizes HicA and therefore functions as an antitoxin. As with other antitoxins (RelB and MazF), HicB could resuscitate cells inhibited by HicA, indicating that ectopic production of HicA induces a bacteriostatic rather than a bactericidal condition. Nutrient starvation induced strong hicAB transcription that depended on Lon protease. Mining of 218 prokaryotic genomes revealed that hicAB loci are abundant in bacteria and archaea.
Singh, Reema; Schilde, Christina; Schaap, Pauline
2016-11-17
Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins, all produced the same consensus phylogeny with 100% statistical support. Only two of the 47 single-protein trees reproduced the consensus phylogeny, highlighting that single-gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low-cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2.
Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum branching at the base of that whole clade. Given the data, the approximately unbiased test rejects all alternative topologies favoured by SSU rDNA and individual proteins with high statistical support. The test also rejects monophyletic origins for the genera Acytostelium, Polysphondylium and Dictyostelium. The current position of Acytostelium ellipticum in the consensus phylogeny indicates that somatic cells were lost twice in Dictyostelia.
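The concatenated-alignment step described above amounts to joining per-protein alignments column-wise into one supermatrix before tree inference. A minimal Python sketch follows, assuming that taxa missing from an alignment are padded with gaps; the sequences are invented, and tree inference and the approximately unbiased test are outside its scope.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-protein alignments column-wise into one supermatrix.

    alignments: list of {taxon: aligned sequence}; all sequences within one
    alignment must have the same length. A taxon missing from an alignment is
    padded with gap characters so that every row keeps the same length.
    """
    rows = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(s) for s in aln.values()}
        if len(widths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in rows.items()}


# Hypothetical aligned fragments of two proteins.
protein_1 = {"D. polycephalum": "MK-LV", "P. violaceum": "MKALV"}
protein_2 = {"D. polycephalum": "QQR"}
matrix = concatenate_alignments(
    [protein_1, protein_2], ["D. polycephalum", "P. violaceum"]
)
print(matrix["P. violaceum"])  # MKALV---
```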
Cheng, Yiming; Perocchi, Fabiana
2015-07-01
ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein-protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
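ProtPhylo's ranking by Hamming distance between phylogenetic profiles can be illustrated with binary presence/absence vectors over a fixed list of organisms. The sketch below is not ProtPhylo's code; the protein names and profiles are hypothetical, and breaking ties alphabetically is an arbitrary choice.

```python
def hamming(p, q):
    """Number of organisms in which two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same organisms in the same order")
    return sum(a != b for a, b in zip(p, q))


def rank_phylogenetic_neighbors(query, profiles):
    """Protein names sorted from most to least similar profile (ties by name)."""
    return sorted(profiles, key=lambda name: (hamming(query, profiles[name]), name))


# Hypothetical profiles over five organisms (1 = ortholog present).
query = (1, 1, 0, 0, 1)
profiles = {
    "protA": (1, 1, 0, 0, 1),
    "protB": (0, 1, 0, 0, 1),
    "protC": (0, 0, 1, 1, 0),
}
print(rank_phylogenetic_neighbors(query, profiles))  # ['protA', 'protB', 'protC']
```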
Detection of Significant Pneumococcal Meningitis Biomarkers by Ego Network.
Wang, Qian; Lou, Zhifeng; Zhai, Liansuo; Zhao, Haibin
2017-06-01
The aim was to identify significant biomarkers for the detection of pneumococcal meningitis based on ego networks. Based on gene expression data of pneumococcal meningitis and global protein-protein interaction (PPI) data recruited from open-access databases, the authors constructed a differential co-expression network (DCN) to identify pneumococcal meningitis biomarkers from a network perspective. Here, the EgoNet algorithm was employed to screen the significant ego networks that could accurately distinguish pneumococcal meningitis from healthy controls, by sequentially seeking ego genes, searching candidate ego networks, refining candidate ego networks and performing significance analysis to identify ego networks. Finally, functional inference on the ego networks was performed to identify significant pathways for pneumococcal meningitis. By differential co-expression analysis, the authors constructed a DCN that covered 1809 genes and 3689 interactions. From the DCN, a total of 90 ego genes were identified. Starting from these ego genes, three significant ego networks (Module 19, Module 70 and Module 71) that could predict clinical outcomes for pneumococcal meningitis were identified by the EgoNet algorithm, and the corresponding ego genes were GMNN, MAD2L1 and TPX2, respectively. Pathway analysis showed that these three ego networks were related to CDT1 association with the CDC6:ORC:origin complex, inactivation of APC/C via direct inhibition of the APC/C complex pathway, and DNA strand elongation, respectively. The authors successfully screened three significant ego modules which could accurately predict the clinical outcomes for pneumococcal meningitis and might play important roles in the host response to pathogen infection in pneumococcal meningitis.
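The core object in this analysis is an ego network: an ego gene plus the genes within a fixed number of hops of it. The Python sketch below extracts such a neighborhood by breadth-first search from an undirected adjacency list; the EgoNet steps of candidate refinement and significance testing are not shown, and apart from GMNN the gene names and edges are hypothetical.

```python
from collections import deque


def ego_network(adjacency, ego, radius=1):
    """Nodes within `radius` hops of `ego` and the edges among them.

    adjacency: {node: iterable of neighbours}, assumed symmetric (undirected).
    Edges are returned as sorted 2-tuples so each undirected edge appears once.
    """
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for nb in adjacency.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Hypothetical co-expression graph; GMNN is one of the ego genes named above.
graph = {
    "GMNN": ["G1", "G2"],
    "G1": ["GMNN", "G3"],
    "G2": ["GMNN"],
    "G3": ["G1", "G4"],
    "G4": ["G3"],
}
nodes, edges = ego_network(graph, "GMNN", radius=1)
print(sorted(nodes))  # ['G1', 'G2', 'GMNN']
```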
NASA Technical Reports Server (NTRS)
1992-01-01
CBR Express software solves problems by adapting stored solutions to new problems specified by a user. It is applicable to a wide range of situations. The technology was originally developed by Inference Corporation for Johnson Space Center's Advanced Software Development Workstation. The project focused on the reuse of software designs, and Inference used case-based reasoning (CBR) as part of the ACCESS prototype software. The commercial CBR Express is used as a "help desk" for customer support, enabling reuse of existing information when necessary. It has been adopted by several companies, among them American Airlines, which uses it to solve reservation system software problems.
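The text describes CBR Express only at a high level. As a generic sketch of the retrieval step of case-based reasoning (not CBR Express's actual matching algorithm, which is not described here), the Python below returns the stored case whose attributes best overlap a new problem description; the attributes, cases, and the overlap measure are assumptions, and the adaptation step is omitted.

```python
def feature_overlap(query, case_features):
    """Fraction of attributes (over both descriptions) on which query and case agree."""
    keys = set(query) | set(case_features)
    if not keys:
        return 0.0
    agree = sum(
        1
        for k in keys
        if k in query and k in case_features and query[k] == case_features[k]
    )
    return agree / len(keys)


def retrieve_most_similar(query, case_base):
    """Return the stored case whose features best match the query."""
    if not case_base:
        raise ValueError("the case base is empty")
    return max(case_base, key=lambda case: feature_overlap(query, case["features"]))


# Hypothetical help-desk case base.
cases = [
    {"features": {"system": "reservations", "symptom": "timeout"},
     "solution": "restart the booking service"},
    {"features": {"system": "billing", "symptom": "timeout"},
     "solution": "clear the payment queue"},
]
new_problem = {"system": "reservations", "symptom": "timeout", "terminal": "airport"}
print(retrieve_most_similar(new_problem, cases)["solution"])  # restart the booking service
```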
Action starring narratives and events: Structure and inference in visual narrative comprehension
Cohn, Neil; Wittenberg, Eva
2015-01-01
Studies of discourse have long placed focus on the inference generated by information that is not overtly expressed, and theories of visual narrative comprehension similarly focused on the inference generated between juxtaposed panels. Within the visual language of comics, star-shaped “flashes” commonly signify impacts, but can be enlarged to the size of a whole panel that can omit all other representational information. These “action star” panels depict a narrative culmination (a “Peak”), but have content which readers must infer, thereby posing a challenge to theories of inference generation in visual narratives that focus only on the semantic changes between juxtaposed images. This paper shows that action stars demand more inference than depicted events, and that they are more coherent in narrative sequences than scrambled sequences (Experiment 1). In addition, action stars play a felicitous narrative role in the sequence (Experiment 2). Together, these results suggest that visual narratives use conventionalized depictions that demand the generation of inferences while retaining narrative coherence of a visual sequence. PMID:26709362
Elanchezhian, R; Sakthivel, M; Geraldine, P; Thomas, P A
2010-03-30
Differential expression of apoptotic genes has been demonstrated in selenite-induced cataract. Acetyl-l-carnitine (ALCAR) has been shown to prevent selenite cataractogenesis by maintaining lenticular antioxidant enzyme and redox system components at near normal levels and also by inhibiting lenticular calpain activity. The aim of the present experiment was to investigate the possibility that ALCAR also prevents selenite-induced cataractogenesis by regulating the expression of antioxidant (catalase) and apoptotic [caspase-3, early growth response protein-1 (EGR-1) and cytochrome c oxidase subunit I (COX-I)] genes. The experiment was conducted on 9-day-old Wistar rat pups, which were divided into normal, cataract-untreated and cataract-treated groups. Putative changes in gene expression in whole lenses removed from the rats were determined by measuring mRNA transcript levels of the four genes by RT-PCR analysis, using glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as an internal control. The expression of lenticular caspase-3 and EGR-1 genes appeared to be upregulated, as inferred by detecting increased mRNA transcript levels, while that of COX-I and catalase genes appeared to be downregulated (lowered mRNA transcript levels) in the lenses of cataract-untreated rats. However, in rats treated with ALCAR, the lenticular mRNA transcript levels were maintained at near normal (control) levels. These results suggest that ALCAR may prevent selenite-induced cataractogenesis by preventing abnormal expression of lenticular genes governing apoptosis.
Alves, João Nuno; Muir, Elizabeth M; Andrews, Melissa R; Ward, Anneliese; Michelmore, Nicholas; Dasgupta, Debayan; Verhaagen, Joost; Moloney, Elizabeth B; Keynes, Roger J; Fawcett, James W; Rogers, John H
2014-04-30
As part of a project to express chondroitinase ABC (ChABC) in neurons of the central nervous system, we have inserted a modified ChABC gene into an adeno-associated viral (AAV) vector and injected it into the vibrissal motor cortex in adult rats to determine the extent and distribution of expression of the enzyme. A similar vector for expression of green fluorescent protein (GFP) was injected into the same location. For each vector, two versions with minor differences were used, giving similar results. After 4 weeks, the brains were stained to show GFP and products of chondroitinase digestion. Chondroitinase was widely expressed, and the AAV-ChABC and AAV-GFP vectors gave similar expression patterns in many respects, consistent with the known projections from the directly transduced neurons in vibrissal motor cortex and adjacent cingulate cortex. In addition, diffusion of vector to deeper neuronal populations led to labelling of remote projection fields which was much more extensive with AAV-ChABC than with AAV-GFP. The most notable of these populations are inferred to be neurons of cortical layer 6, projecting widely in the thalamus, and neurons of the anterior pole of the hippocampus, projecting through most of the hippocampus. We conclude that, whereas GFP does not label the thinnest axonal branches of some neuronal types, chondroitinase is efficiently secreted from these arborisations and enables their extent to be sensitively visualised. After 12 weeks, chondroitinase expression was undiminished. Copyright © 2014 Elsevier B.V. All rights reserved.
Expression Differentiation Is Constrained to Low-Expression Proteins over Ecological Timescales
Margres, Mark J.; Wray, Kenneth P.; Seavy, Margaret; McGivern, James J.; Herrera, Nathanael D.; Rokyta, Darin R.
2016-01-01
Protein expression level is one of the strongest predictors of protein sequence evolutionary rate, with high-expression protein sequences evolving at slower rates than low-expression protein sequences largely because of constraints on protein folding and function. Expression evolutionary rates also have been shown to be negatively correlated with expression level across human and mouse orthologs over relatively long divergence times (i.e., ∼100 million years). Long-term evolutionary patterns, however, often cannot be extrapolated to microevolutionary processes (and vice versa), and whether this relationship holds for traits evolving under directional selection within a single species over ecological timescales (i.e., <5000 years) is unknown and not necessarily expected. Expression is a metabolically costly process, and the expression level of a particular protein is predicted to be a tradeoff between the benefit of its function and the costs of its expression. Selection should drive the expression level of all proteins close to values that maximize fitness, particularly for high-expression proteins because of the increased energetic cost of production. Therefore, stabilizing selection may reduce the amount of standing expression variation for high-expression proteins, and in combination with physiological constraints that may place an upper bound on the range of beneficial expression variation, these constraints could severely limit the availability of beneficial expression variants. To determine whether rapid-expression evolution was restricted to low-expression proteins owing to these constraints on highly expressed proteins over ecological timescales, we compared venom protein expression levels across mainland and island populations for three species of pit vipers. We detected significant differentiation in protein expression levels in two of the three species and found that rapid-expression differentiation was restricted to low-expression proteins. 
Our results suggest that various constraints on high-expression proteins reduce the availability of beneficial expression variants relative to low-expression proteins, enabling low-expression proteins to evolve and potentially lead to more rapid adaptation. PMID:26546003
Expanding the view of Clock and cycle gene evolution in Diptera.
Chahad-Ehlers, S; Arthur, L P; Lima, A L A; Gesto, J S M; Torres, F R; Peixoto, A A; de Brito, R A
2017-06-01
We expanded the view of Clock (Clk) and cycle (cyc) gene evolution in Diptera by studying the fruit fly Anastrepha fraterculus (Afra), a Brachycera. Despite the high conservation of clock genes amongst insect groups, striking structural and functional differences in some clocks have appeared throughout evolution. Clk and cyc nucleotide sequences and the corresponding proteins were characterized, along with their mRNA expression data, to provide an evolutionary overview in the two major groups of Diptera: Lower Diptera and Higher Brachycera. We found that AfraCYC lacks the BMAL (Brain and muscle ARNT-like) C-terminus region (BCTR) domain and is constitutively expressed, suggesting that AfraCLK has the main transactivation function, which is corroborated by the presence of poly-Q repeats and an oscillatory pattern. Our analysis suggests that the loss of BCTR in CYC is not exclusive to drosophilids: it also occurs in other Acalyptratae flies, such as tephritids, and in some Calyptratae, such as Muscidae, Calliphoridae and Sarcophagidae. This indicates that BCTR is missing from the CYC of all higher-level Brachycera and that it was lost during the evolution of Lower Brachycera. Thus, we can infer that the CLK protein may play the main role in the CLK/CYC transcription complex in these flies, as in its Drosophila orthologues. © 2017 The Royal Entomological Society.
Chakraborty, Mitun; Goel, Manish; Chinnadayyala, Somasekhar R.; Dahiya, Ujjwal Ranjan; Ghosh, Siddhartha Sankar; Goswami, Pranab
2014-01-01
The alcohol oxidase (AOx) cDNA from Aspergillus terreus MTCC6324, with an open reading frame (ORF) of 2001 bp, was constructed from n-hexadecane-induced cells and expressed in Escherichia coli with a yield of ∼4.2 mg protein g−1 wet cell. The deduced amino acid sequence of the recombinant rAOx showed maximum structural homology with chain B of aryl AOx from Pleurotus eryngii. A functionally active AOx was obtained by incubating the apo-AOx with flavin adenine dinucleotide (FAD) for ∼80 h at 16°C and pH 9.0. The isoelectric point and mass of the apo-AOx were found to be 6.5±0.1 and ∼74 kDa, respectively. Circular dichroism data confirmed the ordered structure of rAOx. Docking studies with an ab initio protein model demonstrated the presence of a conserved FAD-binding domain with an active substrate-binding site. The rAOx was specific for aryl alcohols, and its order of substrate preference was 4-methoxybenzyl alcohol > 3-methoxybenzyl alcohol > 3,4-dimethoxybenzyl alcohol > benzyl alcohol. The rAOx also showed significantly high aggregation (to ∼1000 nm in diameter) and catalytic efficiency (kcat/Km of 7829.5 min−1 mM−1 for 4-methoxybenzyl alcohol). The results indicate the novelty of the AOx and its potential for biocatalytic application. PMID:24752075
Protein kinase Cα (PKCα) regulates bone architecture and osteoblast activity.
Galea, Gabriel L; Meakin, Lee B; Williams, Christopher M; Hulin-Curtis, Sarah L; Lanyon, Lance E; Poole, Alastair W; Price, Joanna S
2014-09-12
Bone strength is achieved and maintained through adaptation to load bearing. The role of the protein kinase PKCα in this process has not been previously reported. However, we observed a phenotype in the long bones of Prkca(-/-) female but not male mice, in which bone tissue progressively invades the medullary cavity in the mid-diaphysis. This bone deposition progresses with age and is prevented by disuse but unaffected by ovariectomy. Castration of male Prkca(-/-) but not WT mice results in the formation of small amounts of intramedullary bone. Osteoblast differentiation markers and Wnt target gene expression were up-regulated in osteoblast-like cells derived from cortical bone of female Prkca(-/-) mice compared with WT. Additionally, although osteoblastic cells derived from WT mice proliferate following exposure to estradiol or mechanical strain, those from Prkca(-/-) mice do not. Female Prkca(-/-) mice develop splenomegaly and reduced marrow GBA1 expression reminiscent of Gaucher disease, in which PKC involvement has been suggested previously. From these data, we infer that in female mice, PKCα normally serves to prevent endosteal bone formation stimulated by load bearing. This phenotype appears to be suppressed by testicular hormones in male Prkca(-/-) mice. Within osteoblastic cells, PKCα enhances proliferation and suppresses differentiation, and this regulation involves the Wnt pathway. These findings implicate PKCα as a target gene for therapeutic approaches in low bone mass conditions. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Fan, Yue; Wang, Xiao; Peng, Qinke
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer GRNs. However, most algorithms use only gene expression data and ignore network topology information in their inference process, even though incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture nonlinear relationships flexibly, and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method to the DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and that the topology information prior can improve the results.
Bayesian estimation of the discrete coefficient of determination.
Chen, Ting; Braga-Neto, Ulisses M
2016-12-01
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) error are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
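For reference, the discrete CoD has a simple definition: CoD = (ε0 − ε)/ε0, where ε0 is the error of the best prediction of the target Y without observing any predictor and ε is the error of the best prediction of Y from the predictors X. The sketch below is the classical nonparametric plug-in (resubstitution) estimator, i.e. the kind of non-Bayesian approach the abstract calls previous work, not the MMSE or OBP estimator proposed in the paper. The function name `discrete_cod` and the tuple encoding of predictor states are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def discrete_cod(x: Sequence[Tuple[int, ...]], y: Sequence[int]) -> float:
    """Plug-in (resubstitution) estimate of the discrete CoD.

    Hypothetical helper: CoD = (eps0 - eps) / eps0, where eps0 is the error of
    the best constant predictor of y and eps is the error of the best
    predictor of y given the discrete predictor state x.
    """
    n = len(y)
    if n == 0 or n != len(x):
        raise ValueError("x and y must be non-empty and of equal length")

    # eps0: always predict the overall majority target value.
    eps0 = 1.0 - max(Counter(y).values()) / n
    if eps0 == 0.0:
        raise ValueError("target is constant; the CoD is undefined")

    # eps: within each observed predictor state, predict its majority target value.
    joint = Counter(zip(x, y))
    best_per_state: Counter = Counter()
    for (state, _target), count in joint.items():
        best_per_state[state] = max(best_per_state[state], count)
    eps = 1.0 - sum(best_per_state.values()) / n

    return (eps0 - eps) / eps0


# Toy XOR target: jointly the two predictors determine y, but neither does alone.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
xor = [a ^ b for a, b in pairs]
print(discrete_cod(pairs, xor))                     # both predictors
print(discrete_cod([(a,) for a, _ in pairs], xor))  # first predictor only
```

The XOR example illustrates why the CoD is described as measuring nonlinear interaction: the pair of predictors gives a CoD of 1, while either predictor alone gives 0.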
Inferring Domain-Domain Interactions from Protein-Protein Interactions with Formal Concept Analysis
Khor, Susan
2014-01-01
Identifying reliable domain-domain interactions will increase our ability to predict novel protein-protein interactions, to unravel interactions in protein complexes, and thus gain more information about the function and behavior of genes. One of the challenges of identifying reliable domain-domain interactions is domain promiscuity. Promiscuous domains are domains that can occur in many domain architectures and are therefore found in many proteins. This becomes a problem for a method where the score of a domain-pair is the ratio between observed and expected frequencies because the protein-protein interaction network is sparse. As such, many protein-pairs will be non-interacting and domain-pairs with promiscuous domains will be penalized. This domain promiscuity challenge to the problem of inferring reliable domain-domain interactions from protein-protein interactions has been recognized, and a number of work-arounds have been proposed. This paper reports on an application of Formal Concept Analysis to this problem. It is found that the relationship between formal concepts provides a natural way for rare domains to elevate the rank of promiscuous domain-pairs and enrich highly ranked domain-pairs with reliable domain-domain interactions. This piggybacking of promiscuous domain-pairs onto less promiscuous domain-pairs is possible only with concept lattices whose attribute-labels are not reduced and is enhanced by the presence of proteins that comprise both promiscuous and rare domains. PMID:24586450
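The penalty on promiscuous domains can be seen in a toy observed-to-expected score. The sketch assumes one simple variant, not necessarily the one analyzed in the paper: for an unordered domain pair, "observed" counts the interacting protein pairs that span it, and "expected" is the number of all protein pairs that span it multiplied by the overall interaction density. The function name `domain_pair_scores` and the data layout are hypothetical; Formal Concept Analysis itself is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Set


def domain_pair_scores(
    domains: Dict[str, Set[str]],
    interactions: Set[FrozenSet[str]],
) -> Dict[FrozenSet[str], float]:
    """Observed/expected score for every unordered domain pair (hypothetical variant)."""
    proteins = sorted(domains)
    all_pairs = list(combinations(proteins, 2))
    if not all_pairs or not interactions:
        raise ValueError("need at least two proteins and one interaction")
    # Fraction of all protein pairs that interact; small for sparse networks.
    density = len(interactions) / len(all_pairs)

    possible: Counter = Counter()  # protein pairs spanning each domain pair
    observed: Counter = Counter()  # interacting protein pairs spanning it
    for p, q in all_pairs:
        spanning = {frozenset((a, b)) for a in domains[p] for b in domains[q]}
        interacting = frozenset((p, q)) in interactions
        for dp in spanning:
            possible[dp] += 1
            if interacting:
                observed[dp] += 1

    # Promiscuous domains span many non-interacting pairs, inflating "expected".
    return {dp: observed[dp] / (possible[dp] * density) for dp in possible}


# Toy example: domain X occurs in every protein; only P1-P2 interact.
toy_domains = {"P1": {"A", "X"}, "P2": {"B", "X"}, "P3": {"X"}, "P4": {"X"}}
toy_ppi = {frozenset(("P1", "P2"))}
for pair, score in sorted(domain_pair_scores(toy_domains, toy_ppi).items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(score, 2))
```

In the toy network the rare pair {A, B} scores 6, while the homotypic promiscuous pair {X} scores 1, even though all of them occur in the single interacting protein pair.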
Campbell, Kieran R.
2016-01-01
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. PMID:27870852
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems-biology-oriented genome-scale modeling efforts benefit greatly from accurate annotation of protein-coding genes to develop properly functioning models. However, protein-coding genes in most new genomes are identified almost entirely by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and, importantly, to detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provide protein-level experimental confirmation for 44% of predicted protein-coding genes, suggest revisions to 48 genes assigned incorrect translational start sites, and uncover 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regard to the annotation of mature protein products.
Back, C R; Douglas, S K; Emerson, J E; Nobbs, A H; Jenkinson, H F
2015-10-01
Streptococcus gordonii SspA and SspB proteins, members of the antigen I/II (AgI/II) family of Streptococcus adhesins, mediate adherence to the cysteine-rich scavenger glycoprotein gp340 and to cells of other oral microbial species. In this article we further investigated the mechanism of coaggregation between S. gordonii DL1 and Actinomyces oris T14V. Previous mutational analysis of S. gordonii suggested that SspB was necessary for coaggregation with A. oris T14V. We have confirmed this by showing that Lactococcus lactis surrogate host cells expressing SspB coaggregated with A. oris T14V and PK606 cells, while L. lactis cells expressing SspA did not. Coaggregation occurred independently of expression of A. oris type 1 (FimP) or type 2 (FimA) fimbriae. Polysaccharide was prepared from cells of A. oris T14V and found to contain 1,4-, 4,6- and 3,4-linked glucose, 1,4-linked mannose, and 2,4-linked galactose residues. When immobilized onto plastic wells, this polysaccharide supported binding of L. lactis expressing SspB, but not binding of L. lactis expressing other AgI/II family proteins. The purified recombinant NAVP region of SspB, comprising amino acid (aa) residues 41-847, bound A. oris polysaccharide, but the C-domain (aa residues 932-1470) did not. A site-directed deletion of 29 aa residues (Δ691-718) close to the predicted binding cleft within the SspB V-region ablated binding of the NAVP region to polysaccharide. These results suggest that the V-region head of SspB recognizes an actinomyces polysaccharide ligand, further characterizing a lectin-like coaggregation mechanism occurring between two important primary colonizers. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Cavallini, Erika; Matus, José Tomás; Finezzo, Laura; Zenoni, Sara; Loyola, Rodrigo; Guzzo, Flavia; Schlechter, Rudolf; Ageorges, Agnès; Arce-Johnson, Patricio
2015-01-01
Because of the vast range of functions that phenylpropanoids possess, their synthesis requires precise spatiotemporal coordination throughout plant development and in response to the environment. The accumulation of these secondary metabolites is transcriptionally controlled by positive and negative regulators from the MYB and basic helix-loop-helix protein families. We characterized four grapevine (Vitis vinifera) R2R3-MYB proteins from the C2 repressor motif clade, all of which harbor the ethylene response factor-associated amphiphilic repression domain but differ in the presence of an additional TLLLFR repression motif found in the strong flavonoid repressor Arabidopsis (Arabidopsis thaliana) AtMYBL2. Constitutive expression of VvMYB4a and VvMYB4b in petunia (Petunia hybrida) repressed general phenylpropanoid biosynthetic genes and selectively reduced the amount of small-weight phenolic compounds. Conversely, transgenic petunia lines expressing VvMYBC2-L1 and VvMYBC2-L3 showed a severe reduction in petal anthocyanins and seed proanthocyanidins together with a higher pH of crude petal extracts. The distinct function of these regulators was further confirmed by transient expression in tobacco (Nicotiana benthamiana) leaves and grapevine plantlets. Finally, VvMYBC2-L3 was ectopically expressed in grapevine hairy roots, showing a reduction in proanthocyanidin content together with the down-regulation of structural and regulatory genes of the flavonoid pathway as revealed by a transcriptomic analysis. The physiological role of these repressors was inferred by combining the results of the functional analyses and their expression patterns in grapevine during development and in response to ultraviolet B radiation. 
Our results indicate that VvMYB4a and VvMYB4b may play a key role in negatively regulating the synthesis of small-weight phenolic compounds, whereas VvMYBC2-L1 and VvMYBC2-L3 may additionally fine tune flavonoid levels, balancing the inductive effects of transcriptional activators. PMID:25659381
ERIC Educational Resources Information Center
Henriques, Ana; Oliveira, Hélia
2016-01-01
This paper reports on the results of a study investigating the potential to embed Informal Statistical Inference in statistical investigations, using TinkerPlots, for assisting 8th grade students' informal inferential reasoning to emerge, particularly their articulations of uncertainty. Data collection included students' written work on a…
Inferring causal genomic alterations in breast cancer using gene expression data
2011-01-01
Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that can infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in those regions. By inferring CNV regions across many datasets, we identified 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, which couples wavelet analysis of expression-based copy number alteration with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and to other novel types of cancer data, such as next-generation sequencing-based expression (RNA-Seq) and CNV data. PMID:21806811
Annotation-based inference of transporter function.
Lee, Thomas J; Paulsen, Ian; Karp, Peter
2008-07-01
We present a method for inferring and constructing transport reactions for transporter proteins based primarily on the analysis of the names of individual proteins in the genome annotation of an organism. Transport reactions are declarative descriptions of transporter activities, and thus can be manipulated computationally, unlike free-text protein names. Once transporter activities are encoded as transport reactions, a number of computational analyses are possible, including database queries by transporter activity; inclusion of transporters into an automatically generated metabolic-map diagram that can be painted with omics data to aid in their interpretation; detection of anomalies in the metabolic and transport networks, such as substrates that are transported into the cell but are not inputs to any metabolic reaction or pathway; and comparative analyses of the transport capabilities of different organisms. On randomly selected organisms, the method achieves precision and recall rates of 0.93 and 0.90, respectively, in identifying transporter proteins by name within the complete genome. The method obtains 67.5% accuracy in predicting complete transport reactions; if allowance is made for predictions that are overly general yet not incorrect, reaction prediction accuracy is 82.5%. The method is implemented as part of PathoLogic, the inference component of the Pathway Tools software. Pathway Tools is freely available to researchers at non-commercial institutions, including source code; a fee applies to commercial institutions. Supplementary data are available at Bioinformatics online.
MoCha: Molecular Characterization of Unknown Pathways.
Lobo, Daniel; Hammelman, Jennifer; Levin, Michael
2016-04-01
Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of comprehensive mechanistic models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.
Holland, David O; Johnson, Margaret E
2018-03-01
Stoichiometric balance, or dosage balance, implies that proteins that are subunits of obligate complexes (e.g. the ribosome) should have copy numbers expressed to match their stoichiometry in that complex. Establishing balance (or imbalance) is an important tool for inferring subunit function and assembly bottlenecks. We show here that these correlations in protein copy numbers can extend beyond complex subunits to larger protein-protein interaction networks (PPINs) involving a range of reversible binding interactions. We develop a simple method for quantifying balance in any interface-resolved PPIN based on network structure and experimentally observed protein copy numbers. By analyzing such a network for the clathrin-mediated endocytosis (CME) system in yeast, we found that the real protein copy numbers were significantly more balanced in relation to their binding partners compared to randomly sampled sets of yeast copy numbers. The observed balance is not perfect, highlighting both under- and overexpressed proteins. We evaluate the potential costs and benefits of imbalance using two criteria. First, a potential cost of imbalance is that 'leftover' proteins without remaining functional partners are free to misinteract. We systematically quantify how this misinteraction cost is most dangerous for strong-binding protein interactions and for network topologies observed in biological PPINs. Second, a more direct consequence of imbalance is that the formation of specific functional complexes depends on relative copy numbers. We therefore construct simple kinetic models of two sub-networks in the CME network to assess multi-protein assembly of the ARP2/3 complex and a minimal, nine-protein clathrin-coated vesicle-forming module. We find that the observed, imperfectly balanced copy numbers are less effective than balanced copy numbers in producing fast and complete multi-protein assemblies.
However, we speculate that strategic imbalance in the vesicle-forming module allows cells to tune where endocytosis occurs, providing sensitive control over cargo uptake via clathrin-coated vesicles. PMID:29518071
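The comparison against randomly sampled copy numbers can be illustrated with a permutation test. The sketch below is hypothetical and is not the authors' balance measure: it ignores interface resolution, scores imbalance as the mean absolute log2 copy-number ratio across binding partners (0 means every partner pair is matched), and asks how often shuffled copy numbers look at least as balanced. The function names and the toy data are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Tuple


def imbalance(copies: Dict[str, float], edges: List[Tuple[str, str]]) -> float:
    """Hypothetical score: mean |log2(n_a / n_b)| over binding pairs (0 = perfectly matched)."""
    if not edges:
        raise ValueError("need at least one binding interaction")
    return sum(abs(math.log2(copies[a] / copies[b])) for a, b in edges) / len(edges)


def permutation_p_value(
    copies: Dict[str, float],
    edges: List[Tuple[str, str]],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random copy-number reassignments at least as balanced as observed."""
    rng = random.Random(seed)
    names = list(copies)
    values = list(copies.values())
    observed = imbalance(copies, edges)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign the same copy numbers to random proteins
        if imbalance(dict(zip(names, values)), edges) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


toy_copies = {"A": 100, "B": 100, "C": 10, "D": 10}
toy_edges = [("A", "B"), ("C", "D")]
print(imbalance(toy_copies, toy_edges))  # 0.0: each partner pair is matched
print(permutation_p_value(toy_copies, toy_edges, n_perm=200, seed=1))
```

A small p-value would indicate that observed copy numbers are more balanced with respect to their binding partners than random reassignments; a real analysis would need per-interface partners and realistic copy-number distributions.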
2014-01-01
Background Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1β, TNF-α, TGF-β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected using ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feedback loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli). Conclusion A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs.
The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D. PMID:24989895
Padua, Maria B.; Lynch, Vincent J.; Alvarez, Natalia V.; Garthwaite, Mark A.; Golos, Thaddeus G.; Bazer, Fuller W.; Kalkunte, Satyan; Sharma, Surendra; Wagner, Gunter P.; Hansen, Peter J.
2012-01-01
Type 5 acid phosphatase (ACP5; also known as tartrate-resistant acid phosphatase or uteroferrin) is a metalloprotein secreted by the endometrial glandular epithelium of pigs, mares, sheep, and water buffalo. In this paper, we describe the phylogenetic distribution of endometrial expression of ACP5 and demonstrate that endometrial expression arose early in evolution (i.e., before divergence of prototherian and therian mammals ∼166 million years ago). To determine expression of ACP5 in the pregnant endometrium, RNA was isolated from rhesus, mouse, rat, dog, sheep, cow, horse, armadillo, opossum, and duck-billed platypus. Results from RT-PCR and RNA-Seq experiments confirmed that ACP5 is expressed in all species examined. ACP5 was also demonstrated immunochemically in endometrium of rhesus, marmoset, sheep, cow, goat, and opossum. Alignment of inferred amino acid sequences shows a high conservation of ACP5 throughout speciation, with species-specific differences most extensive in the N-terminal and C-terminal regions of the protein. Analysis by Selecton indicated that most of the sites in ACP5 are undergoing purifying selection, and no sites undergoing positive selection were found. In conclusion, endometrial expression of ACP5 is a common feature in all orders of mammals and has been subjected to purifying selection. Expression of ACP5 in the uterus predates the divergence of therians and prototherians. ACP5 is an evolutionarily conserved gene that likely exerts a common function important for pregnancy in mammals using a wide range of reproductive strategies. PMID:22278982
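Selection analyses of this kind rest on the per-site ratio ω = dN/dS (Ka/Ks): ω < 1 indicates purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. The sketch below only applies that interpretation to already-estimated ω values using a hypothetical tolerance band; Selecton itself estimates ω under an evolutionary model and assesses significance, which this does not attempt.

```python
def classify_sites(omegas, tol=0.1):
    """Label each site from its estimated omega = dN/dS.

    tol is a hypothetical band around 1.0 treated as 'neutral';
    no statistical test is performed here.
    """
    labels = []
    for w in omegas:
        if w < 1.0 - tol:
            labels.append("purifying")
        elif w > 1.0 + tol:
            labels.append("positive")
        else:
            labels.append("neutral")
    return labels


# Hypothetical per-site estimates (not ACP5 data):
print(classify_sites([0.05, 0.3, 0.95, 1.8]))
```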
Sekiguchi, Toshio; Kuwasako, Kenji; Ogasawara, Michio; Takahashi, Hiroki; Matsubara, Shin; Osugi, Tomohiro; Muramatsu, Ikunobu; Sasayama, Yuichi; Suzuki, Nobuo; Satake, Honoo
2016-01-29
The calcitonin (CT)/CT gene-related peptide (CGRP) family is conserved in vertebrates. The activities of this peptide family are regulated by a combination of two receptors, namely the calcitonin receptor (CTR) and the CTR-like receptor (CLR), and three receptor activity-modifying proteins (RAMPs). Furthermore, RAMPs act as escort proteins by translocating CLR to the cell membrane. Recently, CT/CGRP family peptides have been identified or inferred in several invertebrates. However, the molecular characteristics and relevant functions of the CTR/CLR and RAMPs in invertebrates remain unclear. In this study, we identified three CT/CGRP family peptides (Bf-CTFPs), one CTR/CLR-like receptor (Bf-CTFP-R), and three RAMP-like proteins (Bf-RAMP-LPs) in the basal chordate amphioxus (Branchiostoma floridae). The Bf-CTFPs were shown to possess an N-terminal circular region typical of the CT/CGRP family and a C-terminal Pro-NH2. The Bf-CTFP genes were expressed in the central nervous system and in endocrine cells of the midgut, indicating that Bf-CTFPs serve as brain and/or gut peptides. Cell surface expression of the Bf-CTFP-R was enhanced by co-expression with each Bf-RAMP-LP. Furthermore, Bf-CTFPs activated Bf-CTFP-R·Bf-RAMP-LP complexes, resulting in cAMP accumulation. These results confirmed that Bf-RAMP-LPs, like vertebrate RAMPs, are prerequisites for the function and translocation of the Bf-CTFP-R. The relative potencies of the three peptides at each receptor were similar. Bf-CTFP2 was a potent ligand at all receptors in cAMP assays. Bf-RAMP-LP effects on ligand potency order were distinct from those at vertebrate CGRP/adrenomedullin/amylin receptors. To the best of our knowledge, this is the first molecular and functional characterization of an authentic invertebrate CT/CGRP family receptor and RAMPs. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Comparative analysis of genomics and proteomics in Bacillus thuringiensis 4.0718.
Rang, Jie; He, Hao; Wang, Ting; Ding, Xuezhi; Zuo, Mingxing; Quan, Meifang; Sun, Yunjun; Yu, Ziquan; Hu, Shengbiao; Xia, Liqiu
2015-01-01
Bacillus thuringiensis is a widely used biopesticide that produces various insecticidal substances during its life cycle. Separation and purification of these substances have been difficult because of their relatively short half-lives. Moreover, substances can be synthesized at different times during development, so samples from different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach, particularly one using mass spectrometry-based proteomic methods, would enhance our ability to identify such substances. Comparative analysis of the genomic and proteomic data showed that not all of the products deduced from the annotated genome could be identified in the proteomic data. For instance, genome annotation showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. Sporulation-related proteins were also compared, and the results showed that the great majority of them could be detected by mass spectrometry. This analysis revealed that Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the regulatory network of spore formation, were also present in the proteomic data. Comparing the two data sets made it possible to infer that some genes were silenced or expressed at very low levels. For instance, we found that cry2Ab seems to lack a functional promoter, while cry1Ia may not be expressed due to the presence of transposons. With this comparative study, a relatively complete database can be constructed and used to transform hereditary material, thereby promoting high expression of toxic proteins. 
A theoretical basis is provided for constructing highly virulent engineered bacteria and for promoting the application of proteogenomics in the life sciences.
dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data.
Huynh-Thu, Vân Anh; Geurts, Pierre
2018-02-21
The elucidation of gene regulatory networks is one of the major challenges of systems biology. The gene measurements exploited by network inference methods are typically available either as steady-state expression vectors or as time series expression data. In our previous work, we proposed the GENIE3 method, which exploits variable importance scores derived from random forests to identify the regulators of each target gene. This method provided state-of-the-art performance on several benchmark datasets, but it could not be applied specifically to time series expression data. We propose here an adaptation of the GENIE3 method, called dynamical GENIE3 (dynGENIE3), for handling both time series and steady-state expression data. The proposed method is evaluated extensively on the artificial DREAM4 benchmarks and on three real time series expression datasets. Although dynGENIE3 does not systematically yield the best performance on each and every network, it is competitive with diverse methods from the literature, while preserving the main advantages of GENIE3 in terms of scalability.
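One way to adapt a per-target regression method such as GENIE3 to time series is to assume each gene j follows dx_j/dt = -α_j x_j + f_j(x) and to learn f_j from finite differences between consecutive time points. The sketch below builds such learning samples for one target gene under that assumption; the decay rate `alpha` is a user-supplied assumption, and the random-forest fit and importance ranking that would follow (e.g. with a regression-tree ensemble library) are deliberately omitted. Excluding the target from its own candidate regulators mirrors the steady-state GENIE3 setup and is a design choice here.

```python
def dyn_learning_set(times, series, target, alpha):
    """Build (inputs, outputs) regression samples for one target gene.

    times:  increasing time points t_0 < t_1 < ... < t_T
    series: series[k][g] = expression of gene g at times[k]
    target: index j of the target gene
    alpha:  assumed decay rate of gene j
    Output for step k: (x_j(t_{k+1}) - x_j(t_k)) / (t_{k+1} - t_k) + alpha * x_j(t_k),
    i.e. a finite-difference estimate of f_j(x(t_k)).
    """
    inputs, outputs = [], []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        x_now, x_next = series[k], series[k + 1]
        y = (x_next[target] - x_now[target]) / dt + alpha * x_now[target]
        # candidate regulators: all other genes at time t_k
        inputs.append([v for g, v in enumerate(x_now) if g != target])
        outputs.append(y)
    return inputs, outputs


# Toy time series for two genes (hypothetical values):
X, y = dyn_learning_set([0, 1, 2], [[1.0, 0.0], [2.0, 1.0], [4.0, 1.5]], target=0, alpha=0.5)
```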
Oeo-Santos, Carmen; Mas, Salvador; Benedé, Sara; López-Lucendo, María; Quiralte, Joaquín; Blanca, Miguel; Mayorga, Cristobalina; Villalba, Mayte; Barderas, Rodrigo
2018-06-05
The allergenic non-specific lipid transfer protein Ole e 7 from olive pollen is a major allergen associated with severe symptoms in areas with high olive pollen levels. Despite its clinical importance, its cloning and recombinant production have not been achieved by classical approaches. This study aimed to determine its complete amino acid sequence by mass spectrometry-based proteomics for its subsequent expression and characterization. To this end, the natural protein was separated by 2D gel electrophoresis and digested in-gel with trypsin, and CID and HCD fragmentation spectra obtained by nLC-MS/MS were analyzed using PEAKS software. Thirteen of the 457 de novo sequenced peptides obtained allowed its full-length amino acid sequence to be assembled. Then, Ole e 7-encoding cDNA was synthesized and cloned into the pPICZαA vector for expression in the yeast Pichia pastoris. Analyses by circular dichroism, western blotting (WB), ELISA, and cell-based tests using sera and blood from olive pollen-sensitized patients showed that rOle e 7 mostly retained the structural, allergenic and antigenic properties of the natural allergen. In summary, rOle e 7, assembled by de novo peptide sequencing by MS, behaved immunologically similarly to the natural allergen, which can be isolated from pollen only in scarce amounts. Olive pollen is an important cause of allergy. The non-specific lipid binding protein Ole e 7 is a major allergen with a high incidence and a phenotype associated with severe clinical symptoms. Despite its relevance, its cloning and recombinant expression have not been achieved by classical techniques. Here, we have inferred the primary amino acid sequence of Ole e 7 by mass spectrometry. We separated Ole e 7 isolated from pollen by 2DE. After in-gel digestion with trypsin and direct analysis by nLC-MS/MS in an LTQ-Orbitrap Velos, we obtained the complete repertoire of de novo sequenced peptides, which allowed the primary sequence of Ole e 7 to be assembled. 
After recombinant expression, purification to homogeneity, and structural and immunological characterization using sera from olive pollen allergic patients and cell-based assays, we observed that the recombinant allergen retained the antigenic and allergenic properties of the natural allergen. Collectively, we show that the recombinant protein assembled by proteomics would be suitable for better in vitro diagnosis of olive pollen allergic patients. Copyright © 2018. Published by Elsevier B.V.
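Assembling a full-length sequence from de novo sequenced peptides amounts to chaining peptides through their overlaps. The sketch below is a generic greedy overlap merge, an illustrative stand-in rather than the PEAKS workflow used in the study; the minimum overlap length and the toy peptides are hypothetical. Real tryptic peptides often do not overlap, so in practice overlapping peptides from several enzymes or fragmentation modes are needed.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of peptides with the largest overlap."""
    seqs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # drop peptides fully contained in another peptide
    seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]
    while len(seqs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no sufficient overlap left: return separate contigs
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]
    return seqs


# Hypothetical toy peptides (not Ole e 7):
print(greedy_assemble(["MKTAYIA", "AYIAKQR", "KQRQISF", "TAYI"]))
```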
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu Guo; Institute of Neuroscience, Department of Neurobiology, Second Military Medical University, Shanghai 200433; Yang Huayan
Macrophage differentiation antigen associated with complement three receptor function (Mac-1) belongs to the β2 subfamily of integrins, which mediate important cell-cell and cell-extracellular matrix interactions. Biochemical studies have indicated that Mac-1 is a constitutive heterodimer in vitro. Here, we detected the heterodimerization of Mac-1 subunits in living cells by means of two fluorescence resonance energy transfer (FRET) techniques (fluorescence microscopy and fluorescence spectroscopy), and our results demonstrated that the Mac-1 subunits heterodimerize constitutively and that this constitutive heterodimerization is cell-type independent. Through FRET imaging, we found that Mac-1 heterodimers mainly localized to the plasma membrane, perinuclear, and Golgi areas in living cells. Furthermore, through analysis of the estimated physical distances between cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP) fused to the Mac-1 subunits, we suggested that the conformation of the Mac-1 subunits is not affected by the fusion of CFP or YFP, and inferred that the Mac-1 subunits adopt different conformations when expressed in Chinese hamster ovary (CHO) and human embryonic kidney (HEK) 293T cells, respectively.
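Donor-acceptor distances are typically estimated from FRET efficiency with the Förster relation E = 1 / (1 + (r/R0)^6), which inverts to r = R0 (1/E - 1)^(1/6). The sketch below implements both directions. The Förster radius R0 depends on the fluorophore pair and conditions; the 4.9 nm used here is an assumed, commonly quoted order of magnitude for CFP-YFP, not a value taken from this study.

```python
def fret_efficiency(r, r0):
    """Förster relation: transfer efficiency at donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(e, r0):
    """Invert the Förster relation: distance from an efficiency 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must be strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0_CFP_YFP_NM = 4.9  # assumed Förster radius in nm; use the value for your own setup

# At E = 0.5 the distance equals R0; higher efficiency means a shorter distance.
print(fret_distance(0.5, R0_CFP_YFP_NM), fret_distance(0.8, R0_CFP_YFP_NM))
```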
Limited utility of residue masking for positive-selection inference.
Spielman, Stephanie J; Dawson, Eric T; Wilke, Claus O
2014-09-01
Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
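Confidence-based filtering of this kind can be pictured as masking individual aligned residues whose confidence score falls below a cutoff. The sketch below assumes per-residue scores are already available (as a Guidance-like tool would provide) and replaces low-confidence residues with a mask character; the cutoff, the mask character `X`, and the toy data are hypothetical, and gaps are left untouched.

```python
def mask_residues(alignment, scores, cutoff, mask_char="X"):
    """Mask aligned residues whose confidence score is below `cutoff`.

    alignment: list of equal-length aligned sequences, gaps written as '-'
    scores:    scores[i][j] = confidence of the character at column j of sequence i
    """
    masked = []
    for seq, row in zip(alignment, scores):
        chars = [
            mask_char if ch != "-" and s < cutoff else ch
            for ch, s in zip(seq, row)
        ]
        masked.append("".join(chars))
    return masked


# Hypothetical two-sequence alignment with per-residue confidences:
aln = ["MK-L", "MRAL"]
conf = [[0.99, 0.50, 0.00, 0.95],
        [0.99, 0.40, 0.60, 0.97]]
print(mask_residues(aln, conf, cutoff=0.9))
```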
From pull-down data to protein interaction networks and complexes with biological relevance.
Zhang, Bing; Park, Byung-Hoon; Karpinets, Tatiana; Samatova, Nagiza F
2008-04-01
Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein-protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance. A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F(1)-measure and identified 610 protein complexes with high-functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes led to hypotheses about the causes of false identifications. Namely, co-purification of different protein complexes mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as hydrophilic bias, could result in false negatives.
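The first two steps of the framework (score pairwise co-purification similarity, then threshold to obtain a network) can be sketched as follows. Here each protein's co-purification pattern is represented as the set of pull-down experiments in which it was detected, and Jaccard similarity is swapped in as the affinity measure; both are illustrative assumptions rather than the paper's exact affinity score, and the fixed threshold stands in for its knowledge-guided threshold selection.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(profiles, threshold):
    """profiles: protein -> set of pull-down experiments in which it was detected.

    Returns {(p, q): similarity} for protein pairs at or above `threshold`.
    """
    edges = {}
    for p, q in combinations(sorted(profiles), 2):
        s = jaccard(profiles[p], profiles[q])
        if s >= threshold:
            edges[(p, q)] = s
    return edges


# Hypothetical profiles: A and B always co-purify, C only partly overlaps.
profiles = {"A": {"b1", "b2", "b3"}, "B": {"b1", "b2", "b3"}, "C": {"b1", "b4"}}
print(build_network(profiles, threshold=0.5))
```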
Identification and analysis of MKK and MPK gene families in canola (Brassica napus L.).
Liang, Wanwan; Yang, Bo; Yu, Bao-Jun; Zhou, Zili; Li, Cui; Jia, Ming; Sun, Yun; Zhang, Yue; Wu, Feifei; Zhang, Hanfeng; Wang, Boya; Deyholos, Michael K; Jiang, Yuan-Qing
2013-06-11
Eukaryotic mitogen-activated protein kinase (MAPK/MPK) signaling cascades transduce and amplify environmental signals via three types of reversibly phosphorylated kinases to activate defense gene expression. Canola (oilseed rape, Brassica napus) is a major crop in temperate regions. Identification and characterization of MAPK and MAPK kinases (MAPKK/MKK) of canola will help to elucidate their role in responses to abiotic and biotic stresses. We describe the identification and analysis of seven MKK (BnaMKK) and 12 MPK (BnaMPK) members from canola. Sequence alignments and phylogenetic analyses of the predicted amino acid sequences of BnaMKKs and BnaMPKs classified them into four different groups. We also examined the subcellular localization of four and two members of the BnaMKK and BnaMPK gene families, respectively, using green fluorescent protein (GFP), and found GFP signals in both nuclei and cytoplasm. Furthermore, we identified several interesting interaction pairs through yeast two-hybrid (Y2H) analysis of interactions between BnaMKKs and BnaMPKs, as well as between BnaMPKs and BnaWRKYs. We defined contiguous signaling modules including BnaMKK9-BnaMPK1/2-BnaWRKY53, BnaMKK2/4/5-BnaMPK3/6-BnaWRKY20/26 and BnaMKK9-BnaMPK5/9/19/20. Of these, several interactions had not been previously described in any species. Selected interactions were validated in vivo by a bimolecular fluorescence complementation (BiFC) assay. Transcriptional responses of a subset of canola MKK and MPK genes to stimuli including fungal pathogens, hormones and abiotic stress treatments were analyzed through real-time RT-PCR, and we identified a few BnaMKKs and BnaMPKs responding to salicylic acid (SA), oxalic acid (OA), Sclerotinia sclerotiorum or other stress conditions. Comparisons of expression patterns of putative orthologs in canola and Arabidopsis showed that transcript expression patterns were generally conserved, with some differences suggestive of sub-functionalization. 
We identified seven MKK and 12 MPK genes from canola and examined their phylogenetic relationships, transcript expression patterns, subcellular localization, and protein-protein interactions. Not all expression patterns and interactions were conserved between canola and Arabidopsis, highlighting the limitations of drawing inferences about crops from model species. The data presented here provide the first systematic description of MKK-MPK-WRKY signaling modules in canola and will further improve our understanding of defense responses in general and provide a basis for future crop improvement.
Ford, Janet A; Milosky, Linda M
2008-04-01
This study examined whether young children with typical language development (TL) and children with language impairment (LI) make emotion inferences online during the process of discourse comprehension, identified variables that predict emotion inferencing, and explored the relationship of these variables to social competence. Preschool children (16 TL and 16 LI) watched narrated videos designed to activate knowledge about a particular emotional state. Following each story, children named a facial expression that either matched or did not match the anticipated emotion. Several experimental tasks examined linguistic and nonlinguistic abilities. Finally, each child's teacher completed a measure of social competence. Children with TL named expressions significantly more slowly in the mismatched condition than in the matched condition, whereas children with LI did not differ in response times between the conditions. Language and vocal response time measures were related to emotion inferencing ability, and this ability predicted social competence scores. The findings suggest that children with TL are inferring emotions during the comprehension process, whereas children with LI often fail to make these inferences. Making emotion inferences is related to discourse comprehension and to social competence in children. The current findings provide evidence that language and vocal response time measures predicted inferencing ability and suggest that additional factors may influence discourse inferencing and social competence.
Activity, specificity, and probe design for the smallpox virus protease K7L.
Aleshin, Alexander E; Drag, Marcin; Gombosuren, Naran; Wei, Ge; Mikolajczyk, Jowita; Satterthwait, Arnold C; Strongin, Alex Y; Liddington, Robert C; Salvesen, Guy S
2012-11-16
The K7L gene product of the smallpox virus is a protease implicated in the maturation of viral proteins. K7L belongs to protease Clan CE, which includes distantly related cysteine proteases from eukaryotes, pathogenic bacteria, and viruses. Here, we describe its recombinant high level expression, biochemical mechanism, substrate preference, and regulation. Earlier studies inferred that the orthologous I7L vaccinia protease cleaves at an AG-X motif in six viral proteins. Our data for K7L suggest that the AG-X motif is necessary but not sufficient for optimal cleavage activity. Thus, K7L requires peptides extended into the P7 and P8 positions for efficient substrate cleavage. Catalytic activity of K7L is substantially enhanced by homodimerization, by the substrate protein P25K as well as by glycerol. RNA and DNA also enhance cleavage of the P25K protein but not of synthetic peptides, suggesting that nucleic acids augment the interaction of K7L with its protein substrate. Library-based peptide preference analyses enabled us to design an activity-based probe that covalently and selectively labels K7L in lysates of transfected and infected cells. Our study thus provides proof-of-concept for the design of inhibitors and probes that may contribute both to a better understanding of the role of K7L in the virus life cycle and the design of novel anti-virals.
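Because the AG-X motif is described as necessary but not sufficient, a motif scan can only propose candidate cleavage sites. The sketch below finds every AG-X site (cleavage between G at P1 and any residue at P1') and, as a hypothetical extra filter reflecting the observation that efficient substrates extend into P7 and P8, keeps only sites with at least eight residues on the N-terminal side. A regex lookahead is used so overlapping motifs are all reported. The test sequences are made up.

```python
import re


def candidate_ag_x_sites(seq, min_upstream=8):
    """Return 0-based indices of the P1' residue for candidate AG-X cleavage sites.

    Cleavage is assumed between G (P1) and the following residue (P1').
    min_upstream: hypothetical requirement that residues P8..P1 exist.
    """
    sites = []
    for m in re.finditer(r"(?=AG.)", seq):  # lookahead allows overlapping matches
        cut = m.start() + 2                  # index of the P1' residue
        if cut >= min_upstream:
            sites.append(cut)
    return sites


print(candidate_ag_x_sites("MSTKLPQAGSEEAGK"))
```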
Probabilistic grammatical model for helix‐helix contact site classification
2013-01-01
Background Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. This requires the expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached an AUCROC of 0.70. The analysis of grammar parse trees revealed their ability to represent structural features of helix-helix contact sites. Conclusions We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring the modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists. PMID:24350601
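Scoring a sequence with a probabilistic context-free grammar is usually done with the inside algorithm, which sums the probabilities of all parse trees. The sketch below implements it for a grammar in Chomsky normal form; the toy grammar (a nonterminal H generating runs of a hypothetical residue class `h`, followed by a single `p`) is invented for illustration and is not the grammar inferred in the study.

```python
from collections import defaultdict


def inside_probability(tokens, binary, lexical, start="S"):
    """Inside algorithm for a PCFG in Chomsky normal form.

    binary:  list of (lhs, left_child, right_child, prob) rules, lhs -> B C
    lexical: {(lhs, terminal): prob} rules, lhs -> terminal
    Returns P(start =>* tokens), summed over all parse trees.
    """
    n = len(tokens)
    # chart[i][j][X] = probability that X derives tokens[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for (lhs, term), p in lexical.items():
            if term == tok:
                chart[i][i + 1][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                for lhs, b, c, p in binary:
                    if b in left and c in right:
                        chart[i][j][lhs] += p * left[b] * right[c]
    return chart[0][n][start]


# Toy grammar: S -> H P (1.0); H -> H H (0.4) | 'h' (0.6); P -> 'p' (1.0)
binary = [("S", "H", "P", 1.0), ("H", "H", "H", 0.4)]
lexical = {("H", "h"): 0.6, ("P", "p"): 1.0}
print(inside_probability(["h", "h", "p"], binary, lexical))
```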
Improved orthologous databases to ease protozoan targets inference.
Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R
2015-09-29
Homology inference helps identify similarities as well as differences among organisms, providing better insight into how closely related one organism might be to another. In addition, comparative genomics pipelines are widely adopted tools built from different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks that benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM- and reciprocal-best-hits-based approach. Our methodology allows OrthoSearch to confront two orthologous databases and generate an improved new one, which can later be used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. These new orthologous databases were used for a regular OrthoSearch run. By confronting the "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases with the protozoan species, we detected the following total numbers of orthologous groups and coverage (the relation between the inferred orthologous groups and the species' total number of proteins): Cryptosporidium hominis: 1,821 (11%) and 3,254 (12%); Entamoeba histolytica: 2,245 (13%) and 5,305 (19%); Leishmania infantum: 2,702 (16%) and 4,760 (17%). 
Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13 orthologous groups which represent potential protozoan targets; these were found because of our distant homology approach. We also provide the number of species-specific, pair-to-pair and core groups from such analyses, depicted in Venn diagrams. The orthologous databases generated by our HMM-based methodology provide a broader dataset, with larger amounts of orthologous groups when compared to the original databases used as input. Those may be used for several homology inference analyses, annotation tasks and protozoan targets identification.
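The reciprocal-best-hits criterion mentioned above pairs a sequence a in set A with b in set B only when b is a's best-scoring hit and a is also b's best-scoring hit. The sketch below applies that rule to precomputed similarity scores (e.g. bit scores); it is plain RBH on toy numbers and does not reproduce OrthoSearch's HMM profile comparison.

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Return {query: best-scoring subject}."""
    best = {}
    for (q, s), sc in scores.items():
        if q not in best or sc > best[q][1]:
            best[q] = (s, sc)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(scores_ab, scores_ba):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(scores_ab)
    ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical scores in both search directions:
scores_ab = {("a1", "b1"): 50, ("a1", "b2"): 30, ("a2", "b2"): 40, ("a3", "b2"): 45}
scores_ba = {("b1", "a1"): 52, ("b2", "a3"): 44, ("b2", "a2"): 41}
print(reciprocal_best_hits(scores_ab, scores_ba))
```

Here a2 is excluded: its best hit b2 prefers a3, so the relationship is not reciprocal.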
Lauric Acid Accelerates Glycolytic Muscle Fiber Formation through TLR4 Signaling.
Wang, Leshan; Luo, Lv; Zhao, Weijie; Yang, Kelin; Shu, Gang; Wang, Songbo; Gao, Ping; Zhu, Xiaotong; Xi, Qianyun; Zhang, Yongliang; Jiang, Qingyan; Wang, Lina
2018-06-18
Lauric acid (LA), which is the primary fatty acid in coconut oil, was reported to have many metabolic benefits. TLR4 is a common receptor of lipopolysaccharides and involved mainly in inflammation responses. Here, we focused on the effects of LA on skeletal muscle fiber types and metabolism. We found that 200 μM LA treatment in C2C12 or dietary supplementation of 1% LA increased MHCIIb protein expression and the proportion of type IIb muscle fibers from 0.452 ± 0.0165 to 0.572 ± 0.0153, increasing the mRNA expression of genes involved in glycolysis, such as HK2 and LDH2 (from 1.00 ± 0.110 to 1.35 ± 0.0843 and from 1.00 ± 0.123 to 1.71 ± 0.302 in vivo, respectively), decreasing the catalytic activity of lactate dehydrogenase (LDH), and transforming lactic acid to pyruvic acid. Furthermore, LA activated TLR4 signaling, and TLR4 knockdown reversed the effect of LA on muscle fiber type and glycolysis. Thus, we inferred that LA promoted glycolytic fiber formation through TLR4 signaling.
Domain repertoires as a tool to derive protein recognition rules.
Zucconi, A; Panni, S; Paoluzi, S; Castagnoli, L; Dente, L; Cesareni, G
2000-08-25
Several approaches, some of which are described in this issue, have been proposed to assemble a complete protein interaction map. These are often based on high throughput methods that explore the ability of each gene product to bind any other element of the proteome of the organism. Here we propose that a large number of interactions can be inferred by revealing the rules underlying recognition specificity of a small number (a few hundreds) of families of protein recognition modules. This can be achieved through the construction and characterization of domain repertoires. A domain repertoire is assembled in a combinatorial fashion by allowing each amino acid position in the binding site of a given protein recognition domain to vary to include all the residues allowed at that position in the domain family. The repertoire is then searched by phage display techniques with any target of interest and from the primary structure of the binding site of the selected domains one derives rules that are used to infer the formation of complexes between natural proteins in the cell.
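The combinatorial construction described above fixes non-varied positions and lets each varied binding-site position range over the residues allowed in the domain family, so the repertoire is the Cartesian product of the per-position choices and its size is the product of their counts. The sketch below shows this with a hypothetical three-residue template and made-up allowed residues.

```python
from itertools import product
from math import prod


def repertoire(template, allowed):
    """Enumerate binding-site variants.

    template: wild-type binding-site residues, e.g. "YPW"
    allowed:  {position: residues permitted at that position in the family};
              positions not listed keep the template residue.
    """
    choices = [allowed.get(i, ch) for i, ch in enumerate(template)]
    return ["".join(p) for p in product(*choices)]


def repertoire_size(template, allowed):
    """Number of variants, computed without enumerating them."""
    return prod(len(allowed.get(i, ch)) for i, ch in enumerate(template))


# Hypothetical: position 0 may be Y or F, position 2 may be W, F or L.
variants = repertoire("YPW", {0: "YF", 2: "WFL"})
print(len(variants), variants)
```

Sizes grow multiplicatively with the number of varied positions, which is why such repertoires are screened by phage display rather than built one variant at a time.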
Gene regulatory network inference using fused LASSO on multiple data sets
Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran
2016-01-01
Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed into a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687
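For orientation, a generic fused-LASSO regression for one target gene j across K data sets has the form below, where y_j^{(k)} is the target's expression in data set k, X^{(k)} the expression of candidate transcription factor genes, and β_j^{(k)} the regulatory weights. The λ1 term encodes constraint (1) (few regulators) and the λ2 fusion term encodes constraint (2) (similar networks across data sets). The symbols are assumptions for illustration; this is not necessarily the paper's exact objective, and constraint (3) would enter as additional weighting not shown here.

```latex
\min_{\beta_j^{(1)},\ldots,\beta_j^{(K)}}\;
  \sum_{k=1}^{K} \bigl\lVert y_j^{(k)} - X^{(k)} \beta_j^{(k)} \bigr\rVert_2^2
  \;+\; \lambda_1 \sum_{k=1}^{K} \bigl\lVert \beta_j^{(k)} \bigr\rVert_1
  \;+\; \lambda_2 \sum_{1 \le k < k' \le K} \bigl\lVert \beta_j^{(k)} - \beta_j^{(k')} \bigr\rVert_1
```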
Newton, Richard; Wernisch, Lorenz
2014-01-01
Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention are often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets, and we present predicted potential trans-acting regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247
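The core signal in this kind of analysis is correlation between one gene's copy number and another gene's expression across matched samples. The sketch below computes Pearson correlations for all regulator-target pairs of distinct genes and keeps those above a hypothetical absolute-correlation threshold. It does not include the significance testing or meta-analysis across data sets described above, and the toy data are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN when either vector has zero variance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def trans_candidates(copy_number, expression, threshold):
    """copy_number, expression: {gene: values over the same samples}.

    Returns (regulator, target, r) for distinct genes with |r| >= threshold;
    same-gene (cis) pairs are skipped.
    """
    hits = []
    for reg, cn in copy_number.items():
        for tgt, ex in expression.items():
            if reg != tgt:
                r = pearson(cn, ex)
                if abs(r) >= threshold:
                    hits.append((reg, tgt, r))
    return hits


cn = {"g1": [1, 2, 3, 4]}
ex = {"g1": [1, 2, 2, 3], "g2": [2, 4, 6, 8.5], "g3": [1, 0, 1, 0]}
print(trans_candidates(cn, ex, threshold=0.9))
```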
Extended abstract: Managing disjunction for practical temporal reasoning
NASA Technical Reports Server (NTRS)
Boddy, Mark; Schrag, Bob; Carciofini, Jim
1992-01-01
One of the problems that must be dealt with in either a formal or implemented temporal reasoning system is the ambiguity arising from uncertain information. Lack of precise information about when events happen leads to uncertainty regarding the effects of those events. Incomplete information and nonmonotonic inference lead to situations where there is more than one set of possible inferences, even when there is no temporal uncertainty at all. In an implemented system, this ambiguity is a computational problem as well as a semantic one. In this paper, we discuss some of the sources of this ambiguity, which we will treat as explicit disjunction, in the sense that ambiguous information can be interpreted as defining a set of possible inferences. We describe the application of three techniques for managing disjunction in an implementation of Dean's Time Map Manager. Briefly, the disjunction is either removed by limiting the expressive power of the system, or approximated by a weaker form of representation that subsumes the disjunction. We use a combination of these methods to implement an expressive and efficient temporal reasoning engine that performs sound inference in accordance with a well-defined formal semantics.
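As a minimal illustration of approximating a disjunction by a weaker representation that subsumes it, suppose an event's time is known only to lie in one of several intervals. Replacing the disjunction by the single interval spanning all of them keeps every real possibility (so conclusions of the form "cannot happen at t" remain sound) at the cost of admitting spurious ones. This interval-hull choice is a hypothetical example, not necessarily the representation used in the Time Map Manager.

```python
def hull(intervals):
    """Approximate a disjunction of closed intervals [lo, hi] by one enclosing interval."""
    if not intervals:
        raise ValueError("empty disjunction")
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def holds_at(intervals, t):
    """Exact test on a disjunction: is t inside any of the intervals?"""
    return any(lo <= t <= hi for lo, hi in intervals)


# Event occurs during [1, 3] or during [7, 9] (hypothetical):
disjunction = [(7, 9), (1, 3)]
approx = hull(disjunction)
# The hull admits t = 5, which the exact disjunction rules out.
print(approx, holds_at(disjunction, 5), holds_at([approx], 5))
```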
Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins
David, Charles C.; Jacobs, Donald J.
2015-01-01
It has become commonplace to employ principal component analysis to reveal the most important motions in proteins. This method is more commonly known by its acronym, PCA. While most popular molecular dynamics packages provide PCA tools to analyze protein trajectories, researchers often draw inferences from their results without insight into how to interpret them, and they are often unaware of the limitations and generalizations of such analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make comparisons possible. In practice, one will be forced to make inferences about the essential dynamics of a protein with fewer samples than desired. Therefore, considerable attention is given to judging the significance of results and highlighting pitfalls. The topic of PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided. PMID:24061923
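The standard PCA step reviewed above can be sketched in a few lines. The input is a trajectory of Cartesian coordinates. The sketch assumes the frames have already been superimposed on a reference structure to remove overall rotation and translation, a preprocessing step it does not perform. The function name `trajectory_pca` is hypothetical.

```python
import numpy as np


def trajectory_pca(coords: np.ndarray):
    """PCA of a molecular dynamics trajectory.

    coords: array of shape (n_frames, n_atoms, 3), assumed pre-aligned.
    Returns (eigenvalues, eigenvectors, projections), sorted by decreasing
    variance. Eigenvectors are the columns of a (3*n_atoms, 3*n_atoms) array.
    """
    n_frames = coords.shape[0]
    X = coords.reshape(n_frames, -1)      # flatten each frame to a 3N vector
    X = X - X.mean(axis=0)                # centre on the average structure
    C = X.T @ X / (n_frames - 1)          # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(C)      # eigh: C is symmetric; ascending order
    order = np.argsort(evals)[::-1]       # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    proj = X @ evecs                      # frames expressed in PC coordinates
    # Note: with fewer frames than 3N coordinates, at most n_frames - 1
    # eigenvalues can be nonzero, one symptom of the sampling limits above.
    return evals, evecs, proj
```

The fraction `evals[0] / evals.sum()` is the share of total positional variance captured by the first mode. Whether that share is meaningful, rather than an artifact of limited sampling, is the kind of significance question the review emphasizes.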
Principal component analysis: a method for determining the essential dynamics of proteins.
David, Charles C; Jacobs, Donald J
2014-01-01
It has become commonplace to employ principal component analysis to reveal the most important motions in proteins. This method is more commonly known by its acronym, PCA. While most popular molecular dynamics packages provide PCA tools to analyze protein trajectories, researchers often draw inferences from their results without insight into how to interpret them, and they are often unaware of the limitations and generalizations of such analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make comparisons possible. In practice, one will be forced to make inferences about the essential dynamics of a protein with fewer samples than desired. Therefore, considerable attention is given to judging the significance of results and highlighting pitfalls. The topic of PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided.
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult; as a result, the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Fuertes, Gustavo; Banterle, Niccolò; Ruff, Kiersten M.; Chowdhury, Aritra; Mercadante, Davide; Koehler, Christine; Kachala, Michael; Estrada Girona, Gemma; Milles, Sigrid; Mishra, Ankur; Onck, Patrick R.; Gräter, Frauke; Esteban-Martín, Santiago; Pappu, Rohit V.; Svergun, Dmitri I.; Lemke, Edward A.
2017-01-01
Unfolded states of proteins and native states of intrinsically disordered proteins (IDPs) populate heterogeneous conformational ensembles in solution. The average sizes of these heterogeneous systems, quantified by the radius of gyration (RG), can be measured by small-angle X-ray scattering (SAXS). Another parameter, the mean dye-to-dye distance (RE) for proteins with fluorescently labeled termini, can be estimated using single-molecule Förster resonance energy transfer (smFRET). A number of studies have reported inconsistencies in inferences drawn from the two sets of measurements for the dimensions of unfolded proteins and IDPs in the absence of chemical denaturants. These differences are typically attributed to the influence of fluorescent labels used in smFRET and to the impact of high concentrations and averaging features of SAXS. By measuring the dimensions of a collection of labeled and unlabeled polypeptides using smFRET and SAXS, we directly assessed the contributions of dyes to the experimental values RG and RE. For chemically denatured proteins we obtain mutual consistency in our inferences based on RG and RE, whereas for IDPs under native conditions, we find substantial deviations. Using computations, we show that discrepant inferences are neither due to methodological shortcomings of specific measurements nor due to artifacts of dyes. Instead, our analysis suggests that chemical heterogeneity in heteropolymeric systems leads to a decoupling between RE and RG that is amplified in the absence of denaturants. Therefore, joint assessments of RG and RE combined with measurements of polymer shapes should provide a consistent and complete picture of the underlying ensembles. PMID:28716919
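The two size measures compared above can be computed directly from a set of conformers. The sketch below computes R_G and an end-to-end distance, which stands in here for the dye-to-dye R_E, together with the ensemble ratio ⟨R_E²⟩/⟨R_G²⟩. The ratio is one simple way to see R_E and R_G "decoupling": for an ideal Gaussian chain in the long-chain limit it equals 6, so deviations signal a change in chain shape rather than size alone. Treating the terminal beads as the dye positions, and all function names, are simplifying assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration of one conformer.

    Uses equal masses when none are given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # centre of mass, one Cartesian component at a time
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total
           for k in range(3)]
    sq = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)


def end_to_end(coords: Sequence[Point]) -> float:
    """Distance between the first and last bead (stand-in for dye-to-dye R_E)."""
    return math.dist(coords[0], coords[-1])


def shape_ratio(ensemble: List[Sequence[Point]]) -> float:
    """<R_E^2> / <R_G^2> averaged over an ensemble of conformers."""
    re2 = sum(end_to_end(c) ** 2 for c in ensemble) / len(ensemble)
    rg2 = sum(radius_of_gyration(c) ** 2 for c in ensemble) / len(ensemble)
    return re2 / rg2
```

Two ensembles can share the same mean R_G yet differ in this ratio, for example more elongated versus more compact-but-open conformers. This is why the abstract argues for joint assessment of R_G, R_E and polymer shape.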
Pathogenomic Inference of Virulence-Associated Genes in Leptospira interrogans
Lehmann, Jason S.; Fouts, Derrick E.; Haft, Daniel H.; Cannella, Anthony P.; Ricaldi, Jessica N.; Brinkac, Lauren; Harkins, Derek; Durkin, Scott; Sanka, Ravi; Sutton, Granger; Moreno, Angelo; Vinetz, Joseph M.; Matthias, Michael A.
2013-01-01
Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15-member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation of members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens. PMID:24098822
Zhou, Yingying; Kang, Xilong; Xiong, Dan; Zhu, Shanshan; Zheng, Huijuan; Xu, Ying; Guo, Yaxin; Pan, Zhiming; Jiao, Xinan
2017-04-01
Tumor necrosis factor receptor-associated factor 3 (TRAF3) plays a key antiviral role by promoting type I interferon production. We cloned the pigeon TRAF3 gene (PiTRAF3) according to its predicted mRNA sequence to investigate its function. The 1704-bp full-length open reading frame encodes a 567-amino acid protein. One Ring finger, two TRAF-type Zinc fingers, one Coiled coil, and one MATH domain were inferred. RT-PCR showed that PiTRAF3 was expressed in all tissues, with relatively weak expression in the heart and liver. In HEK293T cells, over-expression of wild-type, △Ring, △Zinc finger, and △Coiled coil PiTRAF3, but not a △MATH form, significantly increased IFN-β promoter activity. Zinc finger and Coiled coil domains were essential for NF-κB activation. In chicken HD11 cells, PiTRAF3 increased IFN-β promoter activity, and all four domains contributed. R848 stimulation of pigeon peripheral blood mononuclear cells and splenocytes significantly increased expression of PiTRAF3 and the inflammatory cytokine genes CCL5, IL-8, and IL-10. These data demonstrate TRAF3's innate immune function and improve understanding of its involvement in poultry antiviral defense. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bacterial infection as assessed by in vivo gene expression
Heithoff, Douglas M.; Conner, Christopher P.; Hanna, Philip C.; Julio, Steven M.; Hentschel, Ute; Mahan, Michael J.
1997-01-01
In vivo expression technology (IVET) has been used to identify >100 Salmonella typhimurium genes that are specifically expressed during infection of BALB/c mice and/or murine cultured macrophages. Induction of these genes is shown to be required for survival in the animal under conditions of the IVET selection. One class of in vivo induced (ivi) genes, iviVI-A and iviVI-B, constitute an operon that resides in a region of the Salmonella genome with low G+C content and presumably has been acquired by horizontal transfer. These ivi genes encode predicted proteins that are similar to adhesins and invasins from prokaryotic and eukaryotic pathogens (Escherichia coli [tia], Plasmodium falciparum [PfEMP1]) and have coopted the PhoPQ regulatory circuitry of Salmonella virulence genes. Examination of the in vivo induction profile indicates (i) many ivi genes encode regulatory functions (e.g., phoPQ and pmrAB) that serve to enhance the sensitivity and amplitude of virulence gene expression (e.g., spvB); (ii) the biochemical function of many metabolic genes may not represent their sole contribution to virulence; (iii) the host ecology can be inferred from the biochemical functions of ivi genes; and (iv) nutrient limitation plays a dual signaling role in pathogenesis: to induce metabolic functions that complement host nutritional deficiencies and to induce virulence functions required for immediate survival and spread to subsequent host sites. PMID:9023360
Balancing gene expression without library construction via a reusable sRNA pool.
Ghodasara, Amar; Voigt, Christopher A
2017-07-27
Balancing protein expression is critical when optimizing genetic systems. Typically, this requires library construction to vary the genetic parts controlling each gene, which can be expensive and time-consuming. Here, we develop sRNAs corresponding to 15-nt 'target' sequences that can be inserted upstream of a gene. The targeted gene can be repressed from 1.6- to 87-fold by controlling sRNA expression using promoters of different strength. A pool is built where six sRNAs are placed under the control of 16 promoters that span a ∼10³-fold range of strengths, yielding ∼10⁷ combinations. This pool can simultaneously optimize up to six genes in a system. This requires building only a single system-specific construct by placing a target sequence upstream of each gene and transforming it with the pre-built sRNA pool. The resulting library is screened and the top clone is sequenced to determine the promoter controlling each sRNA, from which the fold-repression of the genes can be inferred. The system is then rebuilt by rationally selecting parts that implement the optimal expression of each gene. We demonstrate the versatility of this approach by using the same pool to optimize a metabolic pathway (β-carotene) and genetic circuit (XNOR logic gate). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
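The combination count follows from simple arithmetic. Each of the six sRNAs independently receives one of 16 promoters, giving 16⁶ = 16,777,216 ≈ 1.7 × 10⁷ combinations. The sketch below reproduces that count. It adds a rough screening-coverage estimate that assumes every combination is equally represented in the transformed library. That is an idealisation not stated in the abstract, and the formula is the classical Clarke-Carbon library-coverage expression. Function names are hypothetical.

```python
def library_size(n_srnas: int, n_promoters: int) -> int:
    """Number of promoter assignments when each sRNA independently
    receives one of n_promoters promoters."""
    return n_promoters ** n_srnas


def expected_coverage(library: int, clones: int) -> float:
    """Expected fraction of distinct combinations sampled after screening
    `clones` transformants, assuming all combinations are equally likely
    (Clarke-Carbon estimate)."""
    return 1.0 - (1.0 - 1.0 / library) ** clones


n = library_size(6, 16)
print(n)                                       # 16777216
print(f"{expected_coverage(n, 10_000):.5f}")  # a small fraction of the library
```

The coverage figure shows why the approach screens for the best clone rather than exhaustively: a screen of thousands of clones samples well under 1% of the combinations. Reading the top clone's promoters then gives the fold-repression of each gene directly.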
USDA-ARS's Scientific Manuscript database
Inferences about lactation responses to diet have been hypothesized to be affected by the use of change-over instead of continuous experimental designs. A direct test of this hypothesis has not been well studied. Additionally, when dietary protein level is changed it must occur through dilution with...
Jin, Tingting; Gao, Yulin; He, Kanglai
2018-01-01
Trehalose is the major blood sugar in insects, and the physiological significance of this compound has been extensively reported. Trehalose-6-phosphate synthase (TPS) is an important enzyme in the trehalose biosynthesis pathway. Full-length cDNAs of TPS (Oftps) and its alternative splicing isoform (Oftps_isoformI) were cloned from larvae of the Asian corn borer (ACB), Ostrinia furnacalis (Guenée; Lepidoptera: Crambidae). The Oftps and Oftps_isoformI transcripts were 2913 and 1689 bp long and contained 2529- and 1293-bp open reading frames encoding proteins of 842 and 430 amino acids with molecular masses of 94.4 and 48.6 kDa, respectively. Transcriptional profiling and the response of Oftps to thermal stress were determined by quantitative real-time PCR, showing that Oftps was predominantly expressed in the larval fat body and significantly enhanced during molting and transformation, and that thermal stress also induced Oftps expression. Gene structure analysis indicated that one TPS domain and one trehalose-6-phosphate phosphatase (TPP) domain were located at the N- and C-termini of OfTPS, respectively, while only the TPS domain was detected in OfTPS_isoformI. Three-dimensional modeling and heterologous expression were used to predict the putative functions of OfTPS and OfTPS_isoformI. We infer that expression of the Oftps gene is thermally induced and might be crucial for larval survival.
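The reported lengths can be cross-checked with codon arithmetic. An ORF of L bp that includes its stop codon encodes L/3 − 1 amino acids: 2529/3 − 1 = 842 and 1293/3 − 1 = 430, matching the abstract, and the same check holds for the 1704-bp pigeon TRAF3 ORF and its 567-residue protein. The mass estimate uses a rule-of-thumb average of about 110 Da per residue. That constant is an assumption, and the real value depends on the sequence, so it only approximates the reported 94.4 and 48.6 kDa.

```python
# Rule-of-thumb average residue mass; an assumption, not sequence-specific.
AVG_RESIDUE_MASS_DA = 110.0


def orf_to_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons  # stop codon adds no residue


def approx_mass_kda(n_residues: int) -> float:
    """Crude molecular-mass estimate in kDa from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


for bp in (2529, 1293):
    aa = orf_to_protein_length(bp)
    print(bp, aa, round(approx_mass_kda(aa), 1))
```

The crude estimates (about 92.6 and 47.3 kDa) land within a few percent of the reported masses. A mismatch far larger than this would suggest a transcription error in a reported length.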
Cloning, sequencing, and expression of cDNA for human β-glucuronidase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oshima, A.; Kyle, J.W.; Miller, R.D.
1987-02-01
The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides, as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.
Immunomodulatory Effects of CP-25 on Splenic T Cells of Rats with Adjuvant Arthritis.
Wang, Yang; Han, Chen-Chen; Cui, Dongqian; Luo, Ting-Ting; Li, Yifan; Zhang, Yuwen; Ma, Yang; Wei, Wei
2018-06-01
Rheumatoid arthritis (RA) is an autoimmune disease in which T cells play an important role. Paeoniflorin-6-oxy-benzenesulfonate (CP-25) shows strong anti-inflammatory and immunomodulatory effects in the joints of adjuvant arthritis (AA) rats, but its effect on spleen function is still unclear. The aim of this study was to investigate how CP-25 regulates spleen function in AA rats. Male Sprague-Dawley rats were administered CP-25 (50 mg/kg) orally from day 17 to day 29 after immunization. Histopathological changes in the spleen were analyzed by hematoxylin-eosin staining. G protein-coupled receptor kinases (GRKs) and prostaglandin receptor subtypes (EPs) were screened by Western blot and immunohistochemistry. The co-expression of GRK2 and EP2 as well as GRK2 and EP4 was measured by immunofluorescence and co-immunoprecipitation. The expression of GRK2 and EP4 in splenic T cells was further detected by immunofluorescence. CP-25 was found to relieve secondary paw swelling, attenuate histopathologic changes, and downregulate GRK2, EP2, and EP4 expression in AA rats. Additionally, CP-25 not only downregulated the co-expression of GRK2 and EP4 but also downregulated GRK2 and EP4 expression in splenic T cells of AA rats. These results suggest that CP-25 exerts anti-inflammatory and immunomodulatory effects by affecting the function of splenic T cells.
Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae
Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskaya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike
2006-01-01
Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well-substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID and SGD databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047
Grobei, Monica A.; Qeli, Ermir; Brunner, Erich; Rehrauer, Hubert; Zhang, Runxuan; Roschitzki, Bernd; Basler, Konrad; Ahrens, Christian H.; Grossniklaus, Ueli
2009-01-01
Pollen, the male gametophyte of flowering plants, represents an ideal biological system to study developmental processes, such as cell polarity, tip growth, and morphogenesis. Upon hydration, the metabolically quiescent pollen rapidly switches to an active state, exhibiting extremely fast growth. This rapid switch requires relevant proteins to be stored in the mature pollen, where they have to retain functionality in a desiccated environment. Using a shotgun proteomics approach, we unambiguously identified ∼3500 proteins in Arabidopsis pollen, including 537 proteins that were not identified in genetic or transcriptomic studies. To generate this comprehensive reference data set, which extends the previously reported pollen proteome by a factor of 13, we developed a novel deterministic peptide classification scheme for protein inference. This generally applicable approach considers the gene model–protein sequence–protein accession relationships. It allowed us to classify and eliminate ambiguities inherently associated with any shotgun proteomics data set, to report a conservative list of protein identifications, and to seamlessly integrate data from previous transcriptomics studies. Manual validation of proteins unambiguously identified by a single, information-rich peptide enabled us to significantly reduce the false discovery rate, while keeping valuable identifications of shorter and lower-abundance proteins. Bioinformatic analyses revealed a higher stability of pollen proteins compared to those of other tissues and implied a role in vesicle trafficking for a protein family of previously unknown function. Interestingly, the pollen proteome is most similar to that of seeds, indicating physiological similarities between these developmentally distinct tissues. PMID:19546170
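The idea of classifying peptides by their gene model–protein relationships can be illustrated with a deliberately simplified scheme. It is not the authors' published classification, which distinguishes more cases. Each peptide is assigned one of three classes: it matches a single protein sequence, several protein isoforms of one gene model, or protein sequences from more than one gene model. Only the first two classes feed a conservative report. The class labels, data layout and accessions in the usage example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_peptides(peptide_to_proteins: Dict[str, Set[str]],
                      protein_to_gene: Dict[str, str]) -> Dict[str, str]:
    """Assign each peptide a simplified ambiguity class."""
    classes = {}
    for pep, prots in peptide_to_proteins.items():
        genes = {protein_to_gene[p] for p in prots}
        if not prots:
            classes[pep] = "unmapped"
        elif len(prots) == 1:
            classes[pep] = "unique_protein"   # pins down one protein sequence
        elif len(genes) == 1:
            classes[pep] = "unique_gene"      # isoforms of one gene model
        else:
            classes[pep] = "shared"           # spans gene models: ambiguous
    return classes


def conservative_report(peptide_to_proteins: Dict[str, Set[str]],
                        protein_to_gene: Dict[str, str],
                        classes: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Proteins backed by a protein-unique peptide, and gene models backed by
    a protein- or gene-unique peptide. Shared peptides are not used."""
    proteins = {next(iter(peptide_to_proteins[p]))
                for p, c in classes.items() if c == "unique_protein"}
    genes = {protein_to_gene[next(iter(peptide_to_proteins[p]))]
             for p, c in classes.items() if c in ("unique_protein", "unique_gene")}
    return sorted(proteins), sorted(genes)
```

In this scheme a peptide shared between gene models (class `shared`) never supports an identification on its own. This is the essence of reporting a conservative list: ambiguity is made explicit and excluded rather than resolved by guesswork.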
Inferring fitness landscapes and selection on phenotypic states from single-cell genealogical data
Kussell, Edo
2017-01-01
Recent advances in single-cell time-lapse microscopy have revealed non-genetic heterogeneity and temporal fluctuations of cellular phenotypes. While different phenotypic traits such as abundance of growth-related proteins in single cells may have differential effects on the reproductive success of cells, rigorous experimental quantification of this process has remained elusive due to the complexity of single cell physiology within the context of a proliferating population. We introduce and apply a practical empirical method to quantify the fitness landscapes of arbitrary phenotypic traits, using genealogical data in the form of population lineage trees which can include phenotypic data of various kinds. Our inference methodology for fitness landscapes determines how reproductivity is correlated with cellular phenotypes, and provides a natural generalization of bulk growth rate measures for single-cell histories. Using this technique, we quantify the strength of selection acting on different cellular phenotypic traits within populations, which allows us to determine whether a change in population growth is caused by individual cells’ responses, selection within a population, or a mixture of these two processes. By applying these methods to single-cell time-lapse data of growing bacterial populations that express a resistance-conferring protein under antibiotic stress, we show how the distributions, fitness landscapes, and selection strength of single-cell phenotypes are affected by the drug. Our work provides a unified and practical framework for quantitative measurements of fitness landscapes and selection strength for any statistical quantities definable on lineages, and thus elucidates the adaptive significance of phenotypic states in time series data. The method is applicable in diverse fields, from single cell biology to stem cell differentiation and viral evolution. PMID:28267748
Senatore, Adriano; Raiss, Hamad; Le, Phuong
2016-01-01
Voltage-gated calcium (Cav) channels serve dual roles in the cell, where they can both depolarize the membrane potential for electrical excitability, and activate transient cytoplasmic Ca2+ signals. In animals, Cav channels play crucial roles including driving muscle contraction (excitation-contraction coupling), gene expression (excitation-transcription coupling), pre-synaptic and neuroendocrine exocytosis (excitation-secretion coupling), regulation of flagellar/ciliary beating, and regulation of cellular excitability, either directly or through modulation of other Ca2+-sensitive ion channels. In recent years, genome sequencing has provided significant insights into the molecular evolution of Cav channels. Furthermore, expanded gene datasets have permitted improved inference of the species phylogeny at the base of Metazoa, providing clearer insights into the evolution of complex animal traits which involve Cav channels, including the nervous system. For the various types of metazoan Cav channels, key properties that determine their cellular contribution include ion selectivity, pore gating, and, importantly, cytoplasmic protein-protein interactions that direct sub-cellular localization and functional complexing. It is unclear when these defining features, many of which are essential for nervous system function, evolved. In this review, we highlight some experimental observations that implicate Cav channels in the physiology and behavior of the earliest-diverging animals from the phyla Cnidaria, Placozoa, Porifera, and Ctenophora. Given our limited understanding of the molecular biology of Cav channels in these basal animal lineages, we infer insights from better-studied vertebrate and invertebrate animals. We also highlight some apparently conserved cellular functions of Cav channels, which might have emerged very early on during metazoan evolution, or perhaps predated it. PMID:27867359
ABCE1 is essential for S phase progression in human cells
Toompuu, Marina; Kärblane, Kairi; Pata, Pille; Truve, Erkki; Sarmiento, Cecilia
2016-01-01
ABCE1 is a highly conserved protein universally present in eukaryotes and archaea, which is crucial for the viability of different organisms. First identified as RNase L inhibitor, ABCE1 is currently recognized as an essential translation factor involved in several stages of eukaryotic translation and ribosome biogenesis. The nature of the vital functions of ABCE1, however, remains unexplained. Here, we study the role of ABCE1 in human cell proliferation and its possible connection to translation. We show that ABCE1 depletion by siRNA results in a decreased rate of cell growth due to accumulation of cells in S phase, which is accompanied by inefficient DNA synthesis and reduced histone mRNA and protein levels. We infer that in addition to its role in general translation, ABCE1 is involved in histone biosynthesis and DNA replication and is therefore essential for normal S phase progression. In addition, we analyze whether ABCE1 is implicated in transcript-specific translation via its association with the eIF3 complex subunits known to control the synthesis of cell proliferation-related proteins. The expression levels of a few such targets regulated by eIF3A, however, were not consistently affected by ABCE1 depletion. PMID:26985706
The Evolution of Two-Component Systems in Bacteria Reveals Different Strategies for Niche Adaptation
Arkin, Adam
2006-01-01
Two-component systems including histidine protein kinases represent the primary signal transduction paradigm in prokaryotic organisms. To understand how these systems adapt to allow organisms to detect niche-specific signals, we analyzed the phylogenetic distribution of nearly 5,000 histidine protein kinases from 207 sequenced prokaryotic genomes. We found that many genomes carry a large repertoire of recently evolved signaling genes, which may reflect selective pressure to adapt to new environmental conditions. Both lineage-specific gene family expansion and horizontal gene transfer play major roles in the introduction of new histidine kinases into genomes; however, there are differences in how these two evolutionary forces act. Genes imported via horizontal transfer are more likely to retain their original functionality as inferred from a similar complement of signaling domains, while gene family expansion accompanied by domain shuffling appears to be a major source of novel genetic diversity. Family expansion is the dominant source of new histidine kinase genes in the genomes most enriched in signaling proteins, and detailed analysis reveals that divergence in domain structure and changes in expression patterns are hallmarks of recent expansions. Finally, while these two modes of gene acquisition are widespread across bacterial taxa, there are clear species-specific preferences for which mode is used. PMID:17083272
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bock KW; D Honys; JM. Ward
Male fertility depends on the proper development of the male gametophyte, successful pollen germination, tube growth and delivery of the sperm cells to the ovule. Previous studies have shown that nutrients like boron, and ion gradients or currents of Ca2+, H+, and K+ are critical for pollen tube growth. However, the molecular identities of transporters mediating these fluxes are mostly unknown. As a first step to integrate transport with pollen development and function, a genome-wide analysis of transporter genes expressed in the male gametophyte at four developmental stages was conducted. About 1269 genes encoding classified transporters were collected from the Arabidopsis thaliana genome. Of 757 transporter genes expressed in pollen, 16% or 124 genes, including AHA6, CNGC18, TIP1.3 and CHX08, are specifically or preferentially expressed relative to sporophytic tissues. Some genes are highly expressed in microspores and bicellular pollen (COPT3, STP2, OPT9); while others are activated only in tricellular or mature pollen (STP11, LHT7). Analyses of entire gene families showed that a subset of genes, including those expressed in sporophytic tissues, were developmentally-regulated during pollen maturation. Early and late expression patterns revealed by transcriptome analysis are supported by promoter::GUS analyses of CHX genes and by other methods. Recent genetic studies based on a few transporters, including plasma membrane H+ pump AHA3, Ca2+ pump ACA9, and K+ channel SPIK, further support the expression patterns and the inferred functions revealed by our analyses. Thus, revealing the distinct expression patterns of specific transporters and unknown polytopic proteins during microgametogenesis provides new insights for strategic mutant analyses necessary to integrate the roles of transporters and potential receptors with male gametophyte development.
Bock, Kevin W; Honys, David; Ward, John M; Padmanaban, Senthilkumar; Nawrocki, Eric P; Hirschi, Kendal D; Twell, David; Sze, Heven
2006-04-01
Male fertility depends on the proper development of the male gametophyte, successful pollen germination, tube growth, and delivery of the sperm cells to the ovule. Previous studies have shown that nutrients like boron, and ion gradients or currents of Ca2+, H+, and K+ are critical for pollen tube growth. However, the molecular identities of transporters mediating these fluxes are mostly unknown. As a first step to integrate transport with pollen development and function, a genome-wide analysis of transporter genes expressed in the male gametophyte at four developmental stages was conducted. Approximately 1,269 genes encoding classified transporters were collected from the Arabidopsis (Arabidopsis thaliana) genome. Of 757 transporter genes expressed in pollen, 16% or 124 genes, including AHA6, CNGC18, TIP1.3, and CHX08, are specifically or preferentially expressed relative to sporophytic tissues. Some genes are highly expressed in microspores and bicellular pollen (COPT3, STP2, OPT9), while others are activated only in tricellular or mature pollen (STP11, LHT7). Analyses of entire gene families showed that a subset of genes, including those expressed in sporophytic tissues, was developmentally regulated during pollen maturation. Early and late expression patterns revealed by transcriptome analysis are supported by promoter::beta-glucuronidase analyses of CHX genes and by other methods. Recent genetic studies based on a few transporters, including plasma membrane H+ pump AHA3, Ca2+ pump ACA9, and K+ channel SPIK, further support the expression patterns and the inferred functions revealed by our analyses. Thus, revealing the distinct expression patterns of specific transporters and unknown polytopic proteins during microgametogenesis provides new insights for strategic mutant analyses necessary to integrate the roles of transporters and potential receptors with male gametophyte development.
Yang, Yajie; Boss, Isaac W; McIntyre, Lauren M; Renne, Rolf
2014-08-08
Kaposi's sarcoma-associated herpesvirus (KSHV) is associated with tumors of endothelial and lymphoid origin. During latent infection, KSHV expresses miR-K12-11, an ortholog of the human oncogenic gene hsa-miR-155. Both gene products are microRNAs (miRNAs), which are important post-transcriptional regulators that contribute to tissue-specific gene expression. Advances in target identification technologies and molecular interaction databases have enabled a systems biology approach to unraveling the gene regulatory networks (GRNs) triggered by miR-K12-11 in endothelial and lymphoid cells. Understanding the tissue-specific function of miR-K12-11 will help to elucidate the mechanisms underlying KSHV pathogenesis. Ectopic expression of miR-K12-11 affected gene expression differently in BJAB cells of lymphoid origin and TIVE cells of endothelial origin. Direct miRNA targeting accounted for a small fraction of the observed transcriptome changes: only 29 genes were identified as putative direct targets of miR-K12-11 in both cell types. However, gene ontology analysis revealed a number of commonly affected biological pathways, such as carbohydrate metabolism and interferon response-related signaling. Integration of transcriptome profiling, bioinformatic algorithms, and protein-protein interactome databases from the ENCODE project identified different nodes of the GRNs that miR-K12-11 uses in a tissue-specific fashion. These effector genes, including cancer-associated transcription factors and signaling proteins, amplified the regulatory potential of a single miRNA from a small set of putative direct targets to a larger set of genes. This is the first comparative analysis of the effects of miR-K12-11 in endothelial and B cells, two cell types that KSHV infects in vivo. miR-K12-11 was able to broadly modulate gene expression in both cell types. 
Using a systems biology approach, we inferred that miR-K12-11 establishes its GRN both by repressing master transcription factors and by influencing signaling pathways, thereby countering the host antiviral response and promoting the proliferation and survival of infected cells. The targeted GRNs are more reproducible and informative than lists of individually identified target genes, and our approach can be applied to other regulatory factors of interest.
Uchida, Yukiko; Townsend, Sarah S M; Rose Markus, Hazel; Bergsieker, Hilary B
2009-11-01
Four studies using open-ended and experimental methods test the hypothesis that in Japanese contexts, emotions are understood as between people, whereas in American contexts, emotions are understood as primarily within people. Study 1 analyzed television interviews of Olympic athletes. When asked about their relationships, Japanese athletes used significantly more emotion words than American athletes. This difference was not significant when questions asked directly about athletes' feelings. In Study 2, when describing an athlete's emotional reaction to winning, Japanese participants implicated others more often than American participants. After reading an athlete's self-description, Japanese participants inferred more emotions when the athlete mentioned relationships, whereas American participants inferred more emotions when the athlete focused only on herself (Study 3). Finally, when viewing images of athletes, Japanese participants inferred more emotions for athletes pictured with teammates, whereas American participants inferred more emotions for athletes pictured alone (Studies 4a and 4b).
ARNetMiT R Package: association rules based gene co-expression networks of miRNA targets.
Özgür Cingiz, M; Biricik, G; Diri, B
2017-03-31
miRNAs are key regulators that bind to target genes to suppress their expression. The relations between miRNAs and their target genes enable users to derive co-expressed genes that may be involved in similar biological processes and functions in cells. We hypothesize that target genes of miRNAs are co-expressed when they are regulated by multiple miRNAs. Using these co-expressed genes, we can in principle construct gene co-expression networks (GCNs) related to 152 diseases. In this study, we introduce ARNetMiT, which uses a hash-based association rule algorithm in a novel way to infer GCNs from miRNA-target gene data. We also present the ARNetMiT R package, which infers and visualizes GCNs for user-selected diseases. Our approach treats miRNAs as transactions and target genes as their items. Support and confidence values are used to prune association rules on miRNA-target gene data to construct support-based GCNs (sGCNs) as well as support- and confidence-based GCNs (scGCNs). We use overlap analysis and topological features to evaluate the performance of the GCNs. For comparison, we also infer GCNs with popular gene network inference (GNI) algorithms. Overlap analysis shows that ARNetMiT outperforms the compared GNI algorithms. Using high confidence values in scGCNs increases the ratio of overlapping gene-gene interactions between the compared methods. In terms of topological features, node degrees in ARNetMiT-based GCNs follow a power-law distribution. The hub genes discovered by ARNetMiT-based GCNs are consistent with the literature.
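The transaction/item framing can be illustrated with a small sketch. Each miRNA is treated as a transaction whose items are its target genes; for a gene pair, support is the fraction of miRNAs that target both genes, and confidence compares that count with how often each gene is targeted on its own. This sketch counts pairs directly rather than using ARNetMiT's hash-based algorithm, and the function name, the symmetric (minimum) confidence rule, the thresholds, and the toy data are all hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(targets, min_support, min_confidence=None):
    """Build gene-gene edges from miRNA -> target-gene "transactions".

    targets: dict mapping each miRNA (transaction) to its set of target genes (items).
    min_support: minimum fraction of miRNAs that must target both genes.
    min_confidence: optional; if given, also require confidence >= this value.
    Returns {(gene_a, gene_b): (support, confidence)} with gene_a < gene_b.
    """
    n_mirnas = len(targets)
    item_counts = {}  # number of miRNAs targeting each gene
    pair_counts = {}  # number of miRNAs targeting both genes of a pair
    for genes in targets.values():
        for gene in genes:
            item_counts[gene] = item_counts.get(gene, 0) + 1
        for pair in combinations(sorted(genes), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1

    edges = {}
    for (a, b), count in pair_counts.items():
        support = count / n_mirnas
        # Hypothetical symmetric choice: the weaker of conf(a -> b) and conf(b -> a).
        confidence = min(count / item_counts[a], count / item_counts[b])
        if support < min_support:
            continue
        if min_confidence is not None and confidence < min_confidence:
            continue
        edges[(a, b)] = (support, confidence)
    return edges


targets = {
    "miR-1": {"G1", "G2", "G3"},
    "miR-2": {"G1", "G2"},
    "miR-3": {"G2", "G4"},
    "miR-4": {"G1", "G2", "G4"},
}
support_only = cooccurrence_edges(targets, min_support=0.5)
support_and_conf = cooccurrence_edges(targets, min_support=0.25, min_confidence=0.6)
```

Calling it without `min_confidence` loosely mirrors a support-based network, and adding a confidence threshold mirrors a support-and-confidence network; here the stricter call keeps only the G1-G2 edge.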
Trotter, Eleanor W.; Rolfe, Matthew D.; Hounslow, Andrea M.; Craven, C. Jeremy; Williamson, Michael P.; Sanguinetti, Guido; Poole, Robert K.; Green, Jeffrey
2011-01-01
Background: Many bacteria undergo transitions between environments with differing O2 availabilities as part of their natural lifestyles and during biotechnological processes. However, the dynamics of adaptation when bacteria experience changes in O2 availability are understudied. The model bacterium and facultative anaerobe Escherichia coli K-12 provides an ideal system for exploring this process. Methods and Findings: Time-resolved transcript profiles of E. coli K-12 during the initial phase of transition from anaerobic to micro-aerobic conditions revealed a reprogramming of gene expression consistent with a switch from fermentative to respiratory metabolism. The changes in transcript abundance were matched by changes in the abundances of selected central metabolic proteins. A probabilistic state space model was used to infer the activities of two key regulators, FNR (O2 sensing) and PdhR (pyruvate sensing). The model implied that both regulators were rapidly inactivated during the transition from an anaerobic to a micro-aerobic environment. Analysis of the external metabolome and protein levels suggested that the cultures transit through different physiological states during the process of adaptation, characterized by the rapid inactivation of pyruvate formate-lyase (PFL), a slower induction of pyruvate dehydrogenase complex (PDHC) activity and transient excretion of pyruvate, consistent with the predicted inactivation of PdhR and FNR. Conclusion: Perturbation of anaerobic steady-state cultures by introduction of a limited supply of O2 combined with time-resolved transcript, protein and metabolite profiling, and probabilistic modeling has revealed that pyruvate (sensed by PdhR) is a key metabolic signal in coordinating the reprogramming of E. coli K-12 gene expression by working alongside the O2 sensor FNR during transition from anaerobic to micro-aerobic conditions. PMID:21980479
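The abstract does not give the form of the state space model. As a rough illustration of how a state space model turns noisy expression readouts into an inferred hidden regulator activity, the sketch below runs a scalar Kalman filter under assumed dynamics: a random-walk activity and a single linear readout with a hypothetical loading. It is not the model used in the study, and the function name and all parameter values are hypothetical.

```python
def kalman_filter_1d(observations, loading, process_var, obs_var,
                     init_mean=0.0, init_var=1.0):
    """Filter a scalar hidden activity x_t from noisy observations y_t.

    Assumed model (hypothetical, for illustration only):
        x_t = x_{t-1} + w_t,         w_t ~ N(0, process_var)  (random-walk activity)
        y_t = loading * x_t + v_t,   v_t ~ N(0, obs_var)      (linear readout)
    Returns the lists of filtered means and variances, one entry per time point.
    """
    mean, var = init_mean, init_var
    means, variances = [], []
    for y in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: weight the new observation by the Kalman gain.
        gain = var * loading / (loading * loading * var + obs_var)
        mean = mean + gain * (y - loading * mean)
        var = (1.0 - gain * loading) * var
        means.append(mean)
        variances.append(var)
    return means, variances


means, variances = kalman_filter_1d([1.0] * 20, loading=1.0,
                                    process_var=0.01, obs_var=0.1)
```

In a fuller model the readout would combine many target genes, and the dynamics would be estimated rather than fixed; the scalar case only shows the predict/update cycle.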
Theoretical Analysis of Allosteric and Operator Binding for Cyclic-AMP Receptor Protein Mutants
NASA Astrophysics Data System (ADS)
Einav, Tal; Duque, Julia; Phillips, Rob
2018-02-01
Allosteric transcription factors undergo binding events both at their inducer binding sites and at distinct DNA binding domains, and it is often difficult to disentangle the structural and functional consequences of these two classes of interactions. In this work, we compare the ability of two statistical mechanical models of protein conformational change, the Monod-Wyman-Changeux (MWC) and the Koshland-Némethy-Filmer (KNF) models, to characterize the multi-step activation mechanism of the broadly acting cyclic-AMP receptor protein (CRP). We first consider the allosteric transition resulting from cyclic-AMP binding to CRP, then analyze how CRP binds to its operator, and finally investigate the ability of CRP to activate gene expression. In light of these models, we examine data from a beautiful recent experiment that created a single-chain version of the CRP homodimer, thereby enabling each subunit to be mutated separately. Using this construct, six mutants were created using all possible combinations of the wild-type subunit, a D53H mutant subunit, and an S62F mutant subunit. We demonstrate that both the MWC and KNF models can explain the behavior of all six mutants using a small, self-consistent set of parameters. Comparing the results, we find that the MWC model slightly outperforms the KNF model in the quality of its fits, but, more importantly, the parameters inferred by the MWC model are more in line with structural knowledge of CRP. In addition, we discuss how the conceptual framework developed here for CRP not only enables us to analyze data retrospectively but also has the predictive power to determine how combinations of mutations will interact, how double mutants will behave, and how each construct would regulate gene expression.
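For orientation, the standard MWC expression for the probability that a protein is in its active conformation weights each conformation by its energy and by a binding polynomial over its ligand sites. The sketch below writes that expression for a protein whose subunits may have different dissociation constants, which is one plausible way to represent the mixed wild-type/mutant single-chain dimers; the per-subunit parameterization, the sign convention for `delta_eps`, and the example values are assumptions, not the parameters fitted in the study.

```python
import math


def mwc_active_probability(c, subunits, delta_eps, beta=1.0):
    """Probability that an MWC protein is in its active conformation.

    c: ligand (e.g. cAMP) concentration, in the same units as the constants below.
    subunits: list of (K_A, K_I) pairs, the ligand dissociation constants of each
        subunit's binding site in the active and inactive conformations.
    delta_eps: energy of the inactive state minus that of the active state
        (epsilon_I - epsilon_A), in units of 1/beta (k_B T when beta = 1).
    """
    active = 1.0
    inactive = 1.0
    for k_a, k_i in subunits:
        # Each independent site contributes a factor (1 + c/K) per conformation.
        active *= 1.0 + c / k_a
        inactive *= 1.0 + c / k_i
    inactive *= math.exp(-beta * delta_eps)
    return active / (active + inactive)


# Hypothetical homodimer and heterodimer (one subunit with weaker active-state binding).
homodimer = [(1.0, 10.0), (1.0, 10.0)]
heterodimer = [(1.0, 10.0), (5.0, 10.0)]
p_low = mwc_active_probability(0.0, homodimer, delta_eps=-2.0)
p_high = mwc_active_probability(10.0, homodimer, delta_eps=-2.0)
p_hetero = mwc_active_probability(10.0, heterodimer, delta_eps=-2.0)
```

With `delta_eps` negative the inactive state is favored without ligand, and because ligand binds the active state more tightly (K_A < K_I) the active probability rises with concentration.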
Ibrahim, Sherrine A; Ackerman, William E; Summerfield, Taryn L; Lockwood, Charles J; Schatz, Frederick; Kniss, Douglas A
2016-02-01
Inflammation is a proximate mediator of preterm birth and fetal injury. During inflammation, several microRNAs (22-nucleotide noncoding RNA molecules) are up-regulated in response to cytokines such as interleukin-1β. In most cases, microRNAs fine-tune gene expression, including both up-regulation and down-regulation of their target genes. However, the role of pro- and anti-inflammatory microRNAs in this process is poorly understood. The principal goal of this work was to examine the inflammatory genomic profile of human decidual cells challenged with a proinflammatory cytokine known to be present in the setting of preterm parturition. We profiled coding (messenger RNA) and noncoding (microRNA) transcripts to construct a network of interacting genes during inflammation, using an in vitro model of decidual stromal cells. The effects of interleukin-1β exposure on mature microRNA expression were tested in human decidual cell cultures using the multiplexed NanoString platform, whereas the global inflammatory transcriptional response was measured using oligonucleotide microarrays. Differential expression of select transcripts was confirmed by quantitative real-time polymerase chain reaction. Bioinformatics tools were used to infer transcription factor activation and regulatory interactions. Interleukin-1β elicited up-regulation of 350 and down-regulation of 78 nonredundant transcripts (false discovery rate < 0.1), including induction of numerous cytokines, chemokines, and other inflammatory mediators. Although this transcriptional response included marked changes in several microRNA gene loci, the pool of fully processed, mature microRNA was comparatively stable following a cytokine challenge. Of the 6 mature microRNAs identified as differentially expressed by NanoString profiling, 2 (miR-146a and miR-155) were validated by quantitative real-time polymerase chain reaction. 
Using complementary bioinformatics approaches, activation of several inflammatory transcription factors could be inferred downstream of interleukin-1β based on the overall transcriptional response. Further analysis revealed that miR-146a and miR-155 both target genes involved in inflammatory signaling, including Toll-like receptor and mitogen-activated protein kinase pathways. Stimulation of decidual cells with interleukin-1β alters the expression of microRNAs that function to temper proinflammatory signaling. In this setting, some microRNAs may be involved in tissue-level inflammation during the bulk of gestation and assist in pregnancy maintenance. Copyright © 2016 Elsevier Inc. All rights reserved.
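The abstract reports a false discovery rate cutoff of 0.1 but does not name the procedure. A common choice for such cutoffs is the Benjamini-Hochberg step-up procedure, sketched below as an assumption rather than as the method the authors used; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank  # keep scanning: step-up uses the largest passing rank
    return sorted(order[:cutoff_rank])


rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.20, 0.5], alpha=0.1)
```

Because the procedure is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes at a higher rank.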
Jin, Hyun-Seok; Kim, Jeonhyun; Kwak, Woori; Jeong, Hyeonsoo; Lim, Gyu-Bin
2017-01-01
Congenital cataracts can occur as a non-syndromic isolated ocular disease or as a part of genetic syndromes accompanied by a multi-systemic disease. Approximately 50% of all congenital cataract cases have a heterogeneous genetic basis. Here, we describe three generations of a family with an autosomal dominant inheritance pattern and common complex phenotypes, including bilateral congenital cataracts, short stature, macrocephaly, and minor skeletal anomalies. We did not find any chromosomal aberrations or gene copy number abnormalities using conventional genetic tests; accordingly, we conducted whole-exome sequencing (WES) to identify disease-causing genetic alterations in this family. Based on family WES data, we identified a novel BRD4 missense mutation as a candidate causal variant and performed cell-based experiments by ablation of endogenous BRD4 expression in human lens epithelial cells. The protein expression levels of connexin 43, p62, LC3BII, and p53 differed significantly between control cells and cells in which endogenous BRD4 expression was inhibited. We inferred that a BRD4 missense mutation was the likely disease-causing mutation in this family. Our findings may improve the molecular diagnosis of congenital cataracts and support the use of WES to clarify the genetic basis of complex diseases. PMID:28076398
Rasala, Beth A; Muto, Machiko; Lee, Philip A; Jager, Michal; Cardoso, Rosa MF; Behnke, Craig A; Kirk, Peter; Hokanson, Craig A; Crea, Roberto; Mendez, Michael; Mayfield, Stephen P
2010-01-01
Summary: Recombinant proteins are widely used today in many industries, including the biopharmaceutical industry, and can be expressed in bacteria, yeasts, mammalian and insect cell cultures, or in transgenic plants and animals. Transgenic algae have also been shown to support recombinant protein expression from both the nuclear and chloroplast genomes. However, to date, there are only a few reports of recombinant proteins expressed in the algal chloroplast. It is unclear whether this is due to few attempts or to limitations of the system that preclude expression of many proteins. We therefore sought to assess the versatility of transgenic algae as a recombinant protein production platform by testing whether the algal chloroplast could support the expression of a diverse set of current or potential human therapeutic proteins. Of the seven proteins chosen, more than half were expressed at levels sufficient for commercial production. Three were expressed at 2% to 3% of total soluble protein, while a fourth protein accumulated to similar levels when translationally fused to a well-expressed serum amyloid protein. All of the algal chloroplast-expressed proteins were soluble and showed biological activity comparable to that of the same proteins expressed using traditional production platforms. Thus, the success rate, expression levels, and bioactivity achieved demonstrate the utility of the green alga Chlamydomonas reinhardtii as a robust platform for human therapeutic protein production. PMID:20230484
State Space Model with hidden variables for reconstruction of gene regulatory networks.
Wu, Xi; Li, Peng; Wang, Nan; Gong, Ping; Perkins, Edward J; Deng, Youping; Zhang, Chaoyang
2011-01-01
The state space model (SSM) is a relatively new approach to inferring gene regulatory networks that requires less computational time than dynamic Bayesian networks (DBN). A linear SSM has two types of variables: observed variables and hidden variables, the latter of which cannot be directly observed in experiments. SSM uses an iterative method, expectation-maximization, to infer regulatory relationships from microarray datasets. The choice of the number of hidden variables has a significant impact on the accuracy of network inference. In this study, we used SSM to infer gene regulatory networks (GRNs) from synthetic time-series datasets, investigated Bayesian Information Criterion (BIC) and Principal Component Analysis (PCA) approaches to determining the number of hidden variables in SSM, and evaluated the performance of SSM in comparison with DBN. True GRNs and synthetic gene expression datasets were generated using GeneNetWeaver. Both DBN and linear SSM were used to infer GRNs from the synthetic datasets, and the inferred networks were compared with the true networks. Our results show that inference precision varied with the number of hidden variables. For some regulatory networks the inference precision of DBN was higher, but SSM performed better in other cases. Although the overall performance of the two approaches is comparable, SSM is much faster and capable of inferring much larger networks than DBN. This study provides useful information for handling the hidden variables and improving inference precision.
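The BIC approach to choosing the number of hidden variables can be sketched as follows: fit an SSM for each candidate dimension K, then pick the K that minimizes BIC = k ln n - 2 ln L, where L is the maximized likelihood and k the number of free parameters. The parameter count below assumes a hypothetical parameterization (K x K transition matrix, p x K observation matrix, p diagonal observation-noise variances, process noise fixed to the identity), the log-likelihoods are hypothetical placeholders for EM fits, and the sketch does not reproduce the study's exact procedure.

```python
import math


def ssm_param_count(n_genes, n_hidden):
    # Hypothetical parameterization: K x K transition matrix, p x K observation
    # matrix, and p diagonal observation-noise variances (process noise fixed to I).
    return n_hidden * n_hidden + n_genes * n_hidden + n_genes


def bic(log_likelihood, n_params, n_obs):
    # Bayesian Information Criterion: lower is better.
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_hidden_dim(log_likelihoods, n_genes, n_obs):
    """log_likelihoods: {K: maximized log-likelihood of an SSM fitted with K hidden variables}."""
    scores = {
        k: bic(ll, ssm_param_count(n_genes, k), n_obs)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fitted log-likelihoods for 10 genes observed at 50 time points.
best, scores = select_hidden_dim({1: -500.0, 2: -430.0, 3: -420.0, 4: -415.0},
                                 n_genes=10, n_obs=50)
```

The likelihood keeps improving as K grows, but the ln n penalty on extra parameters makes K = 2 the minimum-BIC choice in this toy example.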
Several immune escape patterns in non-Hodgkin's lymphomas
Laurent, Camille; Charmpi, Konstantina; Gravelle, Pauline; Tosolini, Marie; Franchet, Camille; Ysebaert, Loïc; Brousset, Pierre; Bidaut, Alexandre; Ycart, Bernard; Fournié, Jean-Jacques
2015-01-01
Follicular lymphomas (FL) and diffuse large B cell lymphomas (DLBCL) must evolve some immune escape strategy to develop from lymphoid organs, but their immune evasion pathways remain poorly characterized. We investigated this issue by transcriptome data mining and immunohistochemistry (IHC) of FL and DLBCL biopsies. A set of genes involved in cancer immune-evasion pathways (Immune Escape Gene Set, IEGS) was defined, and the distribution of the expression levels of these genes was compared in FL, DLBCL, and normal B cell transcriptomes downloaded from the GEO database. The whole IEGS was significantly upregulated in all the lymphoma samples but not in B cells or other control tissues, as shown by the overexpression of the PD-1, PD-L1, PD-L2, and LAG3 genes. Tissue microarray immunostaining for PD-1, PD-L1, PD-L2, and LAG3 proteins on additional biopsies from 27 FL and 27 DLBCL patients confirmed the expression of these proteins. Immune infiltrates were more abundant in FL than in DLBCL samples, and the microenvironment of FL contained higher proportions of PD-1+ lymphocytes. Further, DLBCL tumors contained a higher proportion of PD-1+, PD-L1+, PD-L2+, and LAG3+ lymphoma cells than FL tumors, confirming that DLBCL mount immune escape strategies distinct from those of FL. In addition, some cases of DLBCL had tumor cells co-expressing PD-1, PD-L1, and PD-L2. Among the DLBCLs, the activated B cell (ABC) subtype contained more PD-L1+ and PD-L2+ lymphoma cells than the germinal center (GC) subtype. Thus, we infer that FL and DLBCL have evolved several pathways of immune escape. PMID:26405585
Yang, Sichao; Jiang, Yun; Xu, Liqing; Shiratake, Katsuhiro; Luo, Zhengrong; Zhang, Qinglin
2016-11-01
Persimmon fruits accumulate large amounts of proanthocyanidins (PAs) in "tannin cells" during development, which cause the sensation of astringency through coagulation of oral proteins. Pollination-constant non-astringent (PCNA) is a spontaneous mutant persimmon phenotype that loses its astringency naturally on the tree at maturity, while the more common non-PCNA fruits remain rich in PAs until they are fully ripened. Here, we isolated a DkMATE1 gene encoding a Multidrug And Toxic Compound Extrusion (MATE) family protein from the Chinese PCNA (C-PCNA) cultivar 'Eshi 1'. Expression patterns of DkMATE1 were positively correlated with the accumulation of PAs in different types of persimmon fruits during fruit development. Analysis of the inferred amino acid sequence and phylogenetic relationships indicated that DkMATE1 is a putative PA precursor transporter, and subcellular localization assays revealed that DkMATE1 is localized in the vacuolar membrane. Ectopic expression of DkMATE1 in the Arabidopsis tt12 mutant showed that DkMATE1 could complement the mutant's defect in transporting epicatechin 3'-O-glucoside, a PA precursor, from the cytoplasm to the vacuole. Furthermore, transient over-expression and silencing of DkMATE1 in 'Mopanshi' persimmon leaves resulted in a significant increase and decrease in PA content, respectively. Analysis of cis-elements in the DkMATE1 promoter region indicated that DkMATE1 might be regulated by DkMYB4, a well-known regulatory gene in persimmon. Overall, our results show that DkMATE1 may be an essential PA precursor membrane transporter that plays an important role in PA biosynthesis in persimmon. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Park, Chihyun; Yun, So Jeong; Ryu, Sung Jin; Lee, Soyoung; Lee, Young-Sam; Yoon, Youngmi; Park, Sang Chul
2017-03-15
Cellular senescence irreversibly arrests the growth of human diploid cells. In addition, recent studies have indicated that senescence is a multi-step, evolving process related to important complex biological processes. Most studies analyzed only the genes and functions representative of each senescence phase, without considering gene-level interactions or continuously perturbed genes. It is therefore necessary to reveal the genotypic mechanism underlying the senescence process, as inferred from affected genes and their interactions. We propose a novel computational approach to identify an integrative network that profiles an underlying genotypic signature from time-series gene expression data. Relatively perturbed genes were selected at each time point based on a proposed scoring measure, termed the perturbation score. The selected genes were then integrated with protein-protein interactions to construct a time-point-specific network. From these networks, the edges conserved across time points were extracted to form a common network, and a statistical test was performed to demonstrate that this network could explain the phenotypic alteration. As a result, we confirmed that the difference in average perturbation scores of the common network between two time points could explain the phenotypic alteration. Functional enrichment analysis of the common network also revealed a strong association with the phenotypic alteration. Remarkably, we observed that the identified cell cycle-specific common network played an important role in replicative senescence as a key regulator. Until now, network analysis of time-series gene expression data has focused on how topological structure changes over time. In contrast, we focused on structure that is conserved while its context changes over time, and showed that it can explain phenotypic changes. 
We expect that the proposed method will help to elucidate biological mechanisms not revealed by existing approaches.
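The network-construction steps can be sketched in a few lines: at each time point, keep protein-protein interaction edges whose two endpoints both pass a perturbation-score threshold, intersect the resulting edge sets to obtain the common network, and average the scores of its genes per time point. The perturbation score itself is not defined in the abstract, so scores are taken here as given inputs; the thresholding rule, function names, and toy values are hypothetical.

```python
def time_point_network(scores, ppi_edges, threshold):
    """Undirected PPI edges whose two genes both pass the perturbation-score threshold."""
    selected = {gene for gene, score in scores.items() if score >= threshold}
    return {frozenset(edge) for edge in ppi_edges if set(edge) <= selected}


def common_network(scores_by_time, ppi_edges, threshold):
    """Edges conserved across all time-point-specific networks."""
    networks = [time_point_network(s, ppi_edges, threshold) for s in scores_by_time]
    return set.intersection(*networks) if networks else set()


def mean_network_score(scores, edges):
    """Average perturbation score of the genes in a network at one time point."""
    genes = set().union(*edges) if edges else set()
    return sum(scores[g] for g in genes) / len(genes) if genes else 0.0


ppi = [("A", "B"), ("B", "C"), ("C", "D")]
t1 = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 0.5}  # hypothetical scores, time point 1
t2 = {"A": 2.5, "B": 2.0, "C": 0.2, "D": 3.0}  # hypothetical scores, time point 2
common = common_network([t1, t2], ppi, threshold=1.0)
```

Here only the A-B edge survives at both time points, and the difference between its mean scores at the two time points (2.5 versus 2.25) is the kind of quantity a statistical test could then compare.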
2018-02-15
address the problem that probabilistic inference algorithms are difficult and tedious to implement, by expressing them in terms of a small number of...building blocks, which are automatic transformations on probabilistic programs. On one hand, our curation of these building blocks reflects the way human...reasoning with low-level computational optimization, so the speed and accuracy of the generated solvers are competitive with state-of-the-art systems.
MacGilvray, Matthew E; Shishkova, Evgenia; Chasman, Deborah; Place, Michael; Gitter, Anthony; Coon, Joshua J; Gasch, Audrey P
2018-05-01
Cells respond to stressful conditions by coordinating a complex, multi-faceted response that spans many levels of physiology. Much of the response is coordinated by changes in protein phosphorylation. Although the regulators of transcriptome changes during stress are well characterized in Saccharomyces cerevisiae, the upstream regulatory network controlling protein phosphorylation is less well dissected. Here, we developed a computational approach to infer the signaling network that regulates phosphorylation changes in response to salt stress. Our approach links predicted regulators to groups of likely co-regulated phospho-peptides responding to stress, thereby creating new edges in a background protein interaction network. We then used integer linear programming (ILP) to integrate wild-type and mutant phospho-proteomic data and predict the network controlling stress-activated phospho-proteomic changes. The inferred network predicted new regulatory connections between stress-activated and growth-regulating pathways and suggested mechanisms coordinating metabolism, cell-cycle progression, and growth during stress. We confirmed several network predictions with co-immunoprecipitations coupled with mass-spectrometry protein identification and with mutant phospho-proteomic analysis. The results show that the cAMP-phosphodiesterase Pde2 physically interacts with many stress-regulated transcription factors targeted by PKA, and that reduced phosphorylation of those factors during stress requires the Rck2 kinase, which we show physically interacts with Pde2. Together, our work shows how a high-quality computational network model can facilitate discovery of new pathway interactions during osmotic stress.
Tagliavia, Marcello; Cuttitta, Angela
2016-01-01
High rates of plasmid instability are associated with the use of some expression vectors in Escherichia coli, resulting in the loss of recombinant protein expression. This is due to sequence alterations in vector promoter elements caused by the background expression of the cloned gene, which leads to the selection of fast-growing, plasmid-containing cells that do not express the target protein. This phenomenon, which is worsened when expressing toxic proteins, results in preparations containing very little or no recombinant protein, or even in clone loss; however, no methods to prevent loss of recombinant protein expression are currently available. We have exploited the phenomenon of translational coupling, a mechanism of prokaryotic gene expression regulation, in order to select cells containing plasmids still able to express recombinant proteins. Here we designed an expression vector in which the cloned gene and selection marker are co-expressed. Our approach allowed for the selection of the recombinant protein-expressing cells and proved effective even for clones encoding toxic proteins.
Campbell, Kieran R; Yau, Christopher
2017-03-15
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic, non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of the genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data, and we quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results with existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
Morrison, T; McQuain, C; McGinnes, L
1991-01-01
The cDNA derived from the fusion gene of the virulent AV strain of Newcastle disease virus (NDV) was expressed in chicken embryo cells by using a retrovirus vector. The fusion protein expressed in this system was transported to the cell surface and was efficiently cleaved into the disulfide-linked F1-F2 form found in infectious virions. The cells expressing the fusion gene grew normally and could be passaged many times. Monolayers of these cells would plaque, in the absence of trypsin, avirulent NDV strains (strains which encode a fusion protein which is not cleaved in tissue culture). Fusion protein-expressing cells would not fuse if mixed with uninfected cells or uninfected cells expressing the hemagglutinin-neuraminidase (HN) protein. However, the fusion protein-expressing cells, if infected with avirulent strains of NDV, would fuse with uninfected cells, suggesting that fusion requires both the fusion protein and another viral protein expressed in the same cell. Fusion was also seen after transfection of the HN protein gene into fusion protein-expressing cells. Thus, the expressed fusion protein gene is capable of complementing the virus infection, providing an active cleaved fusion protein required for the spread of infection. However, the fusion protein does not mediate cell fusion unless the cell also expresses the HN protein. Fusion protein-expressing cells would not plaque influenza virus in the absence of trypsin, nor would influenza virus-infected fusion protein-expressing cells fuse with uninfected cells. Thus, the influenza virus HA protein will not substitute for the NDV HN protein in cell-to-cell fusion. PMID:1987376
2017-01-01
Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis allows discovery of the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors; it improves on current methods by taking into account hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems, one for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered putative evidence of a regulatory link between the two genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
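The rankwise averaging step described above can be sketched as follows. This is a minimal illustration, not the published NIMEFI implementation: it assumes each method outputs one importance score per candidate edge (regulator, target), that higher scores mean stronger evidence, and that ties share their average rank. The names `ranks_from_scores` and `rank_average` are hypothetical.

```python
from typing import Dict, List, Tuple

# A candidate regulatory link: (regulator gene, target gene).
Edge = Tuple[str, str]


def ranks_from_scores(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rank edges by descending importance (rank 1 = strongest); tied scores share their average rank."""
    ordered = sorted(scores, key=lambda e: scores[e], reverse=True)
    ranks: Dict[Edge, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across a run of edges with the same score.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def rank_average(score_sets: List[Dict[Edge, float]]) -> List[Tuple[Edge, float]]:
    """Combine several methods' edge scores by averaging each edge's rank.

    Returns (edge, mean rank) pairs, best (lowest mean rank) first.
    """
    if not score_sets:
        raise ValueError("need at least one method's scores")
    edges = set(score_sets[0])
    if any(set(s) != edges for s in score_sets):
        raise ValueError("all methods must score the same candidate edges")
    per_method = [ranks_from_scores(s) for s in score_sets]
    mean_rank = {e: sum(r[e] for r in per_method) / len(per_method) for e in edges}
    # Sort by mean rank; break ties by edge name so the output is deterministic.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.5, ("g2", "g3"): 0.1}
    method_b = {("g1", "g2"): 0.2, ("g1", "g3"): 0.8, ("g2", "g3"): 0.4}
    print(rank_average([method_a, method_b]))
    # [(('g1', 'g3'), 1.5), (('g1', 'g2'), 2.0), (('g2', 'g3'), 2.5)]
```

Averaging ranks rather than raw scores keeps methods with very different score scales (for example, tree importances versus regression coefficients) on an equal footing.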
Hewitt, Stephen N.; Choi, Ryan; Kelley, Angela; Crowther, Gregory J.; Napuli, Alberto J.; Van Voorhis, Wesley C.
2011-01-01
Despite recent advances, the expression of heterologous proteins in Escherichia coli for crystallization remains a nontrivial challenge. The present study investigates the efficacy of maltose-binding protein (MBP) fusion as a general strategy for rescuing the expression of target proteins. From a group of sequence-verified clones with undetectable levels of protein expression in an E. coli T7 expression system, 95 clones representing 16 phylogenetically diverse organisms were selected for recloning into a chimeric expression vector with an N-terminal histidine-tagged MBP. PCR-amplified inserts were annealed into an identical ligation-independent cloning region in an MBP-fusion vector and were analyzed for expression and solubility by high-throughput nickel-affinity binding. This approach yielded detectable expression of 72% of the clones; soluble expression was visible in 62%. However, the solubility of most proteins was marginal to poor upon cleavage of the MBP tag. This study offers large-scale evidence that MBP can improve the soluble expression of previously non-expressing proteins from a variety of eukaryotic and prokaryotic organisms. While the behavior of the cleaved proteins was disappointing, further refinements in MBP tagging may permit the more widespread use of MBP-fusion proteins in crystallographic studies. PMID:21904041
Abdelmaksoud, Heba E.; Yau, Edwin H.; Zuker, Michael; Sullivan, Jack M.
2011-01-01
To identify lead candidate allele-independent hammerhead ribozymes (hhRz) for the treatment of autosomal dominant mutations in the human rod opsin (RHO) gene, we tested a series of hhRzs for their potential to significantly knock down human RHO gene expression in a human cell expression system. Multiple computational criteria were used to select target mRNA regions likely to be single stranded and accessible to hhRz annealing and cleavage. Target regions were tested for accessibility in a human cell culture expression system in which the hhRz RNA and the target mRNA and protein are coexpressed. The hhRz RNA is embedded in an adenoviral VAI RNA chimeric RNA of established structure and properties, which are critical to the experimental paradigm. The chimeric hhRz-VAI RNA is abundantly transcribed, so the hhRzs are expected to be in great excess over substrate mRNA. HhRz-VAI traffics predominantly to the cytoplasm, where it colocalizes with the RHO mRNA target. Colocalization is essential for second-order annealing reactions. The VAI chimera protects the hhRz RNA from degradation and provides a long half-life. With cell lines chosen for high transfection efficiency and a molar excess of hhRz plasmid over target plasmid, the conditions of this experimental paradigm are specifically designed to evaluate regions of accessibility of the target mRNA in cellulo. Western analysis was used to measure the impact of hhRz expression on RHO protein expression. Three lead candidate hhRz designs were identified that significantly knock down target protein expression relative to control (p < 0.05). Successful lead candidates (hhRz CUC↓ 266, hhRz CUC↓ 1411, hhRz AUA↓ 1414) targeted regions of human RHO mRNA that were predicted to be accessible by a bioinformatics approach, whereas regions predicted to be inaccessible supported no knockdown. The maximum knockdown of opsin protein was approximately 30% over a 48-hr testing paradigm.
These results validate a rigorous computational bioinformatics approach to detecting accessible regions of target mRNAs in cellulo. The opsin knockdown effect could prove to be clinically significant when integrated over longer periods in photoreceptors. Further optimization and animal testing are the next steps in this stratified RNA drug discovery program. A recently developed, novel and efficient screening assay based on expression of a dicistronic mRNA (RHO-IRES-SEAP) containing both RHO and reporter (SEAP) cDNAs was used to compare the hhRz 266 lead candidate to another agent (Rz525/hhRz485) already known to partially rescue retinal degeneration in a rodent model. Lead hhRz 266 CUC↓ proved more efficacious than Rz525/hhRz485, which suggests its viability for rescue of retinal degeneration in appropriate preclinical models of disease. PMID:19094986
Fuertes, Gustavo; Banterle, Niccolò; Ruff, Kiersten M; Chowdhury, Aritra; Mercadante, Davide; Koehler, Christine; Kachala, Michael; Estrada Girona, Gemma; Milles, Sigrid; Mishra, Ankur; Onck, Patrick R; Gräter, Frauke; Esteban-Martín, Santiago; Pappu, Rohit V; Svergun, Dmitri I; Lemke, Edward A
2017-08-01
Unfolded states of proteins and native states of intrinsically disordered proteins (IDPs) populate heterogeneous conformational ensembles in solution. The average sizes of these heterogeneous systems, quantified by the radius of gyration (R_G), can be measured by small-angle X-ray scattering (SAXS). Another parameter, the mean dye-to-dye distance (R_E) for proteins with fluorescently labeled termini, can be estimated using single-molecule Förster resonance energy transfer (smFRET). A number of studies have reported inconsistencies in inferences drawn from the two sets of measurements for the dimensions of unfolded proteins and IDPs in the absence of chemical denaturants. These differences are typically attributed to the influence of fluorescent labels used in smFRET and to the impact of high concentrations and averaging features of SAXS. By measuring the dimensions of a collection of labeled and unlabeled polypeptides using smFRET and SAXS, we directly assessed the contributions of dyes to the experimental values of R_G and R_E. For chemically denatured proteins we obtain mutual consistency in our inferences based on R_G and R_E, whereas for IDPs under native conditions, we find substantial deviations. Using computations, we show that discrepant inferences are neither due to methodological shortcomings of specific measurements nor due to artifacts of dyes. Instead, our analysis suggests that chemical heterogeneity in heteropolymeric systems leads to a decoupling between R_E and R_G that is amplified in the absence of denaturants. Therefore, joint assessments of R_G and R_E combined with measurements of polymer shapes should provide a consistent and complete picture of the underlying ensembles.
Xie, Jing; Wang, Chunli; Huang, Dong-Yue; Zhang, Yanyan; Xu, Jianwen; Kolesnikov, Stanislav S; Sung, K L Paul; Zhao, Hucheng
2013-03-15
The anterior cruciate ligament (ACL) is known to have a poor self-healing ability. In contrast, the medial collateral ligament (MCL) can heal relatively well and restore joint function. Transforming growth factor-beta1 (TGF-β1) is considered to be an important chemical mediator in the wound healing of ligaments. However, the role of TGF-β1-induced expression of the lysyl oxidases (LOXs) and matrix metalloproteinases (MMPs), which respectively facilitate extracellular matrix (ECM) repair and degradation, remains poorly understood. In this study, we used an equibiaxial stretch chamber to mimic mechanical injury of ACL and MCL fibroblasts, and aimed to determine the intrinsic differences between ACL and MCL by characterizing the differential expression of LOXs and MMPs in response to TGF-β1 after mechanical injury. Using semi-quantitative PCR, quantitative real-time PCR, western blot and zymography, we found that TGF-β1 induced injured MCL fibroblasts to express more LOXs than injured ACL fibroblasts (up to 1.85-fold in LOX, 2.21-fold in LOXL-1, 1.71-fold in LOXL-2, 2.52-fold in LOXL-3 and 3.32-fold in LOXL-4). Meanwhile, TGF-β1 induced injured ACL fibroblasts to express more MMPs than injured MCL fibroblasts (up to 2.33-fold in MMP-1, 2.45-fold in MMP-2, 1.89-fold in MMP-3 and 1.50-fold in MMP-12). Protein-level results were consistent with these gene expression changes. The differential expression of LOXs and MMPs reflects intrinsic differences between ACL and MCL, which could help to explain their differential healing abilities. Copyright © 2012 Elsevier Ltd. All rights reserved.
CLIC, a tool for expanding biological pathways based on co-expression across thousands of datasets
Li, Yang; Liu, Jun S.; Mootha, Vamsi K.
2017-01-01
In recent years, there has been a huge rise in the number of publicly available transcriptional profiling datasets. These massive compendia comprise billions of measurements and provide a special opportunity to predict the function of unstudied genes based on co-expression with well-studied pathways. Such analyses can be very challenging, however, since biological pathways are modular and may exhibit co-expression only in specific contexts. To overcome these challenges we introduce CLIC, CLustering by Inferred Co-expression. CLIC accepts as input a pathway consisting of two or more genes. It then uses a Bayesian partition model to simultaneously partition the input gene set into coherent co-expressed modules (CEMs), while assigning the posterior probability for each dataset in support of each CEM. CLIC then expands each CEM by scanning the transcriptome for additional co-expressed genes, quantified by an integrated log-likelihood ratio (LLR) score weighted for each dataset. As a byproduct, CLIC automatically learns the conditions (datasets) within which a CEM is operative. We implemented CLIC using a compendium of 1774 mouse microarray datasets (28628 microarrays) or 1887 human microarray datasets (45158 microarrays). CLIC analysis reveals that of 910 canonical biological pathways, 30% consist of strongly co-expressed gene modules for which new members are predicted. For example, CLIC predicts a functional connection between protein C7orf55 (FMC1) and the mitochondrial ATP synthase complex that we have experimentally validated. CLIC is freely available at www.gene-clic.org. We anticipate that CLIC will be valuable both for revealing new components of biological pathways and for identifying the conditions in which they are active. PMID:28719601
Nascimento, Diana Sofia Marques; Potes, Catarina Soares; Soares, Miguel Luz; Ferreira, António Carlos; Malcangio, Marzia; Castro-Lopes, José Manuel; Neto, Fani Lourença Moreira
2018-05-01
Purinergic receptors (P2XRs) have been widely associated with pain states, mostly due to their involvement in neuron-glia communication. Interestingly, we have previously shown that satellite glial cells (SGC) surrounding dorsal root ganglia (DRG) neurons become activated and proliferate during monoarthritis (MA) in the rat. Here, we demonstrate that P2X7R expression increases in ipsilateral DRG after 1 week of disease, while P2X3R immunoreactivity decreases. We have also reported a significant induction of activating transcription factor 3 (ATF3) in MA. In this study, we show that ATF3 knockdown in DRG cell cultures does not affect the expression of P2X7R, P2X3R, or glial fibrillary acidic protein (GFAP). We suggest that P2X7R negatively regulates P2X3R, although this regulation is unlikely to be mediated by ATF3. Interestingly, we found that ATF3 knockdown in vitro induced significant decreases in heat shock protein 90 (HSP90) expression. We therefore evaluated the involvement of HSP90 in MA in vivo and demonstrated that HSP90 messenger RNA levels increase in ipsilateral DRG of inflamed animals. We also show that HSP90 is mostly found in a cleaved form in this condition. Moreover, administration of an HSP90 inhibitor, 17-dimethylaminoethylamino-17-demethoxygeldanamycin (17-DMAG), attenuated MA-induced mechanical allodynia in the first hours. The drug also reversed the HSP90 upregulation and cleavage. 17-DMAG seemed to attenuate glial activation and neuronal sensitization (as inferred from downregulation of GFAP and P2X3R in ipsilateral DRG), which might correlate with the observed pain alleviation. Our data indicate a role for HSP90 in MA pathophysiology, but further investigation is necessary to clarify the underlying mechanisms.
Improving membrane protein expression and function using genomic edits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jensen, Heather M.; Eng, Thomas; Chubukov, Victor; ...
2017-10-12
Expression of membrane proteins often leads to growth inhibition and perturbs central metabolism and this burden varies with the protein being overexpressed. There are also known strain backgrounds that allow greater expression of membrane proteins but that differ in efficacy across proteins. Here, we hypothesized that for any membrane protein, it may be possible to identify a modified strain background where its expression can be accommodated with less burden. To directly test this hypothesis, we used a bar-coded transposon insertion library in tandem with cell sorting to assess genome-wide impact of gene deletions on membrane protein expression. The expression of five membrane proteins (CyoB, CydB, MdlB, YidC, and LepI) and one soluble protein (GST), each fused to GFP, was examined. We identified Escherichia coli mutants that demonstrated increased membrane protein expression relative to that in wild type. For two of the proteins (CyoB and CydB), we conducted functional assays to confirm that the increase in protein expression also led to phenotypic improvement in function. This study represents a systematic approach to broadly identify genetic loci that can be used to improve membrane protein expression, and our method can be used to improve expression of any protein that poses a cellular burden.
Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean
2016-08-10
Inferring gene regulatory networks is one of the most interesting research areas in systems biology. Many inference methods have been developed using a variety of computational models and approaches. However, two issues remain. First, depending on the structural or computational model of the inference method, the results tend to be inconsistent because the methods have innately different advantages and limitations. Combining dissimilar approaches is therefore needed to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression penalized by a regularization parameter (lasso) and bootstrapping-based sparse linear regression have been proposed in state-of-the-art network inference methods, but they are not effective for data with small sample sizes, and a true regulator can be missed if the target gene is strongly affected by an indirect regulator with high correlation or by another true regulator. We present two novel network inference methods based on the integration of three different criteria: (i) a z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criteria, we propose a lasso-based random feature selection algorithm (LARF) that achieves better performance by overcoming the limitations of bootstrapping mentioned above. This work makes three main contributions. First, our z-score-based method for measuring gene expression variation from knockout data is more effective than similar criteria in related work. Second, we confirmed that true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively combined.
In the experiments, our methods were validated by outperforming state-of-the-art methods on DREAM challenge data, and LARF was then applied to the inference of gene regulatory networks associated with psychiatric disorders.
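To make criterion (i) concrete, here is a minimal sketch of a generic knockout z-score. It is not the authors' z-score measure, which the abstract states differs from related work; it assumes that `ko[i][j]` holds the expression of gene j when gene i is knocked out, and that each target gene's baseline is its mean and population standard deviation across all knockout experiments (some formulations use wild-type replicates instead). The function name `knockout_zscores` is hypothetical.

```python
import statistics
from typing import List


def knockout_zscores(ko: List[List[float]]) -> List[List[float]]:
    """Absolute z-scores of target expression under single-gene knockouts.

    ko[i][j] is the expression of gene j measured when gene i is knocked out.
    Each target column j is standardized by its own mean and population
    standard deviation across all knockout experiments; a large |z[i][j]|
    is read as evidence that gene i regulates gene j.
    """
    n_ko = len(ko)
    n_genes = len(ko[0])
    z = [[0.0] * n_genes for _ in range(n_ko)]
    for j in range(n_genes):
        column = [ko[i][j] for i in range(n_ko)]
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for i in range(n_ko):
            # Leave constant columns (sd == 0) and the knocked-out gene itself (i == j) at 0.
            if sd > 0 and i != j:
                z[i][j] = abs(column[i] - mu) / sd
    return z


if __name__ == "__main__":
    print(knockout_zscores([[0.0, 1.0], [1.0, 3.0]]))  # [[0.0, 1.0], [1.0, 0.0]]
```

The resulting matrix can then be ranked or thresholded and integrated with mutual information and regression-based scores, as the abstract describes; how LARF weights the three criteria is not specified here.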
Reconstructing Cell Lineages from Single-Cell Gene Expression Data: A Pilot Study
2016-08-30
The goal of this pilot study is to develop novel mathematical... methods, by leveraging tools developed in bifurcation theory, to infer the underlying cell-state dynamics from single-cell gene expression data. Our... proposed method contains two steps. The first step is to reconstruct the temporal order of the cells from gene expression data, whereas the second
Spatial reconstruction of single-cell gene expression
Satija, Rahul; Farrell, Jeffrey A.; Gennert, David; Schier, Alexander F.; Regev, Aviv
2015-01-01
Spatial localization is a key determinant of cellular fate and behavior, but spatial RNA assays traditionally rely on staining for a limited number of RNA species. In contrast, single-cell RNA-seq allows for deep profiling of cellular gene expression, but established methods separate cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos, inferring a transcriptome-wide map of spatial patterning. We confirmed Seurat’s accuracy using several experimental approaches, and used it to identify a set of archetypal expression patterns and spatial markers. Additionally, Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems. PMID:25867923
Path Models of Vocal Emotion Communication
Bänziger, Tanja; Hosoya, Georg; Scherer, Klaus R.
2015-01-01
We propose to use a comprehensive path model of vocal emotion communication, encompassing encoding, transmission, and decoding processes, to empirically model data sets on emotion expression and recognition. The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. Lens model equations, hierarchical regression, and multivariate path analysis are used to compare the relative contributions of objectively measured acoustic cues in the enacted expressions and subjective voice cues as perceived by listeners to the variance in emotion inference from vocal expressions for four emotion families (fear, anger, happiness, and sadness). While the results confirm the central role of arousal in vocal emotion communication, the utility of applying an extended path modeling framework is demonstrated by the identification of unique combinations of distal cues and proximal percepts carrying information about specific emotion families, independent of arousal. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, opens up new research strategies for work in psychology (specifically emotion and social perception research) and in engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars). PMID:26325076
Visualization of newt aragonitic otoconial matrices using transmission electron microscopy
NASA Technical Reports Server (NTRS)
Steyger, P. S.; Wiederhold, M. L.
1995-01-01
Otoconia are calcified protein matrices within the gravity-sensing organs of the vertebrate vestibular system. These protein matrices are thought to originate from the supporting or hair cells in the macula during development. Previous studies of mammalian calcitic, barrel-shaped otoconia revealed an organized protein matrix consisting of a thin peripheral layer, a well-defined organic core and a flocculent matrix in between. No studies have reported the microscopic organization of the aragonitic otoconial matrix, despite its protein characterization. Pote et al. (1993b) used densitometric methods and inferred that prismatic (aragonitic) otoconia have a peripheral protein distribution, in contrast to the distribution described for the barrel-shaped, calcitic otoconia of birds, mammals, and the amphibian utricle. By using tannic acid as a negative stain, we observed three kinds of organic matrices in preparations of fixed, decalcified saccular otoconia from the adult newt: (1) fusiform shapes with a homogeneous electron-dense matrix; (2) singular and multiple strands of matrix; and (3) more significantly, prismatic shapes outlined by a peripheral organic matrix. These prismatic shapes remain following removal of the gelatinous matrix, revealing an internal array of organic matter. We conclude that prismatic otoconia have a largely peripheral otoconial matrix, as inferred by densitometry.
Westerbeck, Jason W.
2015-01-01
Coronaviruses (CoVs) assemble by budding into the lumen of the early Golgi complex prior to exocytosis. The small CoV envelope (E) protein plays roles in assembly, virion release, and pathogenesis. CoV E has a single hydrophobic domain (HD), is targeted to Golgi complex membranes, and has cation channel activity in vitro. However, the precise functions of the CoV E protein during infection are still enigmatic. Structural data for the severe acute respiratory syndrome (SARS)-CoV E protein suggest that it assembles into a homopentamer. Specific residues in the HD regulate the ion-conducting pore formed by SARS-CoV E in artificial bilayers and the pathogenicity of the virus during infection. The E protein from the avian infectious bronchitis virus (IBV) has dramatic effects on the secretory system which require residues in the HD. Here, we use the known structural data from SARS-CoV E to infer the residues important for ion channel activity and the oligomerization of IBV E. We present biochemical data for the formation of two distinct oligomeric pools of IBV E in transfected and infected cells and the residues required for their formation. A high-order oligomer of IBV E is required for the production of virus-like particles (VLPs), implicating this form of the protein in virion assembly. Additionally, disruption of the secretory pathway by IBV E correlates with a form that is likely monomeric, suggesting that the effects on the secretory pathway are independent of E ion channel activity.
IMPORTANCE CoVs are important human pathogens with significant zoonotic potential, as demonstrated by the emergence of SARS-CoV and Middle East respiratory syndrome (MERS)-CoV. Progress has been made toward identifying potential vaccine candidates in mouse models of CoV infection, including the use of attenuated viruses that lack the CoV E protein or express E-protein mutants. However, no approved vaccines or antiviral therapeutics exist.
We previously reported that the hydrophobic domain of the IBV E protein, a putative viroporin, causes disruption of the mammalian secretory pathway when exogenously expressed in cells. Understanding the mechanism of this disruption could lead to the identification of novel antiviral therapeutics. Here, we present biochemical evidence for two distinct oligomeric forms of IBV E, one essential for assembly and the other with a role in disruption of the secretory pathway. Discovery of two forms of CoV E protein will provide additional targets for antiviral therapeutics. PMID:26136577
Relational learning and transitive expression in aging and amnesia
D'Angelo, Maria C.; Kamino, Daphne; Ostreicher, Melanie; Moses, Sandra N.; Rosenbaum, R. Shayna
2016-01-01
Aging has been associated with a decline in relational memory, which is critically supported by the hippocampus. By adapting the transitivity paradigm (Bunsey and Eichenbaum (1996) Nature 379:255‐257), which traditionally has been used in nonhuman animal research, this work examined the extent to which aging is accompanied by deficits in relational learning and flexible expression of relational information. Older adults' performance was additionally contrasted with that of amnesic case DA to understand the critical contributions of the medial temporal lobe, and specifically, the hippocampus, which endures structural and functional changes in healthy aging. Participants were required to select the correct choice item (B versus Y) based on the presented sample item (e.g., A). Pairwise relations must be learned (A‐>B, B‐>C, C‐>D) so that ultimately, the correct relations can be inferred when presented with a novel probe item (A‐>C?Z?). Participants completed four conditions of transitivity that varied in terms of the degree to which the stimuli and the relations among them were known pre‐experimentally. Younger adults, older adults, and DA performed similarly when the condition employed all pre‐experimentally known, semantic, relations. Older adults and DA were less accurate than younger adults when all to‐be‐learned relations were arbitrary. However, accuracy improved for older adults when they could use pre‐experimentally known pairwise relations to express understanding of arbitrary relations as indexed through inference judgments. DA could not learn arbitrary relations nor use existing knowledge to support novel inferences. These results suggest that while aging has often been associated with an emerging decline in hippocampal function, prior knowledge can be used to support novel inferences.
However, in case DA, significant damage to the hippocampus likely impaired his ability to learn novel relations, while additional damage to ventromedial prefrontal and anterior temporal regions may have resulted in an inability to use prior knowledge to flexibly express indirect relational knowledge. © 2015 The Authors Hippocampus Published by Wiley Periodicals, Inc. PMID:26234960
Combinatorial Labeling Method for Improving Peptide Fragmentation in Mass Spectrometry
NASA Astrophysics Data System (ADS)
Kuchibhotla, Bhanuramanand; Kola, Sankara Rao; Medicherla, Jagannadham V.; Cherukuvada, Swamy V.; Dhople, Vishnu M.; Nalam, Madhusudhana Rao
2017-06-01
Annotation of peptide sequences from tandem mass spectra constitutes the central step of mass spectrometry-based proteomics. Peptide mass spectra are obtained upon gas-phase fragmentation. Identification of proteins from a set of experimental peptide spectral matches is usually referred to as protein inference. The occurrence and intensity of fragment ions in MS/MS spectra depend on many factors, such as amino acid composition, peptide basicity, activation mode, and protease. In particular, chemical derivatization of peptides is known to alter their fragmentation. In this study, the influence of acetylation, guanidinylation, and their combination on peptide fragmentation was assessed, initially on a lipase (LipA) from Bacillus subtilis and then on a bovine six protein mix digest. The dual modification improved fragment ion occurrence and changed fragment intensities, which resulted in equivalent representation of b- and y-type fragment ions in ion trap MS/MS spectra. This improved representation allowed us to accurately annotate the peptide sequences de novo. Dual labeling significantly reduced false-positive protein identifications in the standard bovine six peptide digest. Our study suggests that combinatorial labeling of peptides is a useful method to validate protein identifications for high-confidence protein inference.
Boissinot, Sylvaine; Erdinger, Monique; Monsion, Baptiste; Ziegler-Graff, Véronique; Brault, Véronique
2014-01-01
Cucurbit aphid-borne yellows virus (CABYV) is a polerovirus (Luteoviridae family) with a capsid composed of the major coat protein and a minor component referred to as the readthrough protein (RT). Two forms of the RT have been reported: a full-length protein of 74 kDa detected in infected plants and a truncated form of 55 kDa (RT*) incorporated into virions. Both forms were detected in CABYV-infected plants. To clarify the specific role of each protein in the viral cycle, we generated by deletion a polerovirus mutant able to synthesize only the RT*, the form incorporated into the particle. This mutant was unable to move systemically from inoculated leaves, indicating that the C-terminal half of the RT is required for efficient long-distance transport of CABYV. Among a collection of CABYV mutants bearing point mutations in the central domain of the RT, we obtained a mutant impaired in the correct processing of the RT that does not produce the RT*. This mutant accumulated very poorly in upper non-inoculated leaves, suggesting that the RT* has a functional role in long-distance movement of CABYV. Taken together, these results indicate that both RT proteins are required for efficient CABYV movement. PMID:24691251
Yin, Xiaotao; Wang, Wei; Tian, Renli; Xu, Yuanji; Yan, Jinqi; Zhang, Wei; Gao, Jiangping; Yu, Jiyun
2013-08-01
The aim was to construct a prokaryotic expression plasmid, pET28a-survivin, optimize the recombinant protein expression conditions in E. coli, purify the recombinant survivin protein, and characterize its antigenicity. The survivin cDNA segment was amplified by PCR and cloned into the prokaryotic expression vector pET28a(+) to construct the recombinant expression vector pET28a-survivin. The expression vector was transformed into BL21 (DE3), and the survivin/His fusion protein was induced by IPTG. The fusion protein was purified by Ni affinity chromatography. The antigenicity of the purified survivin protein was assessed by Western blotting and ELISA. The recombinant expression vector was successfully verified by BamHI and HindIII digestion. The fusion protein induced by IPTG had a relative molecular mass (Mr) of about 24 000. The purity of the purified protein reached 90% by SDS-PAGE analysis, and the antigenicity of the survivin protein was validated by Western blotting and ELISA. The prokaryotic expression plasmid pET28a-survivin was successfully constructed, and the survivin protein was expressed and purified in E. coli. The purified survivin protein showed the desired antigenicity.
2012-01-01
Background ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power, since only sequence reads mapped to heterozygous SNPs are informative for discriminating the two alleles. Results We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared with analyzing each individual dataset separately. Conclusions iASeq illustrates the value of integrating multiple datasets in allele-specificity inference and offers a new tool to better analyze ASB. PMID:23194258
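The low power of a single dataset is easy to see with the simplest per-SNP test. The sketch below is a plain two-sided exact binomial test of reference versus alternative reads at one heterozygous SNP against a 50:50 null; it is not iASeq's Bayesian hierarchical mixture model, and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, p=0.5):
    """Two-sided exact binomial test of ref_reads against the null allele fraction p."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no informative reads at this SNP")
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum over outcomes no more likely than the observed one (small relative
    # tolerance so ties are not lost to floating-point rounding).
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

With only three informative reads, even complete imbalance (3 reference, 0 alternative) gives p = 0.25, whereas 10 to 0 gives p ≈ 0.002; borrowing strength across datasets targets exactly this sparse-read situation.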
Quantitative Proteomics via High Resolution MS Quantification: Capabilities and Limitations
Higgs, Richard E.; Butler, Jon P.; Han, Bomie; Knierman, Michael D.
2013-01-01
Recent improvements in the mass accuracy and resolution of mass spectrometers have led to renewed interest in label-free quantification using data from the primary mass spectrum (MS1) acquired in data-dependent proteomics experiments. The capacity for higher-specificity quantification of peptides from samples enriched for proteins of biological interest offers distinct advantages for hypothesis-generating experiments relative to immunoassay detection methods or prespecified peptide ions measured by multiple reaction monitoring (MRM) approaches. Here we describe an evaluation of different methods for post-processing peptide-level quantification information to support protein-level inference. We characterize the methods by examining their ability to recover a known dilution of a standard protein in background matrices of varying complexity. Additionally, the MS1 quantification results are compared with a standard targeted MRM approach on the same samples under equivalent instrument conditions. We show the existence of multiple peptides with MS1 quantification sensitivity similar to that of the best MRM peptides for each of the background matrices studied. Based on these results, we provide recommendations on preferred approaches for leveraging quantitative measurements of multiple peptides to improve protein-level inference. PMID:23710359
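The abstract does not name the peptide-to-protein summarization methods that were evaluated. As an illustration only, the sketch below shows two common choices under an assumed input (a mapping from peptide to a pair of MS1 intensities in two samples): the median of peptide log2 ratios and a summed top-N ratio. Function names and the top-N rule are assumptions.

```python
import math
from statistics import median


def protein_log2_ratio(peptides):
    """Median of peptide-level log2(intensity_b / intensity_a) ratios.

    `peptides` maps a peptide sequence to its (intensity_a, intensity_b) MS1 pair.
    """
    ratios = [math.log2(b / a) for a, b in peptides.values() if a > 0 and b > 0]
    if not ratios:
        raise ValueError("no quantifiable peptides")
    return median(ratios)


def protein_topn_log2_ratio(peptides, n=3):
    """log2 ratio of summed intensities of the n peptides most intense in sample a."""
    top = sorted(peptides.values(), key=lambda ab: ab[0], reverse=True)[:n]
    return math.log2(sum(b for _, b in top) / sum(a for a, _ in top))
```

For example, for four peptides with intensity pairs (100, 200), (50, 100), (10, 20) and (5, 40), both functions return 1.0 (a two-fold change): the median ignores the discordant fourth peptide, and the top-3 rule excludes it because it is the least intense.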
Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric
2005-03-10
Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pairwise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pairwise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pairwise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereby an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pairwise genome comparisons based on Z-score computations.
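The Monte Carlo Z-score that the CSHP builds on compares a real pairwise alignment score with the scores obtained after shuffling one of the sequences. A minimal sketch, assuming a toy global alignment with match +1, mismatch -1 and a linear gap penalty of -2 (real protein comparisons would use a substitution matrix and affine gaps); the function names and parameters are hypothetical and this is not the TULIP implementation.

```python
import random


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def monte_carlo_zscore(a, b, n_shuffles=200, seed=0):
    """Z-score of the real alignment score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        scores.append(nw_score(a, "".join(letters)))
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    if var == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - mean) / var ** 0.5
```

Shuffling preserves amino acid composition, so the Z-score measures how far the real score exceeds what composition alone would produce.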
Apoptin towards safe and efficient anticancer therapies.
Backendorf, Claude; Noteborn, Mathieu H M
2014-01-01
The chicken anemia virus-derived protein apoptin has cancer-selective cell-killing characteristics, essentially based on phosphorylation-mediated nuclear transfer in cancer cells and efficient cytoplasmic degradation in normal cells. Here, we describe a growing set of preclinical experiments underpinning the promise of apoptin's anticancer potential. Various non-replicative oncolytic viral vector systems have revealed the safety and efficacy of apoptin. In addition, apoptin enhanced the oncolytic potential of adenovirus, parvovirus and Newcastle disease virus vectors. Intratumoral injection of attenuated Salmonella typhimurium bacterial strains and plasmid-based systems expressing apoptin resulted in significant tumor regression. In vitro and in vivo experiments showed that recombinant membrane-transferring PTD4- or TAT-apoptin proteins have potential as future anticancer therapeutics. In xenografted hepatoma and melanoma mouse models, PTD4-apoptin protein entered both cancer and normal cells but killed only cancer cells. Combinatorial treatment of PTD4-apoptin with various (chemo)therapeutic compounds revealed an additive or even synergistic effect, reducing the side effects of single (chemo)therapeutic treatment. Degradable polymeric nanocapsules harboring MBP-apoptin fusion protein induced tumor-selective cell killing in vitro and in vivo, revealing the potential of polymer-apoptin protein vehicles as anticancer agents. Besides its direct use as an anticancer therapeutic, apoptin research has also generated novel possibilities for drug design. The nuclear localization domains of apoptin are attractive tools for targeting therapeutic compounds into the nucleus of cancer cells. Identification of cancer-related processes targeted by apoptin can potentially generate novel drug targets. Recent breakthroughs important for clinical applications are reported, suggesting that apoptin-based clinical trials are a feasible reality.
Mixture models for protein structure ensembles.
Hirsch, Michael; Habeck, Michael
2008-10-01
Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured by a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule, and most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We treat the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
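The expectation-maximization loop at the core of these mixture models can be shown in one dimension. The sketch below fits a two-component Gaussian mixture to scalar values; the published models work on whole conformations or structural segments and must also handle reference frames, which this toy omits. The fixed component count stands in for the single free parameter the abstract mentions, and the initialization and variance floor are assumptions.

```python
import math


def em_gmm_1d(data, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to `data` by expectation-maximization."""
    means = [min(data), max(data)]  # deterministic initialization at the extremes
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point,
        # computed in log space so far-away points do not underflow to 0/0.
        resp = []
        for x in data:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            dens = [math.exp(l - top) for l in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty component
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(min_var, sum(r[k] * (x - means[k]) ** 2
                                            for r, x in zip(resp, data)) / nk)
    return weights, means, variances
```

For the values 0.0, ±0.1, ±0.2 and 10.0, 10.1, 9.9, 10.2, 9.8, the fit converges to means near 0 and 10 with variances near 0.02.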
Inferring ontology graph structures using OWL reasoning.
Rodríguez-García, Miguel Ángel; Hoehndorf, Robert
2018-01-05
Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAGs) that represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies have been developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from its axioms as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. By searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to the inference of protein-protein interactions through semantic similarity over the Gene Ontology, and show that performance increases when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
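The idea of reading implied but unasserted relations off the deductive closure can be illustrated with a tiny fragment of description-logic entailment: if A ⊑ B, B ⊑ ∃r.C and C ⊑ D, then A ⊑ ∃r.D. The sketch below applies only that rule over asserted SubClassOf edges; it is not an OWL reasoner and not Onto2Graph, and the class names and axioms in the example are hypothetical.

```python
def superclasses(cls, subclass_of):
    """All (reflexive, transitive) superclasses of cls from asserted SubClassOf edges."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_relations(classes, subclass_of, existentials):
    """Infer (A, r, D) edges implied by A ⊑ B, B ⊑ ∃r.C and C ⊑ D.

    `subclass_of` maps a class to its asserted direct superclasses;
    `existentials` maps a class B to (r, C) pairs meaning B ⊑ ∃r.C.
    """
    edges = set()
    for a in classes:
        for b in superclasses(a, subclass_of):          # inherit restrictions from ancestors
            for rel, filler in existentials.get(b, ()):
                for d in superclasses(filler, subclass_of):  # generalize the filler
                    edges.add((a, rel, d))
    return edges
```

Given the hypothetical axioms Neuron ⊑ Cell, Nucleus ⊑ Organelle and Cell ⊑ ∃has_part.Nucleus, the function returns four edges, including the unasserted (Neuron, has_part, Organelle).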
Yu, Xiaoyu; Reva, Oleg N
2018-01-01
Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models compatible with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allows fast and reliable inference of phylogenetic trees. These trees were congruent with the corresponding whole-genome super-matrix trees in terms of tree topology when compared with other known approaches, including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A web-based program to perform alignment-free OUP-based phylogenomic inference was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Discrimination between closely related taxonomic units may be improved by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
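A minimal sketch of the basic ingredient, genome-wide tetranucleotide frequencies compared as vectors. The exact OUP normalization and distance used by the authors are not given in the abstract; plain relative frequencies on one strand and Euclidean distance are assumptions here, and the function names are hypothetical. A matrix of such distances could then be passed to a standard tree-building method such as neighbor joining (not shown).

```python
import math
from itertools import product

# All 256 tetranucleotides in a fixed order, so profiles are comparable vectors.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq):
    """Relative frequencies of the 256 tetranucleotides in seq (non-ACGT windows skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips windows containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotides")
    return [counts[w] / total for w in TETRAMERS]


def profile_distance(p, q):
    """Euclidean distance between two tetranucleotide frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
```

Because no annotation or alignment is involved, adding a new genome only requires computing one more profile and its distances to the existing ones.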
Zhang, Huaizhong; Fan, Jun; Perkins, Simon; Pisconti, Addolorata; Simpson, Deborah M.; Bessant, Conrad; Hubbard, Simon; Jones, Andrew R.
2015-01-01
The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications onto quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation and basic statistics for differential expression. These routines can be accessed via the command line, via a Java application programming interface, or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/. PMID:26037908
Qi, Da; Zhang, Huaizhong; Fan, Jun; Perkins, Simon; Pisconti, Addolorata; Simpson, Deborah M; Bessant, Conrad; Hubbard, Simon; Jones, Andrew R
2015-09-01
The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications onto quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation and basic statistics for differential expression. These routines can be accessed via the command line, via a Java application programming interface, or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/. © 2015 The Authors. PROTEOMICS Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
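The abstract does not specify which normalisation methods mzqLibrary implements, and mzqLibrary itself is a Java library. Purely to illustrate the concept, the sketch below median-centres log2 intensities per sample, one common normalisation choice; the function name and input layout are assumptions, not the mzqLibrary API.

```python
import math
from statistics import median


def median_normalize_log2(samples):
    """Log2-transform each sample's intensities and subtract that sample's median.

    `samples` maps a sample name to a list of positive feature intensities,
    with features in the same order in every sample.
    """
    normalized = {}
    for name, values in samples.items():
        logs = [math.log2(v) for v in values]
        center = median(logs)  # per-sample offset, e.g. from loading differences
        normalized[name] = [x - center for x in logs]
    return normalized
```

For example, a sample B measured with twice the signal of sample A ([1, 2, 4] versus [2, 4, 8]) normalizes to the same values, [-1, 0, 1], for both, so a global loading difference no longer masquerades as differential expression.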
Carver, Melissa N.; Müller, Ulrika; Bekiranov, Stefan; Auble, David T.
2017-01-01
Transcriptome studies on eukaryotic cells have revealed an unexpected abundance and diversity of non-coding RNAs synthesized by RNA polymerase II (Pol II), some of which influence the expression of protein-coding genes. Yet much less is known about the biogenesis of Pol II non-coding RNAs than about that of mRNAs. In the budding yeast Saccharomyces cerevisiae, initiation of non-coding transcripts by Pol II appears to be similar to that of mRNAs, but a distinct pathway is used for termination of most non-coding RNAs: the Sen1-dependent or “NNS” pathway. Here, we examine the effect on the S. cerevisiae transcriptome of conditional mutations in the genes encoding six different essential proteins that influence Sen1-dependent termination: Sen1, Nrd1, Nab3, Ssu72, Rpb11, and Hrp1. We observe surprisingly diverse effects on transcript abundance for the different proteins that cannot be explained simply by differing severity of the mutations. Rather, we infer from our results that termination of Pol II transcription of non-coding RNA genes is subject to complex combinatorial control that likely involves proteins beyond those studied here. Furthermore, we identify new targets and functions of Sen1-dependent termination, including a role in repression of meiotic genes in vegetative cells. In combination with other recent whole-genome studies on termination of non-coding RNAs, our results provide promising directions for further investigation. PMID:28665995
Chaperone-like properties of tobacco plastid thioredoxins f and m
Sanz-Barrio, Ruth; Fernández-San Millán, Alicia; Carballeda, Jon; Corral-Martínez, Patricia; Seguí-Simarro, José M.; Farran, Inmaculada
2012-01-01
Thioredoxins (Trxs) are ubiquitous disulphide reductases that play important roles in the redox regulation of many cellular processes. However, some redox-independent functions, such as chaperone activity, have also been attributed to Trxs in recent years. Our study focuses on the putative chaperone function of the well-described plastid Trxs f and m. To that end, the cDNAs of both Trxs, designated NtTrxf and NtTrxm, were isolated from Nicotiana tabacum plants. Bacterially expressed tobacco Trx f and Trx m were found to possess chaperone-like properties in addition to their disulphide reductase activity. In vitro, Trx f and Trx m could both facilitate the reactivation of the cysteine-free form of chemically denatured glucose-6-phosphate dehydrogenase (foldase chaperone activity) and prevent heat-induced malate dehydrogenase aggregation (holdase chaperone activity). Our results led us to infer that the disulphide reductase and foldase chaperone functions prevail when the proteins occur as monomers, and that the well-conserved non-active cysteine present in Trx f is critical for both functions. By contrast, the holdase chaperone activity of both Trxs depended on their oligomeric status: the proteins were functional only when associated with high molecular mass protein complexes. Because oligomerization of both Trxs was induced by salt and temperature, our data suggest that plastid Trxs could operate as molecular holdase chaperones upon oxidative stress, acting as a type of small stress protein. PMID:21948853
Analysis of Gene Regulatory Networks of Maize in Response to Nitrogen.
Jiang, Lu; Ball, Graham; Hodgman, Charlie; Coules, Anne; Zhao, Han; Lu, Chungui
2018-03-08
Nitrogen (N) fertilizer has a major influence on crop yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer is of central importance for enhancing food security and agricultural sustainability. In this study, analysis of gene regulatory networks reveals multiple genes and biological processes that respond to N. Two microarray studies were used to infer components of the nitrogen-response network. Because they used different array technologies, a map linking the two probe sets to the maize B73 reference genome was generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which suggested the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a calcium-dependent protein kinase. An artificial neural network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for asparagine synthetase, a dual-specificity protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression network analysis of TFs with profiles closely related to those of known nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low-nitrogen conditions, while bZIP108 might play a role in gene activation.
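Co-expression screening of this kind can be sketched as correlating each TF's expression profile with the profiles of known responsive genes and keeping TFs above a cutoff. Pearson correlation and the 0.9 threshold are assumptions (the abstract gives neither), the function names are hypothetical, and gene names in any example input are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)


def candidate_regulators(tf_profiles, marker_profiles, threshold=0.9):
    """TFs whose best correlation with any known responsive gene reaches threshold."""
    hits = {}
    for tf, tf_expr in tf_profiles.items():
        best = max(pearson(tf_expr, m) for m in marker_profiles.values())
        if best >= threshold:
            hits[tf] = best
    return hits
```

Signed correlation keeps only co-varying pairs; taking absolute values would also flag TFs whose profiles mirror those of repressed genes.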
Aberrant gene expression in mucosa adjacent to tumor reveals a molecular crosstalk in colon cancer
2014-01-01
Background A colorectal tumor is not an isolated entity growing in a restricted location of the body. The patient’s gut environment constitutes the framework in which the tumor evolves, and this relationship involves a complex and tight interplay of the tumor with inflammation, blood vessel formation, nutrition, and gut microbiome composition. The tumor's influence on its environment could promote either an anti-tumor or a pro-tumor response. Methods A set of 98 paired adjacent mucosa and tumor tissues from colorectal cancer (CRC) patients and 50 colon mucosa samples from healthy donors (246 samples in total) were included in this work. RNA extracted from each sample was hybridized to Affymetrix Human Genome U219 arrays. Functional relationships between genes were inferred by means of systems biology using both transcriptional regulation networks (ARACNe algorithm) and protein-protein interaction networks (BIANA software). Results Here we report a transcriptomic analysis revealing a number of genes activated in adjacent mucosa from CRC patients but not in mucosa from healthy donors. A functional analysis of these genes suggested that this active reaction of the adjacent mucosa was related to the presence of the tumor. Transcriptional and protein-interaction networks were used to further elucidate this response of the normal gut to the tumor, revealing a crosstalk between proteins secreted by the tumor and receptors activated in the adjacent colon tissue, and vice versa. Remarkably, Slit family proteins activated ROBO receptors in the tumor, whereas tumor-secreted proteins transduced a cellular signal that ultimately activated AP-1 in adjacent tissue. Conclusions The systems-level approach provides new insights into the micro-ecology of colorectal tumorigenesis. Disrupting this intricate molecular network of cell-cell communication and pro-inflammatory microenvironment could be a therapeutic target in CRC patients. PMID:24597571
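ARACNe's distinctive step is data processing inequality (DPI) pruning of a mutual-information network: in each triplet of connected genes, the weakest edge is treated as a likely indirect interaction and removed. A minimal sketch, assuming the mutual-information matrix has already been estimated and thresholded (steps not shown) and using a fractional tolerance parameter; this is an illustration, not the ARACNe software.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """ARACNe-style DPI pruning of a symmetric mutual-information matrix.

    In every triplet whose three edges are all present (MI > 0), the weakest
    edge is dropped if it is smaller than the next weakest by more than the
    fractional `tolerance`. Returns the kept edges as (i, j) pairs with i < j.
    """
    n = len(mi)
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > 0}
    removed = set()
    # Decisions use the original matrix, so the visiting order of triplets does not matter.
    for i, j, k in combinations(range(n), 3):
        tri = [((i, j), mi[i][j]), ((i, k), mi[i][k]), ((j, k), mi[j][k])]
        if any(w <= 0 for _, w in tri):
            continue
        tri.sort(key=lambda e: e[1])
        (weak, w0), (_, w1), _ = tri
        if w0 < w1 * (1 - tolerance):
            removed.add(weak)
    return edges - removed
```

For three genes with MI(0,1) = 0.9, MI(1,2) = 0.8 and MI(0,2) = 0.3, the edge (0, 2) is pruned as indirect; with a tolerance of 0.7 it is kept.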
Constructing networks with correlation maximization methods.
Mellor, Joseph C; Wu, Jie; Delisi, Charles
2004-01-01
Problems of inference in systems biology are ideally reduced to formulations which can efficiently represent the features of interest. In the case of predicting gene regulation and pathway networks, an important feature which describes connected genes and proteins is the relationship between active and inactive forms, i.e. between the "on" and "off" states of the components. While not optimal at the limits of resolution, these logical relationships between discrete states can often yield good approximations of the behavior in larger complex systems, where exact representation of measurement relationships may be intractable. We explore techniques for extracting binary state variables from measurement of gene expression, and go on to describe robust measures for statistical significance and information that can be applied to many such types of data. We show how statistical strength and information are equivalent criteria in limiting cases, and demonstrate the application of these measures to simple systems of gene regulation.
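A minimal sketch of the two ingredients described here: turning a continuous expression profile into on/off states and measuring the information shared by two such binary variables. The median cutoff is an assumption (the abstract explores several ways to extract binary states), and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def binarize(profile):
    """Map an expression profile to on (1) / off (0) states using its median as cutoff."""
    cut = median(profile)
    return [1 if v > cut else 0 for v in profile]


def mutual_information_bits(x, y):
    """Mutual information (bits) between two discrete state sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Two genes whose states switch together, such as [0, 1, 0, 1] and [0, 1, 0, 1], share 1 bit; independent patterns such as [0, 0, 1, 1] and [0, 1, 0, 1] share 0 bits.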
Universality and predictability in molecular quantitative genetics.
Nourmohammad, Armita; Held, Torsten; Lässig, Michael
2013-12-01
Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. We review recent developments in evolutionary theory and experiments that are expected to become building blocks of a quantitative genetics of molecular traits. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extend over a range of cellular scales and open new avenues of quantitative evolutionary systems biology. Copyright © 2013. Published by Elsevier Ltd.
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state of the art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data are designed to be unstructured, RCK can still learn structural preferences from them. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small-scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/. Contact: bab@mit.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Szamosfalvi, Balázs; Cortes, Pedro; Alviani, Rebecca; Asano, Kenichiro; Riser, Bruce L; Zasuwa, Gary; Yee, Jerry
2002-05-01
Sulfonylurea agents exert their physiological effects in many cell types via binding to specific sulfonylurea receptors (SUR). SUR couple to inwardly rectifying K+ channels (Kir6.x) to form tetradimeric ATP-sensitive K+ channels (KATP). The SUR subunits confer ATP sensitivity on KATP and also provide the binding sites for sulfonylureas and other pharmacological agents. Our previous work demonstrated that exposure of mesangial cells (MC) to sulfonylureas had profound effects on MC glucose uptake and matrix metabolism and induced heightened cell contractility in association with Ca2+ transients. Because these responses likely resulted from the binding of sulfonylureas to a mesangial SUR2, we subsequently documented [3H]-glibenclamide binding to MC and the gene expression of several mesangial SUR2 transcripts. From these data, we inferred that MC expressed the components of a mesangial KATP and sought to establish their presence in primary MC. To obtain mesangial SUR2 cDNA sequences, rapid amplification of cDNA ends (RACE) was used. DNA sequences were determined by the fluorescent dye-terminator method. Gene expression of mesangial SUR2 and Kir6.1/2 was examined by reverse transcription polymerase chain reaction (RT-PCR) and Northern analysis. SUR2 proteins were identified by immunoblotting of mesangial proteins from membrane-enriched fractions with polyclonal antiserum directed against SUR2. RACE cloning yielded two mesangial SUR2 cDNAs of 4.8 and 6.7 kbp whose open reading frames encoded proteins of 964 and 1535 aa, respectively. Using probes specific to each cDNA, we established the presence of a unique, 5.5 kbp, serum-regulated mesangial SUR2 splice variant. The sequence of this mesangial SUR2 (mcSUR2B) shares identity with the recently cloned rat SUR2B (rSUR2B) but, in comparison to rSUR2B, is truncated by 12 exons at the N-terminus, where it contains a unique insert of 16 aa.
Immunoblotting studies with anti-SUR2 antiserum demonstrated SUR2 proteins of 108 and 170 kDa in membrane-enriched fractions of MC protein extracts. Complementary studies showed abundant gene expression of Kir6.1, thereby establishing gene expression of both components of KATP. By analogy to vascular smooth muscle cells (VSMC), there are at least two putative mesangial KATP, most likely hetero-octamers composed of either rSUR2B or mcSUR2 in complex with Kir6.1. Our results identify the mesangial SUR2B as a possible first link in a chain of cellular events that culminates in MC contraction and altered extracellular matrix metabolism following exposure to sulfonylureas. In addition, our results provide a basis for the future elucidation of the electrophysiologic characteristics of the mesangial KATP and for the study of endogenous regulators of mesangial cell contractility.
General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.
de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael
2016-11-01
Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale on which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function over distributions of latent values. In general, the required integrals must be solved numerically, but efficient methods are available, and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as the heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.
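The abstract notes that observed-scale quantities come from integrating functions of the link over the latent distribution, usually numerically. A minimal sketch of that idea, assuming a Poisson GLMM with a log link, a single normally distributed latent value with mean `mu` and total latent variance `var_latent`, and no fixed effects to average over: it approximates the observed-scale mean and phenotypic variance by Gauss-Hermite quadrature and checks them against the closed forms available for this special case. The function names are hypothetical, and this is not the QGglmm interface.

```python
import math
import numpy as np

def gh_expect(f, mu, var_latent, n_nodes=40):
    """Approximate E[f(L)] for L ~ Normal(mu, var_latent) by Gauss-Hermite quadrature."""
    # hermgauss returns nodes/weights for integrals of g(x) * exp(-x**2)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # change of variables L = mu + sqrt(2 * var_latent) * x
    latent = mu + math.sqrt(2.0 * var_latent) * x
    return float(np.sum(w * f(latent)) / math.sqrt(math.pi))

def poisson_log_observed(mu, var_latent):
    """Observed-scale mean and phenotypic variance for a Poisson GLMM with log link."""
    mean_obs = gh_expect(np.exp, mu, var_latent)               # E[lambda]
    second = gh_expect(lambda l: np.exp(2.0 * l), mu, var_latent)
    var_lambda = second - mean_obs ** 2                         # Var[lambda], latent-driven part
    var_obs = mean_obs + var_lambda                             # add Poisson (distribution) variance
    return mean_obs, var_obs

mu, var_latent = 0.5, 0.3
mean_obs, var_obs = poisson_log_observed(mu, var_latent)

# Closed forms for the log link (lognormal moments), used only as a check
exact_mean = math.exp(mu + var_latent / 2.0)
exact_var = exact_mean + (math.exp(var_latent) - 1.0) * math.exp(2.0 * mu + var_latent)
print(mean_obs, exact_mean)
print(var_obs, exact_var)
```

The log link is a convenient test case precisely because closed forms exist; for links such as the logit, where they do not, the same quadrature route applies unchanged.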
Robust Inference of Cell-to-Cell Expression Variations from Single- and K-Cell Profiling
Narayanan, Manikandan; Martins, Andrew J.; Tsang, John S.
2016-01-01
Quantifying heterogeneity in gene expression among single cells can reveal information inaccessible to cell-population-averaged measurements. However, the expression levels of many genes in single cells fall below the detection limit of even the most sensitive technologies currently available. One proposed approach to overcome this challenge is to measure random pools of k cells (e.g., 10) to increase sensitivity, followed by computational “deconvolution” of cellular heterogeneity parameters (CHPs), such as the biological variance of single-cell expression levels. Existing approaches infer CHPs using either single-cell or k-cell data alone, and typically within a single population of cells. However, integrating single- and k-cell data may yield additional benefits, and quantifying differences in CHPs across cell populations or conditions could reveal novel biological information. Here we present a Bayesian approach that can use single-cell data, k-cell data, or both simultaneously to infer CHPs within a single condition or their differences across two conditions. Using simulated as well as experimentally generated single- and k-cell data, we found situations where each data type offers advantages, but using both together can improve precision and better reconcile the CHP information contained in single- and k-cell data. We illustrate the utility of our approach by applying it to jointly generated single- and k-cell data to reveal CHP differences in several key inflammatory genes between resting and inflammatory cytokine-activated human macrophages, delineating differences in the distribution of ‘ON’ versus ‘OFF’ cells and in continuous variation of expression level among cells. Our approach thus offers a practical and robust framework to assess and compare cellular heterogeneity within and across biological conditions using modern multiplexed technologies. PMID:27438699
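Why pooled measurements still carry information about single-cell variance can be illustrated with a simple moment calculation. Assuming cells are independent, each pooled measurement is the plain sum of k single-cell levels, and there is no technical noise, a pool has mean k·μ and variance k·σ², so σ² can be recovered as the pool variance divided by k. The sketch below uses that method-of-moments estimate purely as an illustrative stand-in; it is not the Bayesian approach described in the abstract, and the lognormal single-cell distribution, its parameters, and the function names are hypothetical choices.

```python
import numpy as np

def simulate_pools(rng, n_pools, k, mu_log, sigma_log):
    """Simulate k-cell pools: each pool is the sum of k independent lognormal cell levels."""
    cells = rng.lognormal(mean=mu_log, sigma=sigma_log, size=(n_pools, k))
    return cells.sum(axis=1)

def single_cell_moments_from_pools(pools, k):
    """Method-of-moments estimates of single-cell mean and variance from k-cell pools."""
    mean_single = pools.mean() / k          # E[pool] = k * mu
    var_single = pools.var(ddof=1) / k      # Var[pool] = k * sigma^2 (independence assumed)
    return mean_single, var_single

rng = np.random.default_rng(0)
k, mu_log, sigma_log = 10, 1.0, 0.8
pools = simulate_pools(rng, n_pools=20000, k=k, mu_log=mu_log, sigma_log=sigma_log)
est_mean, est_var = single_cell_moments_from_pools(pools, k)

# True single-cell moments of the assumed lognormal distribution
true_mean = np.exp(mu_log + sigma_log ** 2 / 2)
true_var = (np.exp(sigma_log ** 2) - 1) * np.exp(2 * mu_log + sigma_log ** 2)
print(est_mean, true_mean)
print(est_var, true_var)
```

Real k-cell data add measurement noise and detection limits, which this sketch ignores; it only shows that the biological variance of single cells is still identifiable from pools under the stated assumptions.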
Baringou, Stephane; Rouault, Jacques-Deric; Koken, Marcel; Hardivillier, Yann; Hurtado, Luis; Leignel, Vincent
2016-10-10
The 70 kDa heat shock proteins (HSP70) are considered the most conserved members of the HSP family. These proteins are essential to the cell, both because of their involvement in many cellular pathways (e.g., development, immunity) and because they minimize the effects of multiple stresses (e.g., temperature, pollutants, salinity, radiation). In the cytosol, two ubiquitous HSP70s with either a constitutive (HSC70) or an inducible (HSP70) expression pattern are found in all metazoan species, encoded by five genes in Drosophila melanogaster and six in yeast and humans. The cytosolic HSP70 protein family is considered a major actor in environmental adaptation and is widely used in ecology as an important biomarker of environmental stress. Nevertheless, the diversity of cytosolic HSP70 remains unclear within the phylum Arthropoda, especially among decapods. Using 122 new and 311 available sequences, we analyzed the overall cytosolic HSP70 diversity in arthropods (with a focus on decapods) and inferred molecular phylogenies. Overall structural and phylogenetic analyses showed a surprisingly high diversity in cytosolic HSP70 and revealed the existence of several unrecognised groups. All crustacean HSP70 sequences present signature motifs and molecular weights characteristic of non-organellar HSP70, with multiple specific substitutions in the protein sequence. The cytosolic HSP70 family in arthropods appears to comprise at least three distinct groups (annotated A, B and C), which include several subdivisions with both constitutive and inducible forms. Group A comprises several arthropod classes, while groups B and C seem to be specific to Malacostraca and Hexapoda/Chelicerata, respectively. The organization of HSP70 appears much more complex than previously suggested, going far beyond a simple differentiation according to expression pattern (HSC70 versus HSP70).
This study proposes a new classification of cytosolic HSP70 and an evolutionary model of the distinct forms within the phylum Arthropoda. The observed differences between HSP70 groups will probably need to be linked to distinct interactions with co-chaperones or other cofactors. Copyright © 2016 Elsevier B.V. All rights reserved.
Dortay, Hakan; Akula, Usha Madhuri; Westphal, Christin; Sittig, Marie; Mueller-Roeber, Bernd
2011-01-01
Protein expression in heterologous hosts for functional studies is a cumbersome effort. Here, we report a superior platform for parallel protein expression in vivo and in vitro. The platform combines highly efficient ligation-independent cloning (LIC) with instantaneous detection of expressed proteins through N- or C-terminal fusions to infrared fluorescent protein (IFP). For each open reading frame, only two PCR fragments are generated (with three PCR primers) and inserted by LIC into ten expression vectors suitable for protein expression in microbial hosts, including Escherichia coli, Kluyveromyces lactis, Pichia pastoris, and the protozoon Leishmania tarentolae, as well as in an in vitro transcription/translation system. Accumulation of IFP-fusion proteins is detected by infrared imaging of living cells or crude protein extracts directly after SDS-PAGE without additional processing. We successfully employed the LIC-IFP platform for in vivo and in vitro expression of ten plant and fungal proteins, including transcription factors and enzymes. Using the IFP reporter, we additionally established facile methods for the visualisation of protein-protein interactions and the detection of DNA-transcription factor interactions in microtiter and gel-free format. We conclude that IFP represents an excellent reporter for high-throughput protein expression and analysis, which can be easily extended to numerous other expression hosts using the setup reported here. PMID:21541323
Expression of the prospective mesoderm genes twist, snail, and mef2 in penaeid shrimp.
Wei, Jiankai; Glaves, Richard Samuel Elliot; Sellars, Melony J; Xiang, Jianhai; Hertzler, Philip L
2016-07-01
In penaeid shrimp, mesoderm forms from two sources: naupliar mesoderm founder cells, which invaginate during gastrulation, and posterior mesodermal stem cells called mesoteloblasts, which undergo characteristic teloblastic divisions. The primordial mesoteloblast descends from the ventral mesendoblast, which arrests in cell division at the 32-cell stage and ingresses with its sister dorsal mesendoblast prior to naupliar mesoderm invagination. The naupliar mesoderm forms the muscles of the naupliar appendages (first and second antennae and mandibles), while the mesoteloblasts form the mesoderm, including the muscles, of subsequently formed posterior segments. To better understand the mechanism of mesoderm and muscle formation in penaeid shrimp, twist, snail, and mef2 cDNAs were identified from transcriptomes of Penaeus vannamei, P. japonicus, P. chinensis, and P. monodon. A single Twist ortholog was found, with strong inferred amino acid conservation across all three species. Multiple Snail protein variants were detected, which clustered in a phylogenetic tree with other decapod crustacean Snail sequences. Two closely related mef2 variants were found in P. vannamei. The developmental mRNA expression of these genes was studied by qPCR in P. vannamei embryos, larvae, and postlarvae. Expression of Pv-twist and Pv-snail began during the limb bud stage and continued through larval stages to the postlarva. Surprisingly, Pv-mef2 expression was found in all stages from the zygote to the postlarva, with the highest expression in the limb bud and protozoeal stages. The results add comparative data on the development of anterior and posterior mesoderm in malacostracan crustaceans and should stimulate further studies on mesoderm and muscle development in penaeid shrimp.
Effects of immunosuppressive treatment on protein expression in rat kidney
Kędzierska, Karolina; Sporniak-Tutak, Katarzyna; Sindrewicz, Krzysztof; Bober, Joanna; Domański, Leszek; Parafiniuk, Mirosław; Urasińska, Elżbieta; Ciechanowicz, Andrzej; Domański, Maciej; Smektała, Tomasz; Masiuk, Marek; Skrzypczak, Wiesław; Ożgo, Małgorzata; Kabat-Koperska, Joanna; Ciechanowski, Kazimierz
2014-01-01
The structural proteins of renal tubular epithelial cells may become a target for the toxic metabolites of immunosuppressants. These metabolites can modify the properties of the proteins, thereby affecting cell function, which may explain the toxicity of immunosuppressive agents. In our study, we evaluated the effect of two immunosuppressive strategies on protein expression in the kidneys of Wistar rats. Fragments of the rat kidneys were homogenized after cooling in liquid nitrogen and then dissolved in lysis buffer. The protein concentration in the samples was determined using a protein assay kit, and the proteins were separated by two-dimensional electrophoresis. The gels were then stained with Coomassie Brilliant Blue, and their images were analyzed to evaluate differences in protein expression. Selected proteins were then identified by mass spectrometry. We found that the immunosuppressive drugs used in popular regimens induce a series of changes in protein expression in target organs. The changes were most pronounced for proteins involved in drug, glucose, amino acid, and lipid metabolism; to a lesser extent, we also observed changes in the synthesis of nuclear, structural, and transport proteins. Very slight differences were observed between the group receiving cyclosporine, mycophenolate mofetil, and glucocorticoids (CMG) and the control group. In contrast, compared with the control group, animals receiving tacrolimus, mycophenolate mofetil, and glucocorticoids (TMG) exhibited higher expression of proteins responsible for renal drug metabolism and lower expression of cytoplasmic actin and the major urinary protein. Compared with the CMG group, the TMG group showed higher expression of proteins responsible for drug metabolism and lower expression of respiratory chain enzymes (thioredoxin-2) and of markers of distal renal tubular damage (heart fatty acid-binding protein).
The consequences of the reported changes in protein expression require further study. PMID:25328384
2012-01-01
Background: Gene duplication and the subsequent divergence in function of the resulting paralogs via subfunctionalization and/or neofunctionalization is hypothesized to have played a major role in the evolution of plant form. The LEAFY HULL STERILE1 (LHS1) SEPALLATA (SEP) genes have been linked with the origin and diversification of the grass spikelet, but it is uncertain 1) when the duplication event that produced the LHS1 clade and its paralogous lineage Oryza sativa MADS5 (OSM5) occurred, and 2) how changes in gene structure and/or expression might have contributed to subfunctionalization and/or neofunctionalization in the two lineages. Methods: Phylogenetic relationships among 84 SEP genes were estimated using Bayesian methods. RNA expression patterns were inferred using in situ hybridization. The patterns of protein sequence and RNA expression evolution were reconstructed using maximum parsimony (MP) and maximum likelihood (ML) methods, respectively. Results: Phylogenetic analyses mapped the LHS1/OSM5 duplication event to the base of the grass family. MP character reconstructions estimated that, following the LHS1/OSM5 duplication event, a cytosine-to-thymine change in the first codon position of the first amino acid after the Zea mays MADS3 (ZMM3) domain converted a glutamine codon to a stop codon in the OSM5 ancestor. RNA expression analyses of OSM5 co-orthologs in Avena sativa, Chasmanthium latifolium, Hordeum vulgare, Pennisetum glaucum, and Sorghum bicolor, followed by ML reconstructions of these data and of previously published analyses, estimated a complex pattern of gain and loss of LHS1 and OSM5 expression in different floral organs and in different flowers within the spikelet or inflorescence. Conclusions: Previous authors have reported that rice OSM5 and LHS1 proteins have different interaction partners, indicating that the truncation of OSM5 following the LHS1/OSM5 duplication event has resulted in both partitioned and potentially novel gene functions.
The complex pattern of OSM5 and LHS1 expression evolution is not consistent with a simple subfunctionalization model following the gene duplication event, but there is evidence of recent partitioning of OSM5 and LHS1 expression within different floral organs of A. sativa, C. latifolium, P. glaucum and S. bicolor, and between the upper and lower florets of the two-flowered maize spikelet. PMID:22340849
Review: The transcripts associated with organ allograft rejection.
Halloran, Philip F; Venner, Jeffery M; Madill-Thomsen, Katelynn S; Einecke, Gunilla; Parkes, Michael D; Hidalgo, Luis G; Famulski, Konrad S
2018-04-01
The molecular mechanisms operating in human organ transplant rejection are best inferred from the mRNAs expressed in biopsies because the corresponding proteins often have low expression and short half-lives, while small non-coding RNAs lack specificity. Associations should be characterized in a population that rigorously identifies T cell-mediated (TCMR) and antibody-mediated rejection (ABMR). This is best achieved in kidney transplant biopsies, but the results are generalizable to heart, lung, or liver transplants. Associations can be universal (all rejection), TCMR-selective, or ABMR-selective, with universal being strongest and ABMR-selective weakest. Top universal transcripts are IFNG-inducible (eg, CXCL11, IDO1, WARS) or shared by effector T cells (ETCs) and NK cells (eg, KLRD1, CCL4). TCMR-selective transcripts are expressed in activated ETCs (eg, CTLA4, IFNG), activated macrophages (eg, ADAMDEC1), or IFNG-induced macrophages (eg, ANKRD22). ABMR-selective transcripts are expressed in NK cells (eg, FGFBP2, GNLY) and endothelial cells (eg, ROBO4, DARC). Transcript associations are highly reproducible between biopsy sets when the same rejection definitions, case mix, algorithm, and technology are applied, but exact ranks will vary. Previously published rejection-associated transcripts resemble universal and TCMR-selective transcripts due to incomplete representation of ABMR. Rejection-associated transcripts are never completely rejection-specific because they are shared with the stereotyped response-to-injury and innate immunity. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.