protein sequence variations: Topics by Science.gov

Sample records for protein sequence variations

Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

NASA Astrophysics Data System (ADS)

Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

2016-06-01

Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

PubMed Central

Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

2016-01-01

Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631
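The abstract defines proteogenomics as using nucleotide sequences to generate candidate protein sequences for database searching. The Python below is a minimal sketch of that idea, not a published pipeline. It applies a single-nucleotide variant to a coding sequence, translates the result with the standard genetic code, and emits a FASTA entry. The function names, the toy sequence, and the header format are hypothetical illustrations.

```python
# Minimal proteogenomics sketch (hypothetical helpers): turn a coding-sequence
# variant into a sample-specific protein entry for a search database.
# Assumptions: uppercase DNA, the CDS starts in frame, the standard genetic
# code (NCBI table 1) applies, and SNV positions are 0-based in the CDS.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... (standard code).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds: str, pos: int, ref: str, alt: str) -> str:
    """Replace the reference base at 0-based position pos with alt."""
    if cds[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {cds[pos]} != {ref}")
    return cds[:pos] + alt + cds[pos + 1:]


def variant_fasta(name: str, cds: str, pos: int, ref: str, alt: str) -> str:
    """FASTA record for the variant protein; the header uses 1-based numbering."""
    protein = translate(apply_snv(cds, pos, ref, alt))
    return f">{name}_{ref}{pos + 1}{alt}\n{protein}\n"


print(variant_fasta("toy", "ATGGCCAAATAG", 4, "C", "A"))
```

Here the C-to-A change turns the codon GCC (Ala) into GAC (Asp), so the reference peptide MAK becomes MDK. Real proteogenomic databases must also represent indels, alternative splicing, and novel open reading frames, which this sketch ignores.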
LenVarDB: database of length-variant protein domains.

PubMed

Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

2014-01-01

Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2,730,625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about the location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolution brings about natural changes in proteins that may be found in many organisms. Such changes can appear as amino acid substitutions or as insertions and deletions (indels) in protein sequences. LenVarDB is a database that provides an initial overview of the length variations observed across 731 protein families, compiled by examining more than 2 million sequences. Each indel is examined to determine whether it lies close to the active site, where it could affect protein activity. Including such information can aid the design of bioengineering experiments.
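LenVarDB maps indel regions onto an alignment and a structure. As a minimal sketch of the first step, assuming a pairwise alignment of one homologue against a reference domain with "-" as the gap character, the hypothetical helper below reports each insertion or deletion run together with the reference residue position where it occurs. That position is what one would map onto the reference structure.

```python
# Minimal sketch (hypothetical helper, not LenVarDB's code): find indel runs
# in a pairwise alignment of a homologue against a reference domain.
from typing import List, Tuple


def indel_regions(ref_row: str, hom_row: str) -> List[Tuple[str, int, int]]:
    """Return (kind, ref_start, length) for each run of gapped columns.

    ref_start counts reference residues (0-based) preceding the run, so a
    deletion starts at that residue and an insertion sits just before it.
    """
    if len(ref_row) != len(hom_row):
        raise ValueError("alignment rows must have equal length")
    regions = []
    ref_pos = 0                       # reference residues consumed so far
    run_kind, run_start, run_len = None, 0, 0
    for r, h in zip(ref_row, hom_row):
        if r == "-" and h == "-":
            continue                  # column gapped in both rows: ignore
        if r == "-":
            kind = "insertion"        # homologue residue opposite a ref gap
        elif h == "-":
            kind = "deletion"         # ref residue opposite a homologue gap
        else:
            kind = None               # aligned residue pair
        if kind != run_kind:          # close the previous run, open a new one
            if run_kind is not None:
                regions.append((run_kind, run_start, run_len))
            run_kind, run_start, run_len = kind, ref_pos, 0
        if kind is not None:
            run_len += 1
        if r != "-":
            ref_pos += 1
    if run_kind is not None:
        regions.append((run_kind, run_start, run_len))
    return regions


print(indel_regions("ACDE--FGH", "AC-EKLFG-"))
```

On this toy alignment, the sketch reports a one-residue deletion at reference position 2, a two-residue insertion before reference position 4, and a one-residue deletion at position 6.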
Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

DOEpatents

McCutchen-Maloney, Sandra L.

2002-01-01

DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is used with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.
Dissecting the relationship between protein structure and sequence variation

NASA Astrophysics Data System (ADS)

Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

2015-03-01

Over the past decade, several independent studies have shown that some structural properties of proteins can predict protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results of a comprehensive search for the potential biophysical and structural determinants of protein evolution, studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of the observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structure, and low fractions of beta sheets tend to have the strongest sequence-structure relations. BEACON-NSF Center for the Study of Evolution in Action.
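The quoted correlation strengths (0.1 to 0.8) are correlation coefficients between per-site structural and evolutionary properties. The share of variation a predictor "explains" is commonly expressed as the squared correlation, r². The abstract does not name the coefficient used. The sketch below assumes Pearson correlation and uses made-up per-site values, for example a structural property such as relative solvent accessibility paired with an evolutionary rate.

```python
# Minimal sketch (assumption: Pearson correlation; values are illustrative,
# not data from the study): correlation between a per-site structural
# property and a per-site evolutionary rate, and the variance share r**2.
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


structure = [1.0, 2.0, 3.0, 4.0]   # hypothetical per-site property
rate = [1.0, 3.0, 2.0, 4.0]        # hypothetical per-site rate
r = pearson_r(structure, rate)
print(round(r, 3), round(r * r, 3))  # r = 0.8, so r**2 = 0.64
```

A correlation of 0.8, the top of the quoted range, corresponds to about 64% of the variance. A correlation of 0.1 corresponds to only 1%.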
Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

PubMed

Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

2012-07-01

Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes have been shown to be associated with various diseases in humans, as well as with production and reproduction traits in livestock. Alignment of full-length mitochondrial sequences from the 5 known ovine haplogroups, HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondrial sequences), revealed sequence variation in 10 of the 13 protein-coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein-coding sequences represent non-synonymous mutations. Sequence variation was also observed in 8 of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and partial cytochrome b sequences, along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and of productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, or daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for ewe prolificacy, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born per ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the genetic variation of the ovine mitogenome in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
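Counting non-synonymous sites, as in the 26 of 245 variable protein-coding sites above, requires translating codons with the correct genetic code. The minimal sketch below assumes aligned, in-frame, gap-free coding sequences and the vertebrate mitochondrial code (NCBI translation table 2). That code differs from the standard code at AGA/AGG (stop), ATA (Met) and TGA (Trp). The helper name and the toy sequences are hypothetical.

```python
# Minimal sketch (hypothetical helper): classify variable sites between two
# aligned, in-frame, gap-free mtDNA coding sequences as synonymous or not.
# Assumption: vertebrate mitochondrial code, NCBI translation table 2.
from typing import List, Tuple

BASES = "TCAG"
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: VERT_MITO_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_sites(seq1: str, seq2: str) -> List[Tuple[int, str, str, str]]:
    """Return (0-based site, codon in seq1, codon in seq2, class) per difference."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    results = []
    for start in range(0, len(seq1), 3):
        c1, c2 = seq1[start:start + 3], seq2[start:start + 3]
        for offset in range(3):
            if c1[offset] != c2[offset]:
                # Score each difference on its own in the seq1 codon background,
                # a simplifying convention for codons with several changes.
                mutant = c1[:offset] + c2[offset] + c1[offset + 1:]
                same = CODON_TABLE[c1] == CODON_TABLE[mutant]
                results.append((start + offset, c1, c2,
                                "synonymous" if same else "non-synonymous"))
    return results


for site in classify_sites("ATATGAGCT", "ATGTGGACT"):
    print(site)
```

In this toy example, ATA to ATG is synonymous (both Met) under the mitochondrial code, whereas the standard code would score it as Ile to Met. This difference is why the code table matters when counting non-synonymous sites.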
Three copies of a single protein II-encoding sequence in the genome of Neisseria gonorrhoeae JS3: evidence for gene conversion and gene duplication.

PubMed

van der Ley, P

1988-11-01

Gonococci express a family of related outer membrane proteins designated protein II (P.II). These surface proteins are subject to both phase variation and antigenic variation. The P.II gene repertoire of Neisseria gonorrhoeae strain JS3 was found to consist of at least ten genes, eight of which were cloned. Sequence analysis and DNA hybridization studies revealed that one particular P.II-encoding sequence is present in three distinct, but almost identical, copies in the JS3 genome. These genes encode the P.II protein that was previously identified as P.IIc. Comparison of their sequences shows that the multiple copies of this P.IIc-encoding gene might have been generated by both gene conversion and gene duplication.
WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

PubMed Central

2013-01-01

Background SNPs&GO is a method for predicting deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM). For a given protein, its input comprises the sequence and/or three-dimensional structure (when available), a set of target variations, and the protein's functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability that the variation is associated with human disease. Results The server consists of two main components, which are updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d program. The sequence- and structure-based algorithms were extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the server reaches 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
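The three figures of merit quoted above can be computed from a classifier's predictions. Overall accuracy and the correlation coefficient come from the confusion matrix. ROC AUC comes from the ranking of scores. The sketch below assumes the reported correlation coefficient is the Matthews correlation coefficient, which is common in this literature but not stated in the abstract. The counts and scores are made up.

```python
# Minimal sketch (assumptions: "correlation coefficient" = Matthews
# correlation coefficient; counts and scores are illustrative only).
import math
from typing import Dict, Sequence


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Overall accuracy and Matthews correlation coefficient (MCC)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc}


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties = 0.5),
    i.e. the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(binary_metrics(tp=40, tn=40, fp=10, fn=10))  # accuracy 0.8, MCC 0.6
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))        # 5 of 6 pairs ranked correctly
```

On a balanced dataset, accuracy and MCC track each other closely. On unbalanced data, MCC is usually the more informative of the two.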
Relation between native ensembles and experimental structures of proteins

PubMed Central

Best, Robert B.; Lindorff-Larsen, Kresten; DePristo, Mark A.; Vendruscolo, Michele

2006-01-01

Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble. PMID:16829580
Variation in a surface-exposed region of the Mycoplasma pneumoniae P40 protein as a consequence of homologous DNA recombination between RepMP5 elements.

PubMed

Spuesens, Emiel B M; van de Kreeke, Nick; Estevão, Silvia; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis

2011-02-01

Mycoplasma pneumoniae is a human pathogen that causes a range of respiratory tract infections. The first step in infection is adherence of the bacteria to the respiratory epithelium. This step is mediated by a specialized organelle, which contains several proteins (cytadhesins) that have an important function in adherence. Two of these cytadhesins, P40 and P90, represent the proteolytic products from a single 130 kDa protein precursor, which is encoded by the MPN142 gene. Interestingly, MPN142 contains a repetitive DNA element, termed RepMP5, of which homologues are found at seven other loci within the M. pneumoniae genome. It has been hypothesized that these RepMP5 elements, which are similar but not identical in sequence, recombine with their counterpart within MPN142 and thereby provide a source of sequence variation for this gene. As this variation may give rise to amino acid changes within P40 and P90, the recombination between RepMP5 elements may constitute the basis of antigenic variation and, possibly, immune evasion by M. pneumoniae. To investigate the sequence variation of MPN142 in relation to inter-RepMP5 recombination, we determined the sequences of all RepMP5 elements in a collection of 25 strains. The results indicate that: (i) inter-RepMP5 recombination events have occurred in seven of the strains, and (ii) putative RepMP5 recombination events involving MPN142 have induced amino acid changes in a surface-exposed part of the P40 protein in two of the strains. We conclude that recombination between RepMP5 elements is a common phenomenon that may lead to sequence variation of MPN142-encoded proteins.
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola

PubMed Central

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Lee, Siu Fai

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins and control their functions. There are many domains inside proteins where various enzymes modify the side chains of amino acids to generate different protein species. In this manuscript we have, for the first time, predicted posttranslational modifications of the frequency clock and mating type a-1 proteins in Sordaria fimicola strains collected from different sites, using bioinformatics tools to examine the effect of the environment on these proteins, on their amino acid composition, and ultimately on the consensus sequences present in the mating type proteins. Furthermore, we examined the genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from the contrasting environments of two opposing slopes (the harsh, xeric south-facing slope and the mild north-facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola that contain SSRs. Our data revealed genetic variation in strains from the south-facing slope, and these findings support the hypothesis that genetic variation caused by stressful environments leads to evolution. PMID:28717646
Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola.

PubMed

Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Mukhtar, Hamid; Lee, Siu Fai; Saleem, Muhammad

2017-01-01

Posttranslational modifications (PTMs) occur in all essential proteins and control their functions. There are many domains inside proteins where various enzymes modify the side chains of amino acids to generate different protein species. In this manuscript we have, for the first time, predicted posttranslational modifications of the frequency clock and mating type a-1 proteins in Sordaria fimicola strains collected from different sites, using bioinformatics tools to examine the effect of the environment on these proteins, on their amino acid composition, and ultimately on the consensus sequences present in the mating type proteins. Furthermore, we examined the genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from the contrasting environments of two opposing slopes (the harsh, xeric south-facing slope and the mild north-facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola that contain SSRs. Our data revealed genetic variation in strains from the south-facing slope, and these findings support the hypothesis that genetic variation caused by stressful environments leads to evolution.
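The study genotyped short sequence repeat (SSR) loci chosen from a reference genome. The minimal sketch below shows one way to scan a DNA sequence for perfect SSRs. The helper name, the maximum unit length of 6, and the minimum of 3 copies are illustrative assumptions, not the authors' criteria. Units that are themselves repeats of a shorter unit (such as ATAT) are skipped so that each locus is reported once at its shortest period.

```python
# Minimal sketch (hypothetical helper and thresholds): find perfect tandem
# repeats (SSRs / microsatellites) in a DNA sequence with regular expressions.
import re
from typing import List, Tuple


def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number), sorted by position."""
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least 2 copies")
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-mer followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip compound units such as "ATAT" (= "AT" x 2) or "AA" (= "A" x 2).
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GGATATATATCC"))  # one (AT)4 repeat starting at position 2
```

Genotyping then compares copy numbers at the same locus across strains. That step needs locus-specific primers or reads, which this sketch does not model.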
UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

PubMed

Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

2016-01-04

The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
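The ranking rule described above can be sketched in the integer-valued form of the Evolutionary Trace. A phylogenetic tree is cut into successively finer partitions (1, 2, ..., N-1 groups). A position gains one rank unit at every level where it is not invariant within every group. Positions that vary only between distant clades therefore keep low (better) ranks, and positions that vary among close homologs accumulate high (worse) ranks. This is a simplified sketch, not the UET server's implementation, which may use refined real-valued variants. The partitions here are supplied by hand rather than computed from a tree.

```python
# Minimal sketch of integer-valued Evolutionary Trace ranks (simplified; the
# partitions are hypothetical inputs instead of being derived from a tree).
from typing import List, Sequence


def et_ranks(alignment: Sequence[str], partitions: List[List[List[int]]]) -> List[int]:
    """Rank each alignment column; lower rank = more important position.

    partitions[n] splits sequence indices into n + 1 groups, going from the
    root (one group) towards the leaves (N - 1 groups).
    """
    n_cols = len(alignment[0])
    ranks = []
    for col in range(n_cols):
        rank = 1
        for groups in partitions:
            # The column "varies" at this level if any group is not invariant.
            if any(len({alignment[s][col] for s in group}) > 1 for group in groups):
                rank += 1
        ranks.append(rank)
    return ranks


seqs = ["AKL", "AKV", "ARL", "ARV"]          # two clades: {0, 1} and {2, 3}
levels = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2, 3]]]
print(et_ranks(seqs, levels))                # invariant, clade-specific, variable
```

Column 1 (K/R) differs only between the two clades and receives rank 2. Column 2 (L/V) varies within the closest pairs and receives the worst rank, 4.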
Sequence variation and structural conservation allows development of novel function and immune evasion in parasite surface protein families

PubMed Central

Higgins, Matthew K; Carrington, Mark

2014-01-01

Trypanosoma and Plasmodium species are unicellular, eukaryotic pathogens that have evolved the capacity to survive and proliferate within a human host, causing sleeping sickness and malaria, respectively. They have very different survival strategies. African trypanosomes divide in blood and extracellular spaces, whereas Plasmodium species invade and proliferate within host cells. Interaction with host macromolecules is central to establishment and maintenance of an infection by both parasites. Proteins that mediate these interactions are under selection pressure to bind host ligands without compromising immune avoidance strategies. In both parasites, the expansion of genes encoding a small number of protein folds has established large protein families. This has permitted both diversification to form novel ligand binding sites and variation in sequence that contributes to avoidance of immune recognition. In this review we consider two such parasite surface protein families, one from each species. In each case, known structures demonstrate how extensive sequence variation around a conserved molecular architecture provides an adaptable protein scaffold that the parasites can mobilise to mediate interactions with their hosts. PMID:24442723
Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

PubMed Central

2010-01-01

Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441
Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

DOEpatents

McCutchen-Maloney, Sandra L.

2002-01-01

Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.
Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Le Coq, Johanne; Ghosh, Partho

2012-06-19

Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (~16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.
Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

PubMed Central

Le Coq, Johanne; Ghosh, Partho

2011-01-01

Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation. PMID:21873231
Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement.

PubMed

Le Coq, Johanne; Ghosh, Partho

2011-08-30

Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.
Molecular interaction networks in the analyses of sequence variation and proteomics data.

PubMed

Stelzl, Ulrich

2013-12-01

Protein-protein interaction networks are typically generated in standard cell lines or model organisms, because it is prohibitively difficult to record large interaction datasets from specific tissues or disease models at a reasonable pace. Although the interaction data are of high confidence, they therefore do not directly reflect in vivo relationships. A wealth of physiologically relevant protein information, obtained under different conditions and from different systems, is available, including information on genetic variation, protein levels, and PTMs. However, these data are difficult to assess comprehensively because the relationships between the entities cannot be inferred from the measurements alone. Here, we highlight examples of recent studies that gained deeper insight from genetic variation, protein, and PTM measurements by using interaction information, pointing toward the importance and potential of interaction networks for the interpretation of sequencing and proteomics data. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

PubMed

Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

2015-12-04

Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
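Profiling the frequency of substitution types amounts to tallying reference-to-variant amino acid changes. With 20 amino acids there are 20 × 19 = 380 possible directed types, of which the study profiled 128. A minimal sketch, assuming hypothetical pairs of identified variant peptides and their reference counterparts of equal length:

```python
# Minimal sketch (hypothetical inputs): tally directed amino acid substitution
# types (ref -> alt) from pairs of reference and variant peptides.
from collections import Counter
from typing import Iterable, Tuple


def substitution_types(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count 'X->Y' substitutions between equal-length peptide pairs."""
    counts: Counter = Counter()
    for ref, alt in pairs:
        if len(ref) != len(alt):
            continue  # length-changing pairs are indels, not substitutions
        for r, a in zip(ref, alt):
            if r != a:
                counts[f"{r}->{a}"] += 1
    return counts


peptides = [("PEPTIDEK", "PEPTVDEK"), ("SAMPLER", "SAMPLEK"), ("ACDK", "ACK")]
print(substitution_types(peptides))
```

One caveat for mass-spectrometry data: isoleucine and leucine are isobaric, so an I->L or L->I substitution cannot be detected from peptide mass alone.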
Streptococcal M protein extracted by nonionic detergent. III. Correlation between immunological cross-reactions and structural similarities with implications for antiphagocytosis

PubMed Central

1978-01-01

Three immunologically cross-reactive and non-cross-reactive streptococcal M proteins were analyzed by a chromatographic tryptic peptide mapping system. The results indicate that cross-reactions correlate with the extent of structural similarity among the M protein molecules analyzed. The data also reveal that free lysine is released by the action of trypsin from these three M proteins, suggesting a common lys-lys or arg-lys sequence. In addition, only one peptide was found to be common to all three M types. This limited structural relatedness among the three M proteins examined indicates that sequence variation plays a major role in the immunological specificity of the M antigens. However, despite sequence variation, all M protein molecules have a common antiphagocytic activity. The fact that no common opsonic antibody has yet been found, even against a limited number of M types, argues against this biological activity being solely the result of a common sequence. Based on these data, it is suggested that the antiphagocytic effect of M protein may be due to a conformationally created environment on the surface of the molecule, which is selected by both immunological and biological pressure. PMID:355596
Computational analysis of sequence selection mechanisms.

PubMed

Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

2004-04-01

Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.
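One common way to turn such quantities into numbers is to approximate the energies of many decoy sequences threaded onto a structure by a Gaussian density of states. The sketch below assumes that approximation (it is not necessarily the authors' selection function), uses energies in units where k_B = 1, assumes the native energy lies below the decoy mean, and treats `log_num_sequences` (for example L·ln 20 for length-L sequences) as an input. It estimates the fraction and log-count of sequences with energy below the native one, and the selection temperature T_sel = 1/(dS/dE) at the native energy, which under the Gaussian model equals σ²/(μ − E_native):

```python
import math
import statistics


def gaussian_selection_stats(decoy_energies, native_energy, log_num_sequences):
    """Gaussian density-of-states estimates for a single structure (k_B = 1)."""
    mu = statistics.fmean(decoy_energies)
    sigma = statistics.stdev(decoy_energies)
    z = (native_energy - mu) / sigma
    # Fraction of sequence space with energy below the native sequence: Phi(z).
    frac_below = 0.5 * math.erfc(-z / math.sqrt(2.0))
    log_count_below = (
        log_num_sequences + math.log(frac_below) if frac_below > 0 else float("-inf")
    )
    # S(E) = ln N(E) with N(E) ~ exp(-(E - mu)^2 / (2 sigma^2)), so
    # dS/dE = (mu - E) / sigma^2 and T_sel = sigma^2 / (mu - E_native).
    # Assumes native_energy < mu; equality would divide by zero.
    t_sel = sigma ** 2 / (mu - native_energy)
    return {"mean": mu, "std": sigma, "frac_below": frac_below,
            "log_count_below": log_count_below, "t_selection": t_sel}


stats = gaussian_selection_stats([-1.0, 0.0, 1.0], native_energy=-1.0,
                                 log_num_sequences=100 * math.log(20))
print(round(stats["frac_below"], 4), stats["t_selection"])  # 0.1587 1.0
```

A lower T_sel means the native sequence sits further into the low-energy tail relative to the spread of decoy energies, that is, stronger selection for fitness to the structure under this model.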
Influence of Molecular Resolution on Sequence-Based Discovery of Ecological Diversity among Synechococcus Populations in an Alkaline Siliceous Hot Spring Microbial Mat

PubMed Central

Melendrez, Melanie C.; Lange, Rachel K.; Cohan, Frederick M.; Ward, David M.

2011-01-01

Previous research has shown that sequences of 16S rRNA genes and 16S-23S rRNA internal transcribed spacer regions may not have enough genetic resolution to define all ecologically distinct Synechococcus populations (ecotypes) inhabiting alkaline, siliceous hot spring microbial mats. To achieve higher molecular resolution, we studied sequence variation in three protein-encoding loci sampled by PCR from 60°C and 65°C sites in the Mushroom Spring mat (Yellowstone National Park, WY). Sequences were analyzed using the ecotype simulation (ES) and AdaptML algorithms to identify putative ecotypes. Between 4 and 14 times more putative ecotypes were predicted from variation in protein-encoding locus sequences than from variation in 16S rRNA and 16S-23S rRNA internal transcribed spacer sequences. The number of putative ecotypes predicted depended on the number of sequences sampled and the molecular resolution of the locus. Chao estimates of diversity indicated that few rare ecotypes were missed. Many ecotypes hypothesized by sequence analyses were different in their habitat specificities, suggesting different adaptations to temperature or other parameters that vary along the flow channel. PMID:21169433
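The abstract does not state which Chao estimator was used. As an illustration only, the classic Chao1 richness estimator extrapolates total richness from the numbers of ecotypes observed exactly once (F1) and exactly twice (F2), S_Chao1 = S_obs + F1²/(2F2), falling back to the bias-corrected F1(F1 − 1)/2 term when F2 = 0. Close agreement between S_obs and S_Chao1 is what "few rare ecotypes were missed" would look like:

```python
def chao1(abundances: list[int]) -> float:
    """Classic Chao1 richness estimate from per-ecotype sequence counts."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)  # singletons
    f2 = sum(1 for a in abundances if a == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical counts of sequences assigned to each putative ecotype.
print(chao1([12, 7, 5, 2, 1, 1]))  # 8.0
```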
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics

PubMed Central

Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

2011-01-01

Shotgun proteomics data analysis usually relies on database searching. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of the colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
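Because variant peptides can only be matched if they survive in silico digestion, a variant-aware database has to digest the variant protein itself: a substitution that creates or removes a Lys or Arg shifts the cleavage sites. A minimal sketch, assuming the simple trypsin rule (cleave after K or R unless followed by P) and illustrative parameters `min_len` and `max_missed`, returns the tryptic peptides that span a single substitution; it does not reproduce the paper's modified FDR procedure:

```python
def cleavage_bounds(protein: str) -> list[int]:
    """Peptide boundaries under the simple trypsin rule (after K/R, not before P)."""
    sites = [
        i + 1
        for i in range(len(protein) - 1)
        if protein[i] in "KR" and protein[i + 1] != "P"
    ]
    return [0] + sites + [len(protein)]


def variant_peptides(protein: str, position: int, alt: str,
                     min_len: int = 6, max_missed: int = 1) -> list[str]:
    """Tryptic peptides of the variant protein that cover a 1-based substitution."""
    idx = position - 1
    variant = protein[:idx] + alt + protein[idx + 1:]
    bounds = cleavage_bounds(variant)  # digest the variant, not the reference
    peptides = []
    for i in range(len(bounds) - 1):
        # A peptide from bounds[i] to bounds[j] contains j - i - 1 missed cleavages.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            start, end = bounds[i], bounds[j]
            if start <= idx < end and end - start >= min_len:
                peptides.append(variant[start:end])
    return peptides


# Hypothetical protein with a G10S substitution.
print(variant_peptides("MSTEILKDAGLPVNRQQW", 10, "S", max_missed=0))  # ['DASLPVNR']
```

In a full workflow these peptides would be added to the search database, and matches to them would typically get their own FDR estimate because variant hits carry a higher false positive risk.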
Aggregation of population‐based genetic variation over protein domain homologues and its potential use in genetic diagnostics

PubMed Central

Wiel, Laurens; Venselaar, Hanka; Veltman, Joris A.; Vriend, Gert

2017-01-01

Whole exomes of patients with a genetic disorder are nowadays routinely sequenced, but interpretation of the identified genetic variants remains a major challenge. The increased availability of population‐based human genetic variation has given rise to measures of genetic tolerance that have been used, for example, to predict disease‐causing genes in neurodevelopmental disorders. Here, we investigated whether combining variant information from homologous protein domains can improve variant interpretation. For this purpose, we developed a framework that maps population variation and known pathogenic mutations onto 2,750 “meta‐domains.” These meta‐domains consist of 30,853 homologous Pfam protein domain instances that cover 36% of all human protein coding sequences. We find that genetic tolerance is consistent across protein domain homologues, and that patterns of genetic tolerance faithfully mimic patterns of evolutionary conservation. Furthermore, for a significant fraction (68%) of the meta‐domains, high‐frequency population variation recurs at the same positions across domain homologues more often than expected. In addition, we observe that the presence of pathogenic missense variants at an aligned homologous domain position is often paired with the absence of population variation, and vice versa. The use of these meta‐domains can improve the interpretation of genetic variation. PMID:28815929
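The core of the meta-domain idea is to project variants from many homologous domain instances onto a shared alignment column index. A minimal sketch, assuming each domain instance is supplied as a hypothetical (gene, first residue position, aligned sequence) tuple with '-' for gaps, and that variants are (gene, protein position) pairs, counts variants per alignment column; Pfam alignment parsing and the study's tolerance statistics are not shown:

```python
from collections import Counter


def variants_per_column(instances, variants):
    """Count variant positions falling in each column of a domain alignment.

    instances: list of (gene, first_residue_position, aligned_sequence)
    variants:  set of (gene, protein_position) pairs, 1-based positions
    """
    n_columns = len(instances[0][2])
    counts = Counter()
    for gene, start, aligned in instances:
        position = start
        for column, residue in enumerate(aligned):
            if residue == "-":
                continue  # gap: this instance has no residue in this column
            if (gene, position) in variants:
                counts[column] += 1
            position += 1
    return [counts[c] for c in range(n_columns)]


# Two hypothetical homologous domain instances sharing one alignment.
instances = [("GENE1", 10, "AC-D"), ("GENE2", 5, "A-CD")]
variants = {("GENE1", 12), ("GENE2", 7)}
print(variants_per_column(instances, variants))  # [0, 0, 0, 2]
```

Here variants at different protein positions in two genes land in the same aligned column, which is the kind of recurrence across homologues the abstract describes.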
Natural variation in floral nectar proteins of two Nicotiana attenuata accessions.

PubMed

Seo, Pil Joon; Wielsch, Natalie; Kessler, Danny; Svatos, Ales; Park, Chung-Mo; Baldwin, Ian T; Kim, Sang-Gyu

2013-07-13

Floral nectar (FN) contains not only energy-rich compounds to attract pollinators, but also defense chemicals and several proteins. However, proteomic analysis of FN has been hampered by the lack of publicly available sequence information from nectar-producing plants. Here we used next-generation sequencing and advanced proteomics to profile FN proteins in the opportunistic outcrossing wild tobacco, Nicotiana attenuata. We constructed a transcriptome database of N. attenuata and characterized its nectar proteome using LC-MS/MS. The FN proteins of N. attenuata included nectarins, sugar-cleaving enzymes (glucosidase, galactosidase, and xylosidase), RNases, pathogen-related proteins, and lipid transfer proteins. Natural variation in FN proteins of eleven N. attenuata accessions revealed a negative relationship between the accumulation of two abundant proteins, nectarin1b and nectarin5. In addition, microarray analysis of nectary tissues revealed that protein accumulation in FN is not simply correlated with the accumulation of transcripts encoding FN proteins and identified a group of genes that were specifically expressed in the nectary. Natural variation of identified FN proteins in the ecological model plant N. attenuata suggests that nectar chemistry may have a complex function in plant-pollinator-microbe interactions.
Natural variation in floral nectar proteins of two Nicotiana attenuata accessions

PubMed Central

2013-01-01

Background: Floral nectar (FN) contains not only energy-rich compounds to attract pollinators, but also defense chemicals and several proteins. However, proteomic analysis of FN has been hampered by the lack of publicly available sequence information from nectar-producing plants. Here we used next-generation sequencing and advanced proteomics to profile FN proteins in the opportunistic outcrossing wild tobacco, Nicotiana attenuata. Results: We constructed a transcriptome database of N. attenuata and characterized its nectar proteome using LC-MS/MS. The FN proteins of N. attenuata included nectarins, sugar-cleaving enzymes (glucosidase, galactosidase, and xylosidase), RNases, pathogen-related proteins, and lipid transfer proteins. Natural variation in FN proteins of eleven N. attenuata accessions revealed a negative relationship between the accumulation of two abundant proteins, nectarin1b and nectarin5. In addition, microarray analysis of nectary tissues revealed that protein accumulation in FN is not simply correlated with the accumulation of transcripts encoding FN proteins and identified a group of genes that were specifically expressed in the nectary. Conclusions: Natural variation of identified FN proteins in the ecological model plant N. attenuata suggests that nectar chemistry may have a complex function in plant-pollinator-microbe interactions. PMID:23848992
A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

PubMed

Sawle, Lucas; Ghosh, Kingshuk

2015-08-28

A general formalism is presented to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interactions. A variational approach is used to predict the average distance between any two monomers in the chain. The analytical model explicitly incorporates, for the first time, the role of sequence charge distribution in determining the relative sizes of two sequences that differ not only in total charge composition but also in charge decoration (even when the charge composition is fixed). Furthermore, the formalism is general enough to allow variation in the excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic polyampholyte sequences. These sequences contain an equal number of glutamic acid (E) and lysine (K) residues but differ in their patterning within the sequence. Without any fit parameters, the model captures the strong sequence dependence of the simulated radius of gyration, with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded-state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed to be similar under denaturing conditions, and only the electrostatic effects encoded in the sequence are accounted for. Under these assumptions, thermophilic proteins are found, with high statistical significance, to have a more compact disordered ensemble than their mesophilic counterparts. Because of its analytical nature, the method enables such high-throughput analysis of many proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.
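The charge-decoration effect can be made concrete with a sequence charge decoration (SCD) score. We believe the definition below, SCD = (1/N) Σ_{m=2}^{N} Σ_{n=1}^{m−1} σ_m σ_n (m − n)^{1/2}, is the one associated with this formalism, but treat the exact normalization as an assumption; the charge assignment (D, E = −1; K, R = +1; all other residues, including His, 0) is also assumed. Two E/K sequences with identical composition get different scores, with blockier patterning giving a more negative value:

```python
import math

# Assumed residue charges at neutral pH; histidine is treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def sequence_charge_decoration(seq: str) -> float:
    """SCD = (1/N) * sum over pairs m > n of q_m * q_n * sqrt(m - n)."""
    charges = [CHARGE.get(aa, 0) for aa in seq.upper()]
    n = len(charges)
    total = 0.0
    for m in range(1, n):
        for k in range(m):
            total += charges[m] * charges[k] * math.sqrt(m - k)
    return total / n


for s in ("EKEK", "EEKK"):
    print(s, round(sequence_charge_decoration(s), 3))
# EKEK -0.476
# EEKK -0.89
```

The double loop is O(N²), which is fine for single proteins; a proteome-scale scan of the kind the abstract mentions would benefit from vectorizing it.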
Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server.

PubMed

Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo

2016-06-17

Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
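PsychoProt's own scoring is not described in the abstract. As a generic illustration of how per-position variability can be quantified from an alignment of naturally occurring variants, the sketch below computes the Shannon entropy (in bits) of each alignment column, ignoring gaps; it is a standard variability measure, not the server's method:

```python
import math
from collections import Counter


def column_entropies(alignment: list[str]) -> list[float]:
    """Shannon entropy (bits) of each column of an equal-length alignment."""
    entropies = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        total = len(residues)
        counts = Counter(residues)
        # Sum of p * log2(1/p); written this way so a fully conserved column is 0.0.
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


alignment = ["ACD", "ACE", "AKD", "ARE"]
print(column_entropies(alignment))  # [0.0, 1.5, 1.0]
```

Mapping such per-position values onto a structure is the kind of sequence/structure view the server provides, with its own richer biophysical descriptors in place of raw entropy.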
Variability of the protein sequences of lcrV between epidemic and atypical rhamnose-positive strains of Yersinia pestis.

PubMed

Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V

2007-01-01

Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys→Asn), 72 (Lys→Arg), 273 (Cys→Ser), and 324-326 (Ser-Gly-Lys→Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.
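Classifying sequences into types in this way amounts to reading each strain's residues at a few reference positions and grouping identical signatures. Because the 324-326 change (Ser-Gly-Lys→Arg) alters the length, positions must be read from an alignment in reference (CO92) numbering rather than from raw sequence offsets. A minimal sketch, assuming pre-computed gapped alignments and using toy sequences and positions in place of the real LcrV data:

```python
from collections import defaultdict


def reference_columns(ref_aligned: str) -> dict[int, int]:
    """Map 1-based reference residue numbers to alignment column indices."""
    mapping, residue = {}, 0
    for column, aa in enumerate(ref_aligned):
        if aa != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def classify(ref_aligned: str, strains: dict[str, str], positions: list[int]):
    """Group strains by their residues (or gaps) at the given reference positions."""
    columns = [reference_columns(ref_aligned)[p] for p in positions]
    groups = defaultdict(list)
    for name, aligned in strains.items():
        signature = "".join(aligned[c] for c in columns)
        groups[signature].append(name)
    return dict(groups)


# Toy alignment: positions 2 and 3-5 stand in for the polymorphic "hot points".
reference = "MKSGKL"
strains = {"s1": "MKSGKL", "s2": "MNSGKL", "s3": "MK--RL", "s4": "MKSGKL"}
print(classify(reference, strains, [2, 3, 4, 5]))
# {'KSGK': ['s1', 's4'], 'NSGK': ['s2'], 'K--R': ['s3']}
```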
Single nucleotide variations: Biological impact and theoretical interpretation

PubMed Central

Katsonis, Panagiotis; Koire, Amanda; Wilson, Stephen Joseph; Hsu, Teng-Kuei; Lua, Rhonald C; Wilkins, Angela Dawn; Lichtarge, Olivier

2014-01-01

Genome-wide association studies (GWAS) and whole-exome sequencing (WES) generate massive amounts of genomic variant information, and a major challenge is to identify which variations drive disease or contribute to phenotypic traits. Because the majority of known disease-causing mutations are exonic non-synonymous single nucleotide variations (nsSNVs), most studies focus on whether these nsSNVs affect protein function. Computational studies show that the impact of nsSNVs on protein function reflects sequence homology and structural information and predict the impact through statistical methods, machine learning techniques, or models of protein evolution. Here, we review impact prediction methods and discuss their underlying principles, their advantages and limitations, and how they compare to and complement one another. Finally, we present current applications and future directions for these methods in biological research and medical genetics. PMID:25234433
Sequence variations in RepMP2/3 and RepMP4 elements reveal intragenomic homologous DNA recombination events in Mycoplasma pneumoniae.

PubMed

Spuesens, Emiel B M; Oduber, Minoushka; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis

2009-07-01

The gene encoding major adhesin protein P1 of Mycoplasma pneumoniae, MPN141, contains two DNA sequence stretches, designated RepMP2/3 and RepMP4, which display variation among strains. This variation allows strains to be differentiated into two major P1 genotypes (1 and 2) and several variants. Interestingly, multiple versions of the RepMP2/3 and RepMP4 elements exist at other sites within the bacterial genome. Because these versions are closely related in sequence, but not identical, it has been hypothesized that they have the capacity to recombine with their counterparts within MPN141, and thereby serve as a source of sequence variation of the P1 protein. In order to determine the variation within the RepMP2/3 and RepMP4 elements, both within the bacterial genome and among strains, we analysed the DNA sequences of all RepMP2/3 and RepMP4 elements within the genomes of 23 M. pneumoniae strains. Our data demonstrate that: (i) recombination is likely to have occurred between two RepMP2/3 elements in four of the strains, and (ii) all previously described P1 genotypes can be explained by inter-RepMP recombination events. Moreover, the difference between the two major P1 genotypes was reflected in all RepMP elements, such that subtype 1 and 2 strains can be differentiated on the basis of sequence variation in each RepMP element. This implies that subtype 1 and subtype 2 strains represent evolutionarily diverged strain lineages. Finally, a classification scheme is proposed in which the P1 genotype of M. pneumoniae isolates can be described in a sequence-based, universal fashion.
BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

PubMed Central

Wang, Junbai; Batmanov, Kirill

2015-01-01

Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972
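The biophysical idea behind scoring a regulatory SNP can be illustrated with a two-state binding model: a TF binds its best site with energy E (in k_BT units, from an energy matrix) and occupies it with probability 1/(1 + e^(E − μ)), where μ is the TF chemical potential, the concentration-like parameter mentioned above. The sketch below is only that textbook model, assuming a hypothetical energy matrix and scanning one strand for the single best site; BayesPI-BAR's Bayesian inference and PCA-based integration across chemical potentials are not reproduced:

```python
import math


def site_energy(site: str, matrix: list[dict[str, float]]) -> float:
    """Additive binding energy (k_B T units) of one site under an energy matrix."""
    return sum(matrix[i][base] for i, base in enumerate(site))


def occupancy(seq: str, matrix: list[dict[str, float]], mu: float) -> float:
    """Two-state occupancy of the strongest site on the forward strand."""
    width = len(matrix)
    best = min(site_energy(seq[i:i + width], matrix)
               for i in range(len(seq) - width + 1))
    return 1.0 / (1.0 + math.exp(best - mu))


def snp_effect(seq, offset, alt, matrix, mus):
    """Change in occupancy (alt minus ref) at each chemical potential in mus."""
    alt_seq = seq[:offset] + alt + seq[offset + 1:]
    return [occupancy(alt_seq, matrix, mu) - occupancy(seq, matrix, mu) for mu in mus]


# Hypothetical 2-bp matrix in which "AG" is the preferred site (energy 0).
matrix = [{"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
          {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0}]
print([round(d, 3) for d in snp_effect("TTAGTT", 3, "C", matrix, mus=[-2.0, 0.0, 2.0])])
# [-0.101, -0.381, -0.381]
```

The size of the predicted effect depends on μ, which is why a method that varies the chemical potential, rather than fixing it, can rank SNPs differently from a plain motif-score difference.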
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

PubMed

Tian, Pengfei; Best, Robert B

2017-10-17

Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein or, more precisely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely small number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
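The gap between absolute and relative sequence capacity is easiest to see with logarithms. Tian and Best use a coevolutionary model with pairwise couplings; the sketch below instead assumes independent sites, estimating the sequence entropy S as the sum of per-column entropies of a family alignment, so that capacity ≈ e^S and relative capacity ≈ e^S / 20^L. Ignoring couplings generally overestimates S, so the numbers are illustrative only:

```python
import math
from collections import Counter


def log10_capacity(alignment: list[str]) -> tuple[float, float]:
    """Independent-site estimates of log10(capacity) and log10(relative capacity)."""
    entropy = 0.0  # natural-log units; gaps, if present, count as a state here
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy += sum((c / total) * math.log(total / c) for c in counts.values())
    log10_abs = entropy / math.log(10)
    length = len(alignment[0])
    log10_rel = log10_abs - length * math.log10(20)
    return log10_abs, log10_rel


print(log10_capacity(["AC", "AD", "AC", "AD"]))  # approximately (0.301, -2.301)
```

Every added residue contributes at most log10(20) to the absolute term but always subtracts log10(20) in the relative term, which is why relative capacity shrinks so quickly with length.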
A wide extent of inter-strain diversity in virulent and vaccine strains of alphaherpesviruses.

PubMed

Szpara, Moriah L; Tafuri, Yolanda R; Parsons, Lance; Shamim, S Rafi; Verstrepen, Kevin J; Legendre, Matthieu; Enquist, L W

2011-10-01

Alphaherpesviruses are widespread in the human population and include herpes simplex virus 1 (HSV-1) and 2, and varicella zoster virus (VZV). These viral pathogens cause epithelial lesions and then infect the nervous system to cause lifelong latency, reactivation, and spread. A related veterinary herpesvirus, pseudorabies virus (PRV), causes a similar disease in livestock that results in significant economic losses. Vaccines developed for VZV and PRV serve as useful models for the development of an HSV-1 vaccine. We present full genome sequence comparisons of the PRV vaccine strain Bartha and two virulent PRV isolates, Kaplan and Becker. These genome sequences were determined by high-throughput sequencing and assembly, and they offer new insights into the attenuation of a mammalian alphaherpesvirus vaccine strain. We find many previously unknown coding differences between PRV Bartha and the virulent strains, including changes to the fusion proteins gH and gB and to over forty other viral proteins. Inter-strain variation in PRV protein sequences is much closer to levels previously observed for HSV-1 than to those of the highly stable VZV proteome. Almost 20% of the PRV genome consists of tandem short sequence repeats (SSRs), a class of nucleic acid motifs whose length variation has been associated with changes in DNA binding site efficiency, transcriptional regulation, and protein interactions. We find SSRs throughout the herpesvirus family and provide the first global characterization of SSRs in viruses, both within and between strains. We find SSR length variation between different isolates of PRV and HSV-1, which may provide a new mechanism for phenotypic variation between strains. Finally, we detected a small number of polymorphic bases within each plaque-purified PRV strain, and we characterize the effect of passage and plaque purification on these polymorphisms. These data add to growing evidence that even plaque-purified stocks of stable DNA viruses exhibit limited sequence heterogeneity, which likely seeds future strain evolution.
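Tandem short sequence repeats of this kind can be located with a simple scan. The sketch below assumes motif lengths of 1 to 6 bp and at least three consecutive perfect copies (both thresholds are illustrative, not those used in the study) and reports, at each position, the repeat with the longest span; dedicated repeat finders additionally tolerate imperfect copies:

```python
def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Return (start, motif, span, copies) for perfect tandem repeats, 0-based."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for unit in range(1, max_unit + 1):
            motif = seq[i:i + unit]
            if len(motif) < unit:
                break  # not enough sequence left for this motif length
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            span = copies * unit
            # Keep the longest span; on ties the shorter motif found first wins.
            if copies >= min_copies and (best is None or span > best[2]):
                best = (i, motif, span, copies)
        if best:
            hits.append(best)
            i += best[2]  # skip past the repeat to avoid overlapping calls
        else:
            i += 1
    return hits


print(find_ssrs("GCACACACATTTTTG"))  # [(1, 'CA', 8, 4), (9, 'T', 5, 5)]
```

Comparing the `copies` value of the same locus across isolates is the simplest way to express the SSR length variation described above.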
Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

PubMed Central

2007-01-01

We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882
Analysis of protein-coding genetic variation in 60,706 humans.

PubMed

Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

2016-08-18

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
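Filtering candidate variants against a population catalogue like this one reduces to a frequency lookup. A minimal sketch, assuming variants are keyed as (chromosome, position, ref, alt) tuples, that a variant absent from the catalogue has frequency 0, and that the cutoff `max_af` is chosen per disease model (the 1e-4 default is purely illustrative):

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def filter_rare(candidates: list[Variant],
                population_af: dict[Variant, float],
                max_af: float = 1e-4) -> list[Variant]:
    """Keep candidates whose population allele frequency does not exceed max_af."""
    return [v for v in candidates if population_af.get(v, 0.0) <= max_af]


# Hypothetical candidates and catalogue frequencies.
candidates = [("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")]
catalogue = {("1", 1000, "A", "G"): 0.12, ("2", 2000, "C", "T"): 0.00002}
print(filter_rare(candidates, catalogue))
# [('2', 2000, 'C', 'T'), ('3', 3000, 'G', 'A')]
```

In practice the variant keys must be normalized (same reference build, left-aligned indels) before lookup, or common variants will slip through as apparently absent.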
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Intra- and inter-isolate variation of ribosomal and protein-coding genes in Pleurotus: implications for molecular identification and phylogeny on fungal groups.

PubMed

He, Xiao-Lan; Li, Qian; Peng, Wei-Hong; Zhou, Jie; Cao, Xue-Lian; Wang, Di; Huang, Zhong-Qian; Tan, Wei; Li, Yu; Gan, Bing-Cheng

2017-06-26

The internal transcribed spacer (ITS), RNA polymerase II second largest subunit (RPB2), and elongation factor 1-alpha (EF1α) are often used in fungal taxonomy and phylogenetic analysis. An ideal molecular marker for molecular identification and phylogenetic studies is homogeneous within species, and its interspecific variation exceeds its intraspecific variation. However, while sequencing ITS, RPB2, and EF1α in Pleurotus spp., we found that intra-isolate sequence polymorphism might be present in these genes, because direct sequencing of PCR products failed for some isolates. In this study, we therefore examined intra- and inter-isolate variation of the three genes in Pleurotus by polymerase chain reaction amplification and cloning. Intra-isolate variation of ITS was not uncommon, but the level of polymorphism in each isolate was relatively low. By contrast, intra-isolate variation of EF1α and RPB2 sequences was unexpectedly high. The polymorphism level differed significantly among ITS, RPB2, and EF1α in the same individual, and the intra-isolate heterogeneity level of each gene varied between isolates of the same species. Intra-isolate and intraspecific variation of ITS in the tested isolates was lower than interspecific variation. Intra-isolate and intraspecific variation of RPB2 was probably about equal to interspecific divergence, and intra-isolate and intraspecific variation of EF1α could exceed interspecific divergence. These findings suggest that RPB2 and EF1α are not desirable barcoding candidates for Pleurotus. We also discuss possible reasons why rDNA and protein-coding genes show variants within a single isolate in Pleurotus, although this must be addressed in further research. Our study demonstrates that intra-isolate variation of ribosomal and protein-coding genes is likely widespread in fungi. This has implications for studies on fungal evolution, taxonomy, phylogenetics, and population genetics. More extensive sampling of these genes and other candidates will be required to ensure their reliability as phylogenetic markers and DNA barcodes.
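The barcoding criterion stated above, that interspecific variation should exceed intraspecific variation, can be checked with a simple "barcode gap" comparison. The following Python sketch is illustrative only and is not the authors' method. It assumes the cloned sequences are already aligned to equal length, uses uncorrected p-distances, and runs on hypothetical toy sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (uncorrected p-distance; gaps count as sites)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def barcode_gap(groups: dict[str, list[str]]) -> tuple[float, float]:
    """Return (largest intraspecific distance, smallest interspecific distance).

    groups maps a species name to its aligned sequences, e.g. cloned variants
    from one or more isolates. At least two species are required.
    """
    intra = [p_distance(a, b)
             for variants in groups.values()
             for a, b in combinations(variants, 2)]
    inter = [p_distance(a, b)
             for sp1, sp2 in combinations(groups, 2)
             for a in groups[sp1]
             for b in groups[sp2]]
    return max(intra, default=0.0), min(inter)


# Hypothetical toy alignment: two cloned variants per species.
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACGTTC", "ACGAACGTTC"],
}
max_intra, min_inter = barcode_gap(toy)
print(f"max intraspecific {max_intra:.2f}, min interspecific {min_inter:.2f}, "
      f"gap present: {max_intra < min_inter}")
```

A marker shows a barcode gap when the largest intraspecific distance is smaller than the smallest interspecific distance. For the toy data this holds (0.1 versus 0.2). Intra-isolate clone variants, such as those reported here for EF1α, inflate the intraspecific term and can close the gap.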

UniDrug-target: a computational tool to identify unique drug targets in pathogenic bacteria.

PubMed

Chanumolu, Sree Krishna; Rout, Chittaranjan; Chauhan, Rajinder S

2012-01-01

Targeting conserved bacterial proteins with antibacterial drugs has led both to the development of resistant strains and to harm to human health. Such drugs destroy beneficial microbes, which eventually become breeding grounds for the evolution of resistance. Despite the availability of more than 800 genome sequences, 430 pathways, 4743 enzymes, 9257 metabolic reactions, and three-dimensional (3D) protein structures in bacteria, no pathogen-specific computational drug target identification tool has been developed. We have designed a web server, UniDrug-Target, which combines bacterial biological information and computational methods to stringently identify pathogen-specific proteins as drug targets. Besides predicting properties of pathogen-specific proteins such as essentiality and chokepoint status, three new algorithms were developed and implemented. They use protein sequences, domains, structures, and metabolic reactions for the construction of partial metabolic networks (PMNs), the determination of conservation in critical residues, and the variation analysis of residues forming similar cavities in protein sequences. First, PMNs are constructed to determine the extent to which metabolite production is disturbed when a protein is targeted. Second, the conservation of a pathogen-specific protein's critical residues involved in cavity formation and biological function is determined at the domain level with low-matching sequences. Last, variation analysis is performed on residues forming similar cavities in protein sequences from pathogenic versus non-pathogenic bacteria and humans. The server can predict drug targets for any sequenced pathogenic bacterium for which FASTA sequences and annotation are available. The utility of the UniDrug-Target server was demonstrated for Mycobacterium tuberculosis (H37Rv). For this organism it identified 265 pathogen-specific mycobacterial proteins, including 17 essential proteins that could be potential drug targets. UniDrug-Target is expected to accelerate the identification of pathogen-specific drug targets. This should increase the success and durability of drugs developed against them, since such drugs are less likely to elicit resistance or to have an adverse impact on the environment. The server is freely available at http://117.211.115.67/UDT/main.html. The standalone application (source code) is available at http://www.bioinformatics.org/ftp/pub/bioinfojuit/UDT.rar.
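The chokepoint property mentioned above is commonly defined as follows: a reaction is a chokepoint if it is the only reaction in the network that consumes a given substrate, or the only one that produces a given product. The Python sketch below applies that definition to a hypothetical four-reaction network. It is not UniDrug-Target's implementation, and the reaction and metabolite names are invented for illustration.

```python
from collections import defaultdict


def chokepoints(reactions: dict[str, tuple[set[str], set[str]]]) -> set[str]:
    """Return reactions that are the sole consumer or the sole producer of some metabolite.

    reactions maps a reaction ID to (substrates, products).
    """
    consumers: dict[str, set[str]] = defaultdict(set)
    producers: dict[str, set[str]] = defaultdict(set)
    for rxn, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rxn)
        for metabolite in products:
            producers[metabolite].add(rxn)
    result: set[str] = set()
    for index in (consumers, producers):
        for rxns in index.values():
            if len(rxns) == 1:  # only one reaction uses/makes this metabolite
                result |= rxns
    return result


# Hypothetical network: R2 and R3 are parallel routes from B to C.
network = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"C"}),
    "R4": ({"C"}, {"D"}),
}
print(sorted(chokepoints(network)))
```

Here R1 is the sole consumer of A and R4 the sole producer of D. R2 and R3 provide redundant routes from B to C, so neither is a chokepoint. Real analyses usually exclude currency metabolites such as ATP or water, which would otherwise distort the counts.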
Rebelling for a Reason: Protein Structural “Outliers”

PubMed Central

Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

2013-01-01

Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

PubMed

Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

2015-01-15

The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in one captive Queensland koala and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes, comprising 139 of the sequences (85%), coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. Copyright © 2014 Elsevier Inc. All rights reserved.
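The haplotype bookkeeping described above has three steps: collapsing 163 sequences into 39 haplotypes, finding that 16 of them encode the same polypeptide, and counting amino acid differences from the majority. This amounts to grouping identical nucleotide sequences, grouping haplotypes by their translation, and comparing each translation with the majority. The Python sketch below shows that logic on hypothetical toy open reading frames. It assumes unambiguous A/C/G/T sequences that are in frame from the first base, uses the standard genetic code, and is not the authors' pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF (A/C/G/T only); a trailing partial codon is ignored."""
    orf = orf.upper()
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def summarize_haplotypes(sequences: list[str]):
    """Collapse sequences into haplotypes and group haplotypes by encoded polypeptide."""
    haplotypes = Counter(sequences)  # haplotype -> number of sequences
    by_protein: dict[str, list[str]] = defaultdict(list)
    for hap in haplotypes:
        by_protein[translate(hap)].append(hap)
    # Majority polypeptide = the one encoded by the most sequences (not haplotypes).
    majority = max(by_protein, key=lambda p: sum(haplotypes[h] for h in by_protein[p]))
    return haplotypes, by_protein, majority


def aa_differences(protein: str, reference: str) -> int:
    """Count amino acid (non-synonymous) differences between equal-length polypeptides."""
    return sum(1 for x, y in zip(protein, reference) if x != y)


# Hypothetical toy ORFs, not data from the study.
seqs = ["ATGGCT", "ATGGCC", "ATGGCT", "ATGACT"]
haplotypes, by_protein, majority = summarize_haplotypes(seqs)
for protein, haps in by_protein.items():
    print(protein, haps, "differences from majority:", aa_differences(protein, majority))
```

For the toy input, four sequences form three haplotypes. Two of these (ATGGCT and ATGGCC) are synonymous and encode the majority polypeptide "MA". The remaining haplotype differs from it at one amino acid position.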
Replica exchange molecular dynamics simulation of structure variation from α/4β-fold to 3α-fold protein.

PubMed

Lazim, Raudah; Mei, Ye; Zhang, Dawei

2012-03-01

Replica exchange molecular dynamics (REMD) simulation provides an efficient conformational sampling tool for the study of protein folding. In this study, we explore the mechanism directing the structural change from an α/4β-fold protein to a 3α-fold protein after mutation by conducting REMD simulations on 42 replicas at temperatures ranging from 270 K to 710 K. The simulation began from a protein with the primary structure of GA88 but the tertiary structure of GB88, two protein G variants with "high sequence identity." Although the Cα root-mean-square deviation (RMSD) of the folded protein remained large (4.34 Å at 270 K and 4.75 Å at 304 K), a change in tertiary structure was observed. We combined this with secondary structure assignment, cluster analysis, and principal component analysis. Together, these results provide insight into the folding pathway of the 3α-fold protein and the unfolding pathway of the α/4β-fold protein, paving the way toward understanding the events that occur during conformational change.
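Two ingredients of an REMD run like the one described above can be written down compactly. The first is a temperature ladder across the replicas. The second is the Metropolis criterion for swapping configurations between two temperatures: a swap is accepted with probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). The Python sketch below assumes a geometric ladder, which is a common choice. The abstract gives only the range (270 K to 710 K) and the replica count (42), not the spacing actually used, and the energies in the example are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def geometric_ladder(t_min: float, t_max: float, n_replicas: int) -> list[float]:
    """Temperatures spaced by a constant ratio (an assumed, commonly used scheme)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance for swapping configurations between replicas at t_i and t_j.

    Energies are potential energies in kcal/mol.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i: float, e_j: float, t_i: float, t_j: float,
                 rng: random.Random) -> bool:
    """Accept or reject one neighbor swap using a uniform random draw."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


temperatures = geometric_ladder(270.0, 710.0, 42)
print(f"{len(temperatures)} replicas, ratio {temperatures[1] / temperatures[0]:.4f}")
rng = random.Random(0)
# Hypothetical energies: the colder replica holds the higher-energy configuration.
print(attempt_swap(-1200.0, -1210.0, temperatures[0], temperatures[1], rng))
```

With these settings, adjacent temperatures differ by a factor of about 1.024. A swap that moves the lower-energy configuration to the colder replica is always accepted. The reverse swap is accepted with a probability below one.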
Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

PubMed

Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

2010-02-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome has been sequenced in various cancers, whereas such efforts for ncRNAs are still fragmentary. We used Sanger sequencing to screen genomic DNA for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC). We further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases and mutations of UCRs in CLLs and CRCs. In certain instances, we detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germline and somatic mutations, as well as by single-nucleotide polymorphisms, with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent, and some have functional and biological significance. Such information can be exploited to investigate, on a genome-wide scale, the frequency of genetic variations in ncRNAs and their functional meaning. It can also be used to develop new diagnostic and prognostic markers for leukemias and carcinomas.
Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

PubMed Central

Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

2010-01-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome has been sequenced in various cancers, whereas such efforts for ncRNAs are still fragmentary. We used Sanger sequencing to screen genomic DNA for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC). We further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases and mutations of UCRs in CLLs and CRCs. In certain instances, we detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germline and somatic mutations, as well as by single-nucleotide polymorphisms, with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent, and some have functional and biological significance. Such information can be exploited to investigate, on a genome-wide scale, the frequency of genetic variations in ncRNAs and their functional meaning. It can also be used to develop new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes. As such, it promises to facilitate the dissection of the genetic contribution to complex traits. Discoveries of genetic associations will further our understanding of biology, but once candidate variants have been identified, investigators face the challenge of characterizing their functional effects on the proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases. These databases provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and of its variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome-wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELs). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions. A subset of these SAAPs occurred in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome level, allowing the characterization of heterozygous variants.
Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

DOE PAGES

Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

2015-10-20

The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes. As such, it promises to facilitate the dissection of the genetic contribution to complex traits. Discoveries of genetic associations will further our understanding of biology, but once candidate variants have been identified, investigators face the challenge of characterizing their functional effects on the proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases. These databases provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and of its variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome-wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELs). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions. A subset of these SAAPs occurred in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome level, allowing the characterization of heterozygous variants.
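The core step of building a genotype-specific protein database can be sketched as follows: substitute the observed variant residues into reference protein sequences, then write the results as FASTA for mass spectrometry database searching. This Python sketch is a hypothetical illustration, not the authors' workflow. The reference sequence, protein identifier, genotype label and variant positions are invented. Only single amino acid substitutions are handled, not INDELs, and deriving the variants from RNA-seq read alignments is out of scope.

```python
def apply_saaps(reference: str, saaps: list[tuple[int, str, str]]) -> str:
    """Apply single amino acid polymorphisms given as (1-based position, ref, alt)."""
    residues = list(reference)
    for pos, ref, alt in saaps:
        found = residues[pos - 1]
        if found != ref:  # guard against variants called on a different reference
            raise ValueError(f"position {pos}: expected {ref}, found {found}")
        residues[pos - 1] = alt
    return "".join(residues)


def to_fasta(entries: dict[str, str], width: int = 60) -> str:
    """Format header -> sequence pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for header, seq in entries.items():
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


reference = "MKTAYIAKQR"  # hypothetical reference protein
variant = apply_saaps(reference, [(3, "T", "S"), (6, "I", "V")])
# A heterozygous genotype contributes both alleles, so both are kept as entries.
database = to_fasta({
    "protX|genotype_1|allele_ref": reference,
    "protX|genotype_1|allele_alt": variant,
})
print(database)
```

Keeping both alleles as separate entries lets peptides specific to either allele be matched. This is one simple way a search could detect the heterozygous variants mentioned above.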
PknB remains an essential and a conserved target for drug development in susceptible and MDR strains of M. tuberculosis.

PubMed

Gupta, Anamika; Pal, Sudhir K; Pandey, Divya; Fakir, Najneen A; Rathod, Sunita; Sinha, Dhiraj; SivaKumar, S; Sinha, Pallavi; Periera, Mycal; Balgam, Shilpa; Sekar, Gomathi; UmaDevi, K R; Anupurba, Shampa; Nema, Vijay

2017-08-18

The Mycobacterium tuberculosis (M.tb) protein kinase B (PknB) has now been shown to be essential for the growth and survival of M.tb. It is a transmembrane protein with the potential to be a good drug target. However, it is not known whether this target remains conserved in otherwise resistant clinical isolates. The present study describes a conservation analysis of sequences covering the inhibitor-binding domain of PknB. It assesses whether this domain remains conserved in susceptible and resistant clinical strains of mycobacteria collected from three geographical areas of India. A total of 116 isolates from North, South, and West India were used in the study, with variable susceptibility profiles to streptomycin, isoniazid, rifampicin, ethambutol, and ofloxacin. Isolates were also spoligotyped to determine whether the conservation pattern of the pknB gene remained consistent or differed across spoligotypes. The impact of the variation found in the study was analyzed using molecular dynamics simulations. Sequencing showed that pknB sequences were conserved in 115 of the 116 isolates, irrespective of susceptibility status and spoligotype. The only variation was found in one strain, whose pknB sequence carried a G-to-A mutation at position 664 that changes valine to isoleucine. Molecular dynamics simulations showed that this variation causes no significant change in protein structure or inhibitor binding. Hence, the study endorses PknB as an ideal target for drug development, with no pre-existing or induced resistance found in the sequences involved in inhibitor binding. If the mutation reported here for the first time is found again in subsequent work, it should be checked against the phenotypic profile before concluding that it affects activity in any way. Our bioinformatic analysis indicates that it has no significant effect on inhibitor binding and hence on the activity of the protein.
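The reported G-to-A change at nucleotide 664 can be placed in its codon with simple arithmetic. This assumes that position 664 is counted from the first base of the pknB start codon, which the abstract does not state. Under that assumption it is the first base of codon 222. The Python sketch below uses the standard genetic code to check which valine codons (GTN) give isoleucine after a first-position G-to-A change. It is an illustrative check, not part of the study's analysis.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def locate(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon), both 1-based."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return the codon with the base at 1-based pos_in_codon replaced by alt_base."""
    return codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]


# Assumes position 664 counts from the first base of the start codon.
codon_number, offset = locate(664)
print(f"nucleotide 664 -> codon {codon_number}, position {offset}")
for wild_type in ("GTT", "GTC", "GTA", "GTG"):  # the four valine codons
    mutant = substitute(wild_type, offset, "A")
    print(wild_type, CODON_TABLE[wild_type], "->", mutant, CODON_TABLE[mutant])
```

GTT, GTC, and GTA become the isoleucine codons ATT, ATC, and ATA, whereas GTG would become ATG (methionine). Under this coordinate assumption, a Val-to-Ile outcome therefore implies that the wild-type codon was one of the first three.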
Automated design evolution of stereochemically randomized protein foldamers

NASA Astrophysics Data System (ADS)

Ranbhor, Ranjit; Kumar, Anil; Patel, Kirti; Ramakrishnan, Vibin; Durani, Susheel

2018-05-01

Diversification of chain stereochemistry opens up the possibility of an ‘in principle’ increase in the design space of proteins. This huge increase in sequence variation, and the structural variation that follows from it, is aimed at the generation of smart materials. To diversify protein structure stereochemically, we introduced L- and D-α-amino acids as the design alphabet. With a sequence design algorithm, we explored the usage of specific variables, such as chirality and the sequence of this alphabet, in independent steps. With molecular dynamics, we folded stereochemically diverse homopolypeptides and evaluated their ‘fitness’ for possible design as protein-like foldamers. We propose a fitness function to select the optimal fold from among 1000 structures simulated with an automated repetitive simulated annealing molecular dynamics (AR-SAMD) approach. The highest-scoring poly-leucine folds, with sequence lengths of 24 and 30 amino acids, were then sequence-optimized using an optimization tool combining Dead-End Elimination and Monte Carlo methods. This paper demonstrates a novel approach for the de novo design of protein-like foldamers.
Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

PubMed

Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

2017-04-01

Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing) with a targeted exome strategy provides an extended blood group genotype. It also investigated the extent to which massively parallel sequencing correctly genotypes homologous gene systems, such as RH and MNS. Donor samples (n = 28) that had been extensively phenotyped and genotyped using single nucleotide polymorphism typing were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For all 28 samples, the variant sequences obtained by massively parallel sequencing matched the sequences expected from single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish D and CE exon deletion from rearrangement. The targeted exome sequencing strategy extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases. With copy number variation analysis, it also defined RH hybrid genes along with the RHCE*C allele, which had hitherto been difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
Scop3D: three-dimensional visualization of sequence conservation.

PubMed

Vermeire, Tessa; Vermaere, Stijn; Schepens, Bert; Saelens, Xavier; Van Gucht, Steven; Martens, Lennart; Vandermarliere, Elien

2015-04-01

The integration of a protein's structure with its known sequence variation provides insight into how that protein evolves, for instance in terms of (changing) function or immunogenicity. Yet collating the corresponding sequence variants into a multiple sequence alignment, calculating each position's conservation, and mapping this information back onto a relevant structure is not straightforward. We therefore built the Sequence Conservation on Protein 3D structure (scop3D) tool to perform these tasks automatically. The output consists of two modified PDB files. In one, the B-value for each position is replaced by the percentage sequence conservation; in the other, it is replaced by the information entropy for that position. Text files with absolute and relative amino acid occurrences for each position are also provided, along with snapshots of the protein from six distinct directions in space. The visualization provided by scop3D can, for instance, be used as an aid in vaccine development or to identify antigenic hotspots. We demonstrate this here with an analysis of the fusion proteins of human respiratory syncytial virus and mumps virus. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
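The sketch below shows the two per-position measures scop3D reports, percentage sequence conservation and information entropy. It also shows how such a value can be stored in the B-value field of a PDB file. This Python sketch is an illustration under stated assumptions, not scop3D's code. It takes conservation to be the frequency of the most common residue, computes Shannon entropy in bits, and ignores gap characters. It also assumes a hypothetical mapping from alignment columns to residue numbers, supplied by the caller. In the fixed-width PDB format, the B-value occupies columns 61-66 of ATOM records and the residue number occupies columns 23-26.

```python
import math
from collections import Counter


def column_stats(alignment: list[str], gap: str = "-") -> list[tuple[float, float]]:
    """Per-column (percent conservation, Shannon entropy in bits), ignoring gaps."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    stats = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != gap]
        if not residues:  # all-gap column
            stats.append((0.0, 0.0))
            continue
        counts = Counter(residues)
        n = len(residues)
        conservation = 100.0 * counts.most_common(1)[0][1] / n
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        stats.append((conservation, entropy))
    return stats


def set_bfactors(pdb_lines: list[str], values: dict[int, float]) -> list[str]:
    """Write per-residue values into the B-value field (columns 61-66) of ATOM records."""
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            res_seq = int(line[22:26])  # residue sequence number, columns 23-26
            if res_seq in values:
                line = line[:60] + f"{values[res_seq]:6.2f}" + line[66:]
        out.append(line)
    return out


alignment = ["MKV", "MRV", "MK-", "MRI"]  # hypothetical toy alignment
stats = column_stats(alignment)
# Assumes alignment column i corresponds to residue number i + 1 in the structure.
conservation_by_residue = {i + 1: cons for i, (cons, _) in enumerate(stats)}
for i, (cons, ent) in enumerate(stats, start=1):
    print(f"position {i}: {cons:.1f}% conserved, {ent:.3f} bits")
```

For the toy alignment, column 1 is fully conserved (100%, entropy 0 bits), and column 2 splits evenly between K and R (50%, 1 bit). Column 3 ignores the gap and has V in two of three sequences (about 66.7%, about 0.92 bits). Passing `conservation_by_residue` to `set_bfactors` lets a molecular viewer colour residues by conservation.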
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and the resulting phenotypes, we quantified DNA sequence variation at six loci encoding octopamine and dopamine receptors in a population of honeybees. We also tested its association with aversive and appetitive learning traits. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to the six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low. This indicates that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved, and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Terminal region sequence variations in variola virus DNA.

PubMed

Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

1996-07-15

Genome DNA terminal region sequences were determined for three strains. The first was the Brazilian alastrim variola minor virus strain Garcia-1966, which was associated with a 0.8% case-fatality rate. The others were the African smallpox strains Congo-1970 and Somalia-1977, associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of ≥98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of ≥65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more closely related to each other than to alastrim virus. Only 10 of the 70 alastrim virus proteins were 100% identical to their homologs in Asian strains, and 7 alastrim-specific proteins were noted.
Evolution of proteins.

NASA Technical Reports Server (NTRS)

Dayhoff, M. O.

1971-01-01

This work deals with the amino acid sequences of proteins from living organisms. The structure of proteins is discussed first. The variation in this structure from one biological group to another is illustrated by the first halves of the sequences of cytochrome c, and a phylogenetic tree is derived from the cytochrome c data. The relative geological times associated with the events of this tree are discussed. Errors that occur in the duplication of cells during the evolutionary process are examined. Particular attention is given to the evolution of mutant proteins, globins, ferredoxin, and transfer ribonucleic acids (tRNAs). Finally, a general outline of biological evolution is presented.
Multi-step formation, evolution, and functionalization of new cytoplasmic male sterility genes in the plant mitochondrial genomes

PubMed Central

Tang, Huiwu; Zheng, Xingmei; Li, Chuliang; Xie, Xianrong; Chen, Yuanling; Chen, Letian; Zhao, Xiucai; Zheng, Huiqi; Zhou, Jiajian; Ye, Shan; Guo, Jingxin; Liu, Yao-Guang

2017-01-01

New gene origination is a major source of genomic innovations that confer phenotypic changes and biological diversity. Generation of new mitochondrial genes in plants may cause cytoplasmic male sterility (CMS), which can promote outcrossing and increase fitness. However, how mitochondrial genes originate and evolve in structure and function remains unclear. The rice Wild Abortive type of CMS is conferred by the mitochondrial gene WA352c (previously named WA352) and has been widely exploited in hybrid rice breeding. Here, we reconstruct the evolutionary trajectory of WA352c by identifying and analyzing 11 mitochondrial genomic recombinant structures related to WA352c in wild and cultivated rice. We deduce that these structures arose through multiple rearrangements among conserved mitochondrial sequences in the mitochondrial genome of the wild rice Oryza rufipogon, coupled with substoichiometric shifting and sequence variation. We identify two expressed but nonfunctional protogenes among these structures. We show that they could evolve into functional CMS genes via sequence variations that could relieve the self-inhibitory potential of the proteins. These sequence changes would endow the proteins with the ability to interact with the nucleus-encoded mitochondrial protein COX11, resulting in premature programmed cell death in the anther tapetum and male sterility. Furthermore, we show that the sequences that encode the COX11-interaction domains in these WA352c-related genes have experienced purifying selection during evolution. We propose a model for the formation and evolution of new CMS genes via a “multi-recombination/protogene formation/functionalization” mechanism involving gradual variations in the structure, sequence, copy number, and function. PMID:27725674
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

PubMed Central

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-01-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Fine mapping and identification of a candidate gene for the barley Un8 true loose smut resistance gene.

PubMed

Zang, Wen; Eckstein, Peter E; Colin, Mark; Voth, Doug; Himmelbach, Axel; Beier, Sebastian; Stein, Nils; Scoles, Graham J; Beattie, Aaron D

2015-07-01

The candidate gene for the barley Un8 true loose smut resistance gene encodes a deduced protein containing two tandem protein kinase domains. In North America, durable resistance against all known isolates of barley true loose smut, caused by the basidiomycete pathogen Ustilago nuda (Jens.) Rostr. (U. nuda), is under the control of the Un8 resistance gene. Previous genetic studies mapped Un8 to the long arm of chromosome 5 (1HL). Here, a population of 4625 lines segregating for Un8 was used to delimit the Un8 gene to a 0.108 cM interval on chromosome arm 1HL and to assign it to fingerprinted contig 546 of the barley physical map. The minimal tiling path for the Un8 locus was identified using two flanking markers and consisted of two overlapping bacterial artificial chromosomes. One gene located close to a marker co-segregating with Un8 showed high sequence identity to a disease resistance gene containing two kinase domains. The candidate gene was sequenced from the parents of the segregating population and from an additional 19 barley lines representing a broader spectrum of diversity. Sequencing showed that the alleles in both resistant and susceptible lines lacked an intron. It also showed fifteen amino acid variations unique to the deduced protein sequence in resistant lines, which differentiated it from the deduced protein sequences in susceptible lines. Some of these variations were present within putative functional domains, which may cause a loss of function in the deduced protein sequences within susceptible lines.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

PubMed Central

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-01-01

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies of TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled their repeat structures. Repeats from these proteins mediate sequence-specific DNA binding that conforms to the TALE code, despite their low sequence similarity to TALE repeats and novel residues around the base-specifying residue (BSR). However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that only three residues are conserved between repeats of all TALE-like proteins, including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

PubMed

Hsing, Michael; Cherkasov, Artem

2008-06-25

Insertions and deletions (indels) are a common type of sequence variation that is less studied and poses many important biological questions. Recent research has shown that sizable indels in protein sequences may indicate protein essentiality and the protein's role in interaction networks. Examples of using indels for structure-based drug design have also been demonstrated recently. Nonetheless, many structural and functional characteristics of indels remain poorly researched or unknown. We have created a web-based resource, Indel PDB: a structural database of insertions/deletions identified from sequence alignments of highly similar proteins in the Protein Data Bank (PDB). Indel PDB uses large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB includes more indel sequences with secondary structure, such as alpha-helices and beta-sheets, in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three-dimensional structures. Using the data available in Indel PDB, we have studied and present here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and the identification of indel-directed drug binding sites.
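Indel sites come from pairwise alignments of highly similar proteins. The core extraction step can be sketched as scanning the alignment for runs of gap columns in one sequence. This is a minimal sketch, assuming a gapped pairwise alignment with "-" as the gap character. It reports alignment coordinates and the inserted fragment only, and does none of the structural annotation Indel PDB performs. The aligned sequences are hypothetical.

```python
def indel_sites(aligned_a, aligned_b, gap="-"):
    """Return (start, end, carrier, fragment) for each run of gap columns.
    Coordinates are 0-based, half-open alignment columns; `carrier` names the
    sequence ("A" or "B") that contains the extra residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    sites, i, n = [], 0, len(aligned_a)
    while i < n:
        a_gap, b_gap = aligned_a[i] == gap, aligned_b[i] == gap
        if a_gap == b_gap:  # aligned residues (or a double gap): not an indel
            i += 1
            continue
        start = i
        # Extend the run while the gap stays in the same sequence.
        while i < n and (aligned_a[i] == gap) == a_gap and (aligned_b[i] == gap) == b_gap:
            i += 1
        carrier = "B" if a_gap else "A"
        fragment = (aligned_b if a_gap else aligned_a)[start:i]
        sites.append((start, i, carrier, fragment))
    return sites

print(indel_sites("MKT--LAVG-Q", "MKTWELA-GHQ"))
# [(3, 5, 'B', 'WE'), (7, 8, 'A', 'V'), (9, 10, 'B', 'H')]
```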

Evolutionary and biophysical relationships among the papillomavirus E2 proteins.

PubMed

Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael

2009-01-01

Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sequences in which highly conserved four-base-pair sequences flank a variable 'spacer' of identical length. E2 proteins directly contact the conserved DNA but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that depends on the proteins' sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics associated with the risk of virally caused malignancy. The amino acid sequence, 3D structure and electrostatic features of the E2 protein DNA-binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications for cancer biology are discussed.
The role of heterologous chloroplast sequence elements in transgene integration and expression.

PubMed

Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

2010-04-01

Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5' untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5' UTR and 3' UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5' UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5' UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation.
The Role of Heterologous Chloroplast Sequence Elements in Transgene Integration and Expression

PubMed Central

Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

2010-01-01

Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5′ untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5′ UTR and 3′ UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5′ UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5′ UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation. PMID:20130101
Protein 3D Structure Computed from Evolutionary Sequence Variation

PubMed Central

Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

2011-01-01

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
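The "noisy set of observed correlations" that EVfold starts from can be illustrated with raw mutual information (MI) between alignment columns. Raw MI mixes direct couplings with indirect, transitive ones, which is the problem the maximum entropy model is meant to solve. That model is not reproduced here; this minimal sketch shows only the naive baseline. The alignment is hypothetical, and gap handling, pseudocounts and sequence weighting are all omitted.

```python
import math
from collections import Counter
from itertools import combinations

def column_mi(msa, i, j):
    """Mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def ranked_pairs(msa):
    """All column pairs ranked by raw MI, highest first."""
    length = len(msa[0])
    scores = [(column_mi(msa, i, j), i, j) for i, j in combinations(range(length), 2)]
    return sorted(scores, reverse=True)

# Hypothetical alignment: columns 0 and 3 co-vary (K pairs with E, E with K),
# column 1 is conserved and column 2 varies independently of column 0.
msa = ["KAVE", "KAIE", "EALK", "EAVK", "KALE", "EAIK"]
score, i, j = ranked_pairs(msa)[0]
print(i, j, round(score, 3))  # 0 3 0.693
```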
A single determinant dominates the rate of yeast protein evolution.

PubMed

Drummond, D Allan; Raval, Alpan; Wilke, Claus O

2006-02-01

A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
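Principal component regression, as described above, regresses the response on principal components of the predictors rather than on the correlated predictors themselves. The study used seven predictors. This minimal sketch assumes only two standardized predictors and keeps one component, so the eigenvectors of the 2x2 correlation matrix have a closed form. A general version would need a numerical eigendecomposition. All data are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(xs):
    """Z-scores using the population standard deviation (predictors must vary)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def pcr_two_predictors(x1, x2, y):
    """Regress y on the first principal component of two standardized predictors.
    Returns (loadings, slope, intercept)."""
    z1, z2 = standardize(x1), standardize(x2)
    r = sum(a * b for a, b in zip(z1, z2)) / len(y)  # Pearson correlation
    # The correlation matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/sqrt(2)
    # (eigenvalue 1 + r) and (1, -1)/sqrt(2) (eigenvalue 1 - r); pick the larger.
    sign = 1.0 if r >= 0 else -1.0
    w = (1 / math.sqrt(2), sign / math.sqrt(2))
    pc1 = [w[0] * a + w[1] * b for a, b in zip(z1, z2)]
    # Ordinary least squares of y on the single component score.
    my, mp = mean(y), mean(pc1)
    sxy = sum((p - mp) * (v - my) for p, v in zip(pc1, y))
    sxx = sum((p - mp) ** 2 for p in pc1)
    slope = sxy / sxx
    return w, slope, my - slope * mp

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor, e.g. expression level
x2 = [2.1, 3.9, 6.2, 7.8, 10.0]  # hypothetical correlated predictor
y = [0.9, 0.7, 0.6, 0.4, 0.2]    # hypothetical evolutionary rate
loadings, slope, intercept = pcr_two_predictors(x1, x2, y)
print(loadings, round(slope, 3))
```

Regressing on a component instead of on both raw predictors avoids the unstable, sign-flipping coefficients that strongly collinear predictors can produce in ordinary multiple regression.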
Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

PubMed

Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

2016-03-22

The Nordic Red Cattle, consisting of three different populations from Finland, Sweden and Denmark, are under a joint breeding value estimation system. The long history of recording production and health traits offers a great opportunity to study production traits and identify the causal variants behind them. In this study, we used whole-genome sequence-level data from 4280 progeny-tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wide significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield, and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting the genes with significant hits. Compared with previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common to fat yield and fertility, linking these two traits via biological networks. This is the first time that whole-genome sequence data have been used to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence-level data make it possible to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium makes it difficult to pinpoint the causative genes and variations. One way to overcome these difficulties is to identify functional gene networks and pathways that reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Toward rules relating zinc finger protein sequences and DNA binding site preferences.

PubMed

Desjarlais, J R; Berg, J M

1992-08-15

Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA-binding domain of Sp1, using information from an analysis of a large database of zinc finger protein sequences. Through systematic variation at two of the three contact positions, relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code allowing the design of zinc finger proteins with preselected DNA site specificity.
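The site family 5'-GGGGN(G or T)GGG-3' named above can be written directly as a regular expression. This minimal sketch scans one strand of a hypothetical sequence for every match. A lookahead is used so that overlapping sites in G-rich stretches are all reported. The reverse-complement strand is not scanned, and matching the pattern says nothing about actual binding affinity.

```python
import re

# 5'-GGGG N (G|T) GGG-3'; the lookahead lets overlapping sites all be reported.
SITE = re.compile(r"(?=(GGGG[ACGT][GT]GGG))")

def find_sites(dna):
    """Return (0-based start, site) for every match on the given strand."""
    return [(m.start(), m.group(1)) for m in SITE.finditer(dna.upper())]

print(find_sites("ttGGGGAGGGGcaGGGGCTGGGa"))  # [(2, 'GGGGAGGGG'), (13, 'GGGGCTGGG')]
```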
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.

PubMed

Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W

2017-02-01

Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR). APR is phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty. We then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches. These included sampling amino acid states from the posterior distribution at each site, and incorporating the alternative amino acid state at every ambiguous site into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016.
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
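The strategies compared above can be sketched from per-site posterior probabilities: the most probable ancestor, a "worst plausible case" ancestor, and a sequence drawn from the posterior. The worst-plausible-case sequence substitutes the runner-up state at every ambiguous site. This sketch defines "ambiguous" as a runner-up posterior of at least 0.2, a threshold chosen here for illustration rather than taken from the abstract. The posteriors are hypothetical.

```python
import random

def ml_sequence(posteriors):
    """Most probable state at every site."""
    return "".join(max(site, key=site.get) for site in posteriors)

def altall_sequence(posteriors, threshold=0.2):
    """'Worst plausible case': use the runner-up state wherever its posterior
    is at least `threshold` (an assumed cutoff), else the most probable state."""
    seq = []
    for site in posteriors:
        ranked = sorted(site, key=site.get, reverse=True)
        if len(ranked) > 1 and site[ranked[1]] >= threshold:
            seq.append(ranked[1])
        else:
            seq.append(ranked[0])
    return "".join(seq)

def sample_sequence(posteriors, rng):
    """Draw one state per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0] for site in posteriors
    )

# Hypothetical posterior probabilities for a 4-residue ancestral fragment.
posteriors = [
    {"M": 1.0},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"L": 0.90, "I": 0.10},
    {"D": 0.70, "E": 0.30},
]
print(ml_sequence(posteriors))      # MKLD
print(altall_sequence(posteriors))  # MRLE
print(sample_sequence(posteriors, random.Random(0)))
```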
Coiled-coil length: Size does matter.

PubMed

Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

2015-12-01

Protein evolution is governed by processes that alter not only the primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to the acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils: domains formed by a ubiquitous, repetitive peptide motif present in all domains of life that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein, including globular domains, change. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.
Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans

PubMed Central

Cenik, Can; Cenik, Elif Sarinay; Byeon, Gun W.; Grubert, Fabian; Candille, Sophie I.; Spacek, Damek; Alsallakh, Bilal; Tilgner, Hagen; Araya, Carlos L.; Tang, Hua; Ricci, Emiliano; Snyder, Michael P.

2015-01-01

Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy—many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation. PMID:26297486
Diversity of Pneumolysin and Pneumococcal Histidine Triad Protein D of Streptococcus pneumoniae Isolated from Invasive Diseases in Korean Children.

PubMed

Yun, Ki Wook; Lee, Hyunju; Choi, Eun Hwa; Lee, Hoan Jong

2015-01-01

Pneumolysin (Ply) and pneumococcal histidine triad protein D (PhtD) are candidate proteins for a next-generation pneumococcal vaccine. We aimed to analyze the genetic diversity and antigenic heterogeneity of Ply and PhtD in 173 pneumococci isolated from invasive diseases in Korean children. Alleles were designated based on variation in the amino acid sequence. Antigenicity was predicted from the amino acid hydrophobicity of each region. There were seven and 39 allele types for the ply and phtD genes, respectively. Nucleotide sequence identity was 97.2%-99.9% for the ply gene and 91.4%-98.0% for the phtD gene. Only minor variations in hydrophobicity were noted among the antigenicity plots of Ply and PhtD. Overall, the allele types of the ply and phtD genes were remarkably homogeneous, and the antigenic diversity of the corresponding proteins was very limited. Ply and PhtD could be useful antigens for universal pneumococcal vaccines.
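Designating alleles by distinct amino acid sequence, and summarizing diversity as a range of pairwise identities, can be sketched as follows. This assumes the sequences are already aligned to equal length with no gaps. The same identity function applies to nucleotide sequences. The isolate sequences are hypothetical, and this is not necessarily how the authors computed their values.

```python
from itertools import combinations

def designate_alleles(protein_seqs):
    """Label each isolate by its distinct amino acid sequence (first seen = 1)."""
    labels, assignments = {}, []
    for seq in protein_seqs:
        if seq not in labels:
            labels[seq] = len(labels) + 1
        assignments.append(labels[seq])
    return assignments, labels

def percent_identity(a, b):
    """Percent identical positions between aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def identity_range(seqs):
    """(min, max) pairwise identity among distinct sequences (needs >= 2)."""
    scores = [percent_identity(a, b) for a, b in combinations(sorted(set(seqs)), 2)]
    return min(scores), max(scores)

isolates = ["MANKAVNDFI", "MANKAVNDFI", "MANKAVSDFI", "MTNKAVSDFL"]  # hypothetical
assignments, _ = designate_alleles(isolates)
print(assignments)               # [1, 1, 2, 3]
print(identity_range(isolates))  # (70.0, 90.0)
```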
Genetic diversity and antigenicity variation of Babesia bovis merozoite surface antigen-1 (MSA-1) in Thailand.

PubMed

Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Takemae, Hitoshi; Simking, Pacharathon; Jittapalapong, Sathaporn; Igarashi, Ikuo; Yokoyama, Naoaki

2016-07-01

Babesia bovis, an intraerythrocytic protozoan parasite, causes severe clinical disease in cattle worldwide. The genetic diversity of parasite antigens often results in different immune profiles in infected animals, hindering efforts to develop immune control methodologies against the B. bovis infection. In this study, we analyzed the genetic diversity of the merozoite surface antigen-1 (msa-1) gene using 162 B. bovis-positive blood DNA samples sourced from cattle populations reared in different geographical regions of Thailand. The identity scores shared among 93 msa-1 gene sequences isolated by PCR amplification were 43.5-100%, and the similarity values among the translated amino acid sequences were 42.8-100%. Of 23 total clades detected in our phylogenetic analysis, Thai msa-1 gene sequences occurred in 18 clades; seven among them were composed of sequences exclusively from Thailand. To investigate differential antigenicity of isolated MSA-1 proteins, we expressed and purified eight recombinant MSA-1 (rMSA-1) proteins, including an rMSA-1 from B. bovis Texas (T2Bo) strain and seven rMSA-1 proteins based on the Thai msa-1 sequences. When these antigens were analyzed in a western blot assay, anti-T2Bo cattle serum strongly reacted with the rMSA-1 from T2Bo, as well as with three other rMSA-1 proteins that shared 54.9-68.4% sequence similarity with T2Bo MSA-1. In contrast, no or weak reactivity was observed for the remaining rMSA-1 proteins, which shared low sequence similarity (35.0-39.7%) with T2Bo MSA-1. While demonstrating the high genetic diversity of the B. bovis msa-1 gene in Thailand, the present findings suggest that the genetic diversity results in antigenicity variations among the MSA-1 antigens of B. bovis in Thailand. Copyright © 2016 Elsevier B.V. All rights reserved.
[Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

2010-01-01

Based on the full-length genome sequences of four rabies virus isolates from Chinese ferret-badgers and dogs, we analyzed the genetic variation of rabies viruses at the molecular level. We also obtained information on rabies virus prevalence and variation in Zhejiang, and expanded the genome database of rabies virus street strains isolated in China. Viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled. Nucleotide and deduced amino acid similarities and phylogenetic relationships were then determined relative to strains from Chinese ferret-badger, dog, sika deer and vole, and to vaccine strains in use. The four full-length genomes were sequenced completely and shared the same genetic structure, with a length of 11,923 or 11,925 nt. The genomes comprise a 58-nt leader, the 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP and 6386-nt LP, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that the four genomes have the properties of the genus Lyssavirus, family Rhabdoviridae. Nucleotide and amino acid sequences were most similar among Chinese strains, especially among strains from the same host species. Among the four genomes, amino acid similarity was markedly higher than nucleotide similarity, indicating that most nucleotide mutations were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged, with no recombination and only a few point mutations, suggesting that the five proteins are stable. The variation sites and types in the four genomes were similar to those of the reference vaccine or street strains. Multiple sequence and phylogenetic analyses placed all four strains in genotype 1, with distinct regional characteristics of China.
Therefore, these four rabies viruses are likely to be street viruses already circulating in nature.
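The inference that most nucleotide mutations were synonymous rests on translating codons and checking whether a change alters the amino acid. This minimal sketch counts differing codons between two in-frame coding sequences of equal length using the standard genetic code. A codon with several changed bases is counted once, so this is a simplified codon-level tally, not a per-site substitution rate method. The sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_differences(cds_a, cds_b):
    """Count synonymous vs nonsynonymous codon differences between two
    equal-length, in-frame coding sequences without indels."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    syn = nonsyn = 0
    for pos in range(0, len(cds_a), 3):
        ca, cb = cds_a[pos:pos + 3].upper(), cds_b[pos:pos + 3].upper()
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return {"synonymous": syn, "nonsynonymous": nonsyn}

# Hypothetical: AAA->AAG (Lys) and CTG->CTA (Leu) are silent; GGT->GAT is Gly->Asp.
print(classify_differences("ATGAAACTGGGT", "ATGAAGCTAGAT"))
# {'synonymous': 2, 'nonsynonymous': 1}
```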
Hubby and Lewontin on Protein Variation in Natural Populations: When Molecular Genetics Came to the Rescue of Population Genetics.

PubMed

Charlesworth, Brian; Charlesworth, Deborah; Coyne, Jerry A; Langley, Charles H

2016-08-01

The 1966 GENETICS papers by John Hubby and Richard Lewontin were a landmark in the study of genome-wide levels of variability. They used the technique of gel electrophoresis of enzymes and proteins to study variation in natural populations of Drosophila pseudoobscura, at a set of loci that had been chosen purely for technical convenience, without prior knowledge of their levels of variability. Together with the independent study of human populations by Harry Harris, this seminal study provided the first relatively unbiased picture of the extent of genetic variability in protein sequences within populations, revealing that many genes had surprisingly high levels of diversity. These papers stimulated a large research program that found similarly high electrophoretic variability in many different species and led to statistical tools for interpreting the data in terms of population genetics processes such as genetic drift, balancing and purifying selection, and the effects of selection on linked variants. The current use of whole-genome sequences in studies of variation is the direct descendant of this pioneering work. Copyright © 2016 by the Genetics Society of America.
Cloning, characterization, expression and comparative analysis of pig Golgi membrane sphingomyelin synthase 1.

PubMed

Guillén, Natalia; Navarro, María A; Surra, Joaquín C; Arnal, Carmen; Fernández-Juan, Marta; Cebrián-Pérez, Jose Alvaro; Osada, Jesús

2007-02-15

Pig sphingomyelin synthase 1 (SMS1) cDNA was cloned, characterized and compared to the human ortholog. The porcine protein consists of 413 amino acids and shares 97% sequence identity with the human protein. A phylogenetic tree of the proteins reveals that porcine SMS1 is more closely related to the bovine and rodent proteins than to the human protein. The measured protein mass was higher than the theoretical prediction based on the amino acid sequence, suggesting some form of posttranslational modification. Quantitative analysis of tissue distribution by real-time RT-PCR showed that SMS1 was widely expressed, although levels varied considerably among organs. The cardiovascular system, especially the heart, showed the highest expression of all the tissues studied. Regional differences in expression were observed in the central nervous system and intestinal tract. Analysis of hepatic SMS1 mRNA and protein expression following turpentine treatment revealed a progressive decrease in mRNA, paralleled by a decrease in protein concentration. The variation in expression among tissues might reflect a different requirement for Golgi sphingomyelin for the specific function of each organ. The findings also suggest that the enzyme is regulated in response to turpentine-induced hepatic injury.
Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome

PubMed Central

2013-01-01

Background: There is growing evidence for the prevalence of copy number variation (CNV) and its role in phenotypic variation in many eukaryotic species. Here we use array comparative genomic hybridization to explore the extent of this type of structural variation in domesticated barley cultivars and wild barleys. Results: A collection of 14 barley genotypes, including eight cultivars and six wild barleys, was used for comparative genomic hybridization. CNV affects 14.9% of all the sequences that were assessed. Higher levels of CNV diversity are present in the wild accessions relative to cultivated barley. CNVs are enriched near the ends of all chromosomes except 4H, which exhibits the lowest frequency of CNVs. CNV affects 9.5% of the coding sequences represented on the array, and the genes affected by CNV are enriched for sequences annotated as disease-resistance proteins and protein kinases. Sequence-based comparisons of CNV between the cultivars Barke and Morex provided evidence that DNA repair of double-strand breaks via single-stranded annealing and synthesis-dependent strand annealing plays an important role in the origin of CNV in barley. Conclusions: We present the first catalog of CNVs in a diploid Triticeae species, which opens the door for future genome diversity research in a tribe that comprises the economically important cereal species wheat, barley, and rye. Our findings constitute a valuable resource for identifying CNVs that affect genes of agronomic importance. We also identify potential mechanisms that can generate variation in copy number in plant genomes. PMID:23758725
Detection of a single nucleotide polymorphism in the human alpha-lactalbumin gene: implications for human milk proteins.

PubMed

Chowanadisai, Winyoo; Kelleher, Shannon L; Nemeth, Jennifer F; Yachetti, Stephen; Kuhlman, Charles F; Jackson, Joan G; Davis, Anne M; Lien, Eric L; Lönnerdal, Bo

2005-05-01

Variability in the protein composition of breast milk has been observed in many women and is believed to be due to natural variation in the human population. Single nucleotide polymorphisms (SNPs) are present throughout the human genome, but the impact of this variation on human milk composition and biological activity, and on infant nutrition and health, is unclear. The goal of this study was to characterize a variant of human alpha-lactalbumin observed in milk from a Filipino population by determining the location of the polymorphism in the amino acid and genomic sequences of alpha-lactalbumin. Milk and blood samples were collected from 20 Filipino women, and milk samples were collected from an additional 450 women from nine different countries. alpha-Lactalbumin concentration was measured by high-performance liquid chromatography (HPLC), and milk samples containing the variant form of the protein were identified by both HPLC and mass spectrometry (MS). The molecular weight of the variant form was measured by MS, and the location of the polymorphism was narrowed down by protein reduction, alkylation, and trypsin digestion. Genomic DNA was isolated from whole blood, and the location of the polymorphism and each subject's genotype were determined by amplifying the entire coding sequence of human alpha-lactalbumin by PCR, followed by DNA sequencing. A variant form of alpha-lactalbumin was observed in HPLC chromatograms, and the difference in molecular weight was determined by MS (wild type = 14,070 Da, variant = 14,056 Da). Protein reduction and digestion localized the polymorphism to between the 33rd and 77th amino acids of the protein. The genetic polymorphism was identified as an adenine-to-guanine substitution, which translates to an isoleucine-to-valine substitution at amino acid 46. The variant was more frequent in milk from China, Japan, and the Philippines, which suggests that this polymorphism is most prevalent in Asia.
SNPs occur in the genes encoding human milk proteins, and their implications for protein bioactivity and infant nutrition need to be considered.
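The two numbers reported above, an A-to-G change and a shift of about 14 Da, can be checked against each other with a short calculation. The helper below is hypothetical and does not come from the study. It lists every single adenine-to-guanine change in the three isoleucine codons, then compares the average residue masses of Ile and Val. It assumes the standard genetic code and textbook average residue masses (Ile 113.1594 Da, Val 99.1326 Da).

```python
# Subset of the standard genetic code needed for this check (DNA alphabet).
GENETIC_CODE = {
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile", "ATG": "Met",
    "GTT": "Val", "GTC": "Val", "GTA": "Val",
}

# Assumed textbook average residue masses (Da): amino acid mass minus water.
AVG_RESIDUE_MASS = {"Ile": 113.1594, "Val": 99.1326}

ILE_CODONS = ("ATT", "ATC", "ATA")


def a_to_g_variants(codon: str):
    """Return (codon position, mutant codon, amino acid) for each single A->G change."""
    out = []
    for i, base in enumerate(codon):
        if base == "A":
            mutant = codon[:i] + "G" + codon[i + 1:]
            out.append((i + 1, mutant, GENETIC_CODE[mutant]))
    return out


variants = {codon: a_to_g_variants(codon) for codon in ILE_CODONS}
mass_shift = AVG_RESIDUE_MASS["Ile"] - AVG_RESIDUE_MASS["Val"]

for codon, changes in variants.items():
    print(codon, changes)
print(f"Ile -> Val residue mass change: -{mass_shift:.2f} Da")
```

Only a change at the first codon position turns Ile into Val. A third-position change in ATA gives Met instead. The residue mass difference of about 14.03 Da is consistent with the 14 Da shift measured by MS.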
Role of DNA conformation & energetic insights in Msx-1-DNA recognition as revealed by molecular dynamics studies on specific and nonspecific complexes.

PubMed

Kachhap, Sangita; Singh, Balvinder

2015-01-01

In most homeodomain-DNA complexes, glutamine or lysine is present at position 50 and interacts with the 5th and 6th nucleotides of the core recognition region. Molecular dynamics simulations have been carried out on the Msx-1-DNA complex (Q50-TG) and its variant complexes: a specific complex (Q50K-CC), a nonspecific complex with a mutation in the DNA (Q50-CC), and a complex with a mutation in the protein (Q50K-TG). Analysis of protein-DNA interactions and of DNA structure in the specific and nonspecific complexes shows that amino acid residues use the sequence-dependent shape of DNA to interact. The binding free energies of all four complexes were analysed to define the role of the residue at position 50 in binding strength, taking into account the effect of DNA sequence variation on the stability of protein-DNA complexes. The order of stability shows that specific complexes are more stable than nonspecific ones. Decomposition analysis shows that N-terminal amino acid residues contribute most to the binding free energy of the protein-DNA complexes. Among the specific complexes, K50 contributes more than Q50 to the binding free energy of its respective complex. The sequence dependence of local DNA conformation enables Q50/Q50K to form hydrogen bonds with nucleotide(s) of the DNA. Changes in the amino acid sequence of the protein are accommodated and stabilized around the TAAT core region of DNA carrying nucleotide variations.
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

PubMed

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-02-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited in public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which, after eliminating intron sequences, identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed the full biological repertoire of genes and the transcriptomic variation in the onion. The method developed in this study provides a powerful tool for constructing reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Ligand-mediated protein degradation reveals functional conservation among sequence variants of the CUL4-type E3 ligase substrate receptor cereblon.

PubMed

Akuffo, Afua A; Alontaga, Aileen Y; Metcalf, Rainer; Beatty, Matthew S; Becker, Andreas; McDaniel, Jessica M; Hesterberg, Rebecca S; Goodheart, William E; Gunawan, Steven; Ayaz, Muhammad; Yang, Yan; Karim, Md Rezaul; Orobello, Morgan E; Daniel, Kenyon; Guida, Wayne; Yoder, Jeffrey A; Rajadhyaksha, Anjali M; Schönbrunn, Ernst; Lawrence, Harshani R; Lawrence, Nicholas J; Epling-Burnette, Pearlie K

2018-04-20

Upon binding to thalidomide and other immunomodulatory drugs, the E3 ligase substrate receptor cereblon (CRBN) promotes proteasomal destruction by engaging the DDB1-CUL4A-Roc1-RBX1 E3 ubiquitin ligase in human cells but not in mouse cells, suggesting that sequence variations in CRBN may cause its inactivation. Therapeutically, CRBN engagers have the potential for broad applications in cancer and immune therapy by specifically reducing protein expression through targeted ubiquitin-mediated degradation. To examine the effects of defined sequence changes on CRBN's activity, we performed a comprehensive study using complementary theoretical, biophysical, and biological assays aimed at understanding CRBN's nonprimate sequence variations. With a series of recombinant thalidomide-binding domain (TBD) proteins, we show that CRBN sequence variants retain their binding to both classical immunomodulatory drugs and dBET1, a chemical compound and targeting ligand designed to degrade bromodomain-containing 4 (BRD4) via a CRBN-dependent mechanism. We further show that dBET1 stimulates CRBN's E3 ubiquitin-conjugating function and degrades BRD4 in both mouse and human cells. This insight paves the way for studies of CRBN-dependent proteasome-targeting molecules in nonprimate models and provides a new understanding of CRBN's substrate-recruiting function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

PubMed

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-11-16

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA-binding proteins with unprecedented target specificity. Comparative studies of TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled their repeat structures. We found that repeats from these proteins mediate sequence-specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats and novel residues around the base-specifying residue (BSR). However, MOrTL1 repeats show greater sequence-discriminating power than MOrTL2 repeats. Sequence alignments show that only three residues are conserved among the repeats of all TALE-like proteins, including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs, and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

PubMed

Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C

2012-12-12

The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species, such as the penicillin producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues to the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin), and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus) contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing Aspergillus systematics.
The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary, and population studies. Despite the conservation of the core genes, the mitochondrial genomes of the Aspergillus and Penicillium species examined here exhibit a significant amount of interspecies variation. Most of this variation can be attributed to accessory genes and mobile introns, presumably acquired by horizontal gene transfer of mitochondrial plasmids and by intron homing.
Computer simulation study of amyloid fibril formation by palindromic sequences in prion peptides

PubMed Central

Wagoner, Victoria; Cheon, Mookyung; Chang, Iksoo; Hall, Carol

2011-01-01

We simulate the aggregation of large systems containing palindromic peptides from the Syrian hamster prion protein SHaPrP 113–120 (AGAAAAGA) and the mouse prion protein MoPrP 111–120 (VAGAAAAGAV), as well as eight sequence variations: GAAAAAAG, (AG)4, A8, GAAAGAAA, A10, V10, GAVAAAAVAG, and VAVAAAAVAV. The first two peptides are thought to act as the Velcro that holds the parent prion proteins together in amyloid structures, and they can form fibrils themselves. Kinetic events along the fibrillization pathway influence the types of structures that occur, and variations in the sequence affect aggregation kinetics and fibrillar structure. Discontinuous molecular dynamics simulations using the PRIME20 force field are performed on systems containing 48 peptides starting from a random coil configuration. Depending on the sequence, fibrillar structures form spontaneously over a range of temperatures, below which amorphous aggregates form and above which no aggregation occurs. AGAAAAGA forms well-organized fibrillar structures, whereas VAGAAAAGAV forms less well-organized structures that are partially fibrillar and partially amorphous. The degree of order in the fibrillar structure stems in part from the types of kinetic events leading up to its formation, with AGAAAAGA forming fewer amorphous structures early in the simulation than VAGAAAAGAV. The ability to form fibrils increases as the chain length and the length of the stretch of hydrophobic residues increase. However, as the hydrophobicity of the sequence increases, the ability to form well-ordered structures decreases. Thus, longer hydrophobic sequences form slightly disordered aggregates that are partially fibrillar and partially amorphous. Subtle changes in sequence result in slightly different fibril structures. PMID:21557317
Identification of related gene/protein names based on an HMM of name variations.

PubMed

Yeganova, L; Smith, L; Wilbur, W J

2004-04-01

Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
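The core idea of this system, scoring two names by dynamic-programming sequence alignment, can be shown with a standard edit-distance alignment. This is a simplified sketch and not the authors' method. It uses one fixed substitution cost and one fixed gap cost, whereas the described system lets a trainable hidden Markov model vary the mutation matrix along the alignment. The function names and example names are hypothetical.

```python
def alignment_cost(a: str, b: str, sub_cost: float = 1.0, gap_cost: float = 1.0) -> float:
    """Global alignment (Levenshtein-style) cost between two names.

    Fixed costs are used here; the HMM-controlled, position-dependent
    mutation matrix of the described system is NOT implemented.
    """
    a, b = a.lower(), b.lower()
    n, m = len(a), len(b)
    # dp[i][j] = minimal cost of aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + match,  # match or substitution
                           dp[i - 1][j] + gap_cost,   # gap in b
                           dp[i][j - 1] + gap_cost)   # gap in a
    return dp[n][m]


def rank_related(query: str, names: list) -> list:
    """Order candidate names from most to least similar to the query."""
    return sorted(names, key=lambda name: alignment_cost(query, name))


print(rank_related("IL-2", ["interleukin 2", "IL-4R", "IL2"]))
```

With fixed costs, "IL-2" ranks "IL-4R" above the spelled-out "interleukin 2". This shows why a trainable, context-dependent scoring scheme helps with real name variants.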
Identification of transcript polymorphisms for seed quality improvement by exploring soybean genetic diversity

USDA-ARS's Scientific Manuscript database

The differences in seed oil composition and content among soybean genotypes could mostly be attributed to transcript sequence and/or expression variations of oil-related genes that lead to changes in the functions of the proteins they encode and/or in the accumulation of those proteins in seeds. We sequenced ...
Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

PubMed

Caporale, Lynn Helena

2012-09-01

This overview of a special issue of Annals of the New York Academy of Sciences discusses the uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed are the value of non-text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and the implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Expression Differentiation Is Constrained to Low-Expression Proteins over Ecological Timescales

PubMed Central

Margres, Mark J.; Wray, Kenneth P.; Seavy, Margaret; McGivern, James J.; Herrera, Nathanael D.; Rokyta, Darin R.

2016-01-01

Protein expression level is one of the strongest predictors of protein sequence evolutionary rate, with high-expression protein sequences evolving at slower rates than low-expression protein sequences largely because of constraints on protein folding and function. Expression evolutionary rates also have been shown to be negatively correlated with expression level across human and mouse orthologs over relatively long divergence times (i.e., ∼100 million years). Long-term evolutionary patterns, however, often cannot be extrapolated to microevolutionary processes (and vice versa), and whether this relationship holds for traits evolving under directional selection within a single species over ecological timescales (i.e., <5000 years) is unknown and not necessarily expected. Expression is a metabolically costly process, and the expression level of a particular protein is predicted to be a tradeoff between the benefit of its function and the costs of its expression. Selection should drive the expression level of all proteins close to values that maximize fitness, particularly for high-expression proteins because of the increased energetic cost of production. Therefore, stabilizing selection may reduce the amount of standing expression variation for high-expression proteins, and in combination with physiological constraints that may place an upper bound on the range of beneficial expression variation, these constraints could severely limit the availability of beneficial expression variants. To determine whether rapid-expression evolution was restricted to low-expression proteins owing to these constraints on highly expressed proteins over ecological timescales, we compared venom protein expression levels across mainland and island populations for three species of pit vipers. We detected significant differentiation in protein expression levels in two of the three species and found that rapid-expression differentiation was restricted to low-expression proteins. 
Our results suggest that various constraints on high-expression proteins reduce the availability of beneficial expression variants relative to low-expression proteins, enabling low-expression proteins to evolve and potentially lead to more rapid adaptation. PMID:26546003
3D RNA and functional interactions from evolutionary couplings

PubMed Central

Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.

2016-01-01

Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about the function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction with good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444
RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

PubMed

Brule, C E; Dean, K M; Grayhack, E J

2016-01-01

The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

Many studies have shown the importance of short membrane sequence motifs, which underscores the value of sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps in understanding membrane protein folding and the maintenance of the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information about interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, ultimately, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. In this work we demonstrate a simple method for large-scale membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization for the representation of evolutionary influenced interaction pattern pairs (EIPPs). It is adapted to mutagenesis investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder nephrogenic diabetes insipidus, and of membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs that can be used in further laboratory mutagenesis experiments. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

Many studies have shown the importance of short membrane sequence motifs, which underscores the value of sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps in understanding membrane protein folding and the maintenance of the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information about interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, ultimately, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. In this work we demonstrate a simple method for large-scale membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization for the representation of evolutionary influenced interaction pattern pairs (EIPPs). It is adapted to mutagenesis investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder nephrogenic diabetes insipidus, and of membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs that can be used in further laboratory mutagenesis experiments.
The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

PubMed

Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

2016-01-01

Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether this number is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously been determined only for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane protein coding sequences differed within and across the analyzed Chlamydia species. The length of the pmpA, pmpB, and pmpH coding sequences was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

PubMed Central

Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

2011-01-01

Computational genomics is an important tool for understanding the distribution of features such as simple sequence repeats (SSRs) across closely related genomes, and it gives valuable information about genetic variation. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions of different Brucella species that are involved in a range of pathological disorders in humans. Computational analysis of the SSRs in Brucella indicates few deviations from expected random models. Statistical analysis also reveals that trinucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. These data suggest that the overrepresented trinucleotide SSRs in genomic and coding regions might be responsible for generating functional variation in the expressed proteins. This in turn may lead to differences in pathogenicity, virulence determinants, stress response genes, transcription regulators, and host adaptation proteins of Brucella. Abbreviations: SSRs, simple sequence repeats; ORFs, open reading frames. PMID:21738309
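Screening a genome for SSRs amounts to finding perfect tandem repeats of short motifs. The sketch below finds them with a regular-expression backreference. Its assumptions are loudly not the study's: the minimum copy numbers are illustrative, imperfect repeats are ignored, and motifs that are themselves repeats of a shorter unit (such as "ATAT") are skipped so they are not counted twice.

```python
import re


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AAA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, unit_len: int, min_copies: int):
    """Return (start, motif, copies) for perfect tandem repeats of a given unit length.

    Matches are non-overlapping and reported left to right.
    """
    seq = seq.upper()
    # ([ACGT]{k}) captures a motif; \1{n,} requires at least n further exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if is_compound(motif):
            continue  # counted under its shorter unit length instead
        hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


# Illustrative (assumed) thresholds: minimum copies per motif length.
MIN_COPIES = {1: 10, 2: 6, 3: 4, 4: 3}


def ssr_summary(seq: str) -> dict:
    """Count SSRs per motif length using the assumed thresholds."""
    return {k: len(find_ssrs(seq, k, n)) for k, n in MIN_COPIES.items()}


print(find_ssrs("TTCAGCAGCAGCAGTT", 3, 4))
print(ssr_summary("TTCAGCAGCAGCAGTT"))
```

Comparing such per-class counts against counts from shuffled sequences would be one way to test over- or underrepresentation relative to a random model. The study's exact statistical procedure is not described in the abstract.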
A gene variation of 14-3-3 zeta isoform in rat hippocampus.

PubMed

Murakami, K; Situ, S Y; Eshete, F

1996-11-14

A variant form of 14-3-3 zeta was isolated from a rat hippocampal cDNA library. The cloned cDNA is 1687 bp long and contains an entire ORF (nt 63-797) encoding 245 amino acids, which is characteristic of the 14-3-3 zeta subtype. By comparison with reported 14-3-3 zeta sequences, we found three nucleotide substitutions within the coding sequence of our clone: a C↔T transition at nt 325 and G↔C transversions at nt 387 and 388. These produce two missense mutations, converting ACG (Thr) to ATG (Met) at residue 88 and CGT (Arg) to GCT (Ala) at residue 109. Our results show that at least three different genetic variants of 14-3-3 zeta are present in the rat, resulting in protein variants. Such amino acid substitutions are an important indication of the diverse functions of this protein and may also help explain recent contradictory observations regarding the role of the 14-3-3 zeta subtype.
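The nucleotide and residue numbers above can be checked against each other with simple arithmetic. This assumes residue 1 is the Met encoded by nt 63-65, the ORF start given in the abstract. The hypothetical helpers below map a cDNA position to a residue number and a position within the codon, then apply the reported base changes.

```python
ORF_START = 63  # nt position of the first base of the start codon (from the abstract)

# Codons relevant to the reported changes (standard genetic code).
CODON_TABLE = {"ACG": "Thr", "ATG": "Met", "CGT": "Arg", "GCT": "Ala"}


def locate(nt: int, orf_start: int = ORF_START):
    """Map a cDNA nucleotide position to (residue number, base position within codon)."""
    offset = nt - orf_start
    return offset // 3 + 1, offset % 3 + 1


def mutate(codon: str, changes: dict) -> str:
    """Apply {codon position (1-based): new base} substitutions to a codon."""
    bases = list(codon)
    for pos, base in changes.items():
        bases[pos - 1] = base
    return "".join(bases)


for nt in (325, 387, 388):
    print(nt, locate(nt))

for ref, changes in (("ACG", {2: "T"}), ("CGT", {1: "G", 2: "C"})):
    alt = mutate(ref, changes)
    print(f"{ref} ({CODON_TABLE[ref]}) -> {alt} ({CODON_TABLE[alt]})")
```

Position 325 falls at the second base of codon 88. Positions 387 and 388 are the first two bases of codon 109. This is consistent with the reported Thr-to-Met change at residue 88 and Arg-to-Ala change at residue 109.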
Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans.

PubMed

Cenik, Can; Cenik, Elif Sarinay; Byeon, Gun W; Grubert, Fabian; Candille, Sophie I; Spacek, Damek; Alsallakh, Bilal; Tilgner, Hagen; Araya, Carlos L; Tang, Hua; Ricci, Emiliano; Snyder, Michael P

2015-11-01

Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and for personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin has been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels, suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy; many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation. © 2015 Cenik et al.; Published by Cold Spring Harbor Laboratory Press.
Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution

PubMed Central

Modahl, Cassandra M.; Mackessy, Stephen P.

2016-01-01

Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. 
This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639
Searching for evidence of selection in avian DNA barcodes.

PubMed

Kerr, Kevin C R

2011-11-01

The barcode of life project has assembled a tremendous number of mitochondrial cytochrome c oxidase I (COI) sequences. Although these sequences were gathered to develop a DNA-based system for species identification, it has been suggested that further biological inferences may also be derived from this wealth of data. Recurrent selective sweeps have been invoked as an evolutionary mechanism to explain limited intraspecific COI diversity, particularly in birds, but this hypothesis has not been formally tested. In this study, I collated COI sequences from previous barcoding studies on birds and tested them for evidence of selection. Using this expanded data set, I re-examined the relationships of intraspecific diversity with interspecific divergence and with sampling effort. I employed the McDonald-Kreitman test to assess neutrality of sequence evolution between closely related pairs of species. Because amino acid sequences were generally constrained between closely related pairs, I also included broader intra-order comparisons to quantify patterns of protein variation in avian COI sequences. Lastly, using 22 published whole mitochondrial genomes, I compared the evolutionary rate of COI with that of the other 12 protein-coding mitochondrial genes to assess intragenomic variability. I found no conclusive evidence of selective sweeps. Most evidence pointed to an overall trend of strong purifying selection and functional constraint. The COI protein did vary across the class Aves, but only to a very limited extent. COI was the least variable gene in the mitochondrial genome, suggesting that other genes might be more informative for probing the factors that constrain mitochondrial variation within species. © 2011 Blackwell Publishing Ltd.
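The McDonald-Kreitman test compares nonsynonymous and synonymous changes that are fixed between species (Dn, Ds) with those that are polymorphic within species (Pn, Ps). The sketch below makes several assumptions. The 2x2 table is analysed with a two-sided Fisher's exact test and summarised by the neutrality index NI = (Pn/Ps)/(Dn/Ds). The function names are hypothetical, and the counts in the usage line are invented for illustration. Under neutrality NI is expected to be near 1. NI > 1 is commonly read as an excess of amino acid polymorphism, as expected under purifying selection, and NI < 1 as adaptive fixation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one (small tolerance for ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int):
    """Return (neutrality index, two-sided p) for fixed (Dn, Ds) vs polymorphic (Pn, Ps)."""
    if ps > 0 and dn > 0 and ds > 0:
        ni = (pn / ps) / (dn / ds)
    else:
        ni = float("nan")  # NI is undefined when a denominator count is zero
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, for illustration only.
print(mcdonald_kreitman(dn=10, ds=20, pn=5, ps=20))
```

A library routine such as SciPy's Fisher test could replace the hand-written one. It is written out here to keep the sketch dependency-free.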
Sweet Taste Receptor Gene Variation and Aspartame Taste in Primates and Other Species

PubMed Central

Li, Xia; Bachmanov, Alexander A.; Maehashi, Kenji; Li, Weihua; Lim, Raymond; Brand, Joseph G.; Beauchamp, Gary K.; Reed, Danielle R.; Thai, Chloe

2011-01-01

Aspartame is a sweetener added to foods and beverages as a low-calorie sugar replacement. Whereas sugars are apparently perceived as sweet and desirable by a range of mammals, the ability to taste aspartame varies: humans, apes, and Old World monkeys perceive aspartame as sweet, but other primate species do not. To investigate whether the ability to perceive the sweetness of aspartame correlates with variations in the DNA sequence of the genes encoding sweet taste receptor proteins, T1R2 and T1R3, we sequenced these genes in 9 aspartame taster and nontaster primate species. We then compared these sequences with sequences of their orthologs in 4 other nontaster species. We identified 9 variant sites in the gene encoding T1R2 and 32 variant sites in the gene encoding T1R3 that distinguish aspartame tasters and nontasters. Molecular docking of aspartame to computer-generated models of the T1R2 + T1R3 receptor dimer suggests that species variation at a secondary, allosteric binding site in the T1R2 protein is the most likely origin of differences in perception of the sweetness of aspartame. These results identified a previously unknown site of aspartame interaction with the sweet receptor and suggest that the ability to taste aspartame might have developed during evolution to exploit a specialized food niche. PMID:21414996
Sweet taste receptor gene variation and aspartame taste in primates and other species.

PubMed

Li, Xia; Bachmanov, Alexander A; Maehashi, Kenji; Li, Weihua; Lim, Raymond; Brand, Joseph G; Beauchamp, Gary K; Reed, Danielle R; Thai, Chloe; Floriano, Wely B

2011-06-01

Aspartame is a sweetener added to foods and beverages as a low-calorie sugar replacement. Whereas sugars are apparently perceived as sweet and desirable by a range of mammals, the ability to taste aspartame varies: humans, apes, and Old World monkeys perceive aspartame as sweet, but other primate species do not. To investigate whether the ability to perceive the sweetness of aspartame correlates with variations in the DNA sequence of the genes encoding sweet taste receptor proteins, T1R2 and T1R3, we sequenced these genes in 9 aspartame taster and nontaster primate species. We then compared these sequences with sequences of their orthologs in 4 other nontaster species. We identified 9 variant sites in the gene encoding T1R2 and 32 variant sites in the gene encoding T1R3 that distinguish aspartame tasters and nontasters. Molecular docking of aspartame to computer-generated models of the T1R2 + T1R3 receptor dimer suggests that species variation at a secondary, allosteric binding site in the T1R2 protein is the most likely origin of differences in perception of the sweetness of aspartame. These results identified a previously unknown site of aspartame interaction with the sweet receptor and suggest that the ability to taste aspartame might have developed during evolution to exploit a specialized food niche.
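One simple way to flag candidate sites like the taster/nontaster variant sites described above is to report alignment columns where the residues seen in one group never occur in the other. This is a minimal sketch that assumes the receptor sequences are already aligned to equal length. The group assignments and toy sequences are hypothetical, and the study's actual criteria for a distinguishing site may differ.

```python
def distinguishing_sites(tasters, nontasters):
    """Return 1-based alignment columns whose residue sets do not overlap between groups."""
    length = len(tasters[0])
    if any(len(s) != length for s in tasters + nontasters):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        taster_residues = {s[i] for s in tasters}
        nontaster_residues = {s[i] for s in nontasters}
        # A column separates the groups only if no residue is shared.
        if taster_residues.isdisjoint(nontaster_residues):
            sites.append(i + 1)
    return sites

# Hypothetical aligned protein fragments.
tasters = ["MKTAY", "MKTAY", "MRTAY"]
nontasters = ["MKSAF", "MKSAY"]
print(distinguishing_sites(tasters, nontasters))  # [3]
```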
Sequence variations of the partially dominant DELLA gene Rht-B1c in wheat and their functional impacts

PubMed Central

Ma, Zhengqiang

2013-01-01

Rht-B1c, allelic to the DELLA protein-encoding gene Rht-B1a, is a natural mutation documented in common wheat (Triticum aestivum). It causes variation in a number of traits related to cell and plant morphology, seed dormancy, and photosynthesis. The present study was conducted to examine the sequence variations of Rht-B1c and their functional impacts. The results showed that Rht-B1c was partially dominant or co-dominant for plant height and had an increased dwarfing effect. At the sequence level, Rht-B1c differed from Rht-B1a by one 2 kb Veju retrotransposon insertion, three single nucleotide polymorphisms (SNPs) in the coding region, one 197 bp insertion, and four SNPs in the 1 kb upstream sequence. Haplotype investigations, association analyses, transient expression assays, and expression profiling showed that the Veju insertion was primarily responsible for the extreme dwarfing effect. The Veju insertion altered processing of the Rht-B1c transcripts and disrupted the primary structure of the DELLA motif. Expression assays showed that Rht-B1c reduced total Rht-1 transcript levels and up-regulated GATA-like transcription factors and genes positively regulated by these factors. This suggests that one way in which Rht-1 proteins affect plant growth and development is through regulation of GATA-like transcription factors. PMID:23918966

Theileria parva antigens recognized by CD8+ T cells show varying degrees of diversity in buffalo-derived infected cell lines.

PubMed

Sitt, Tatjana; Pelle, Roger; Chepkwony, Maurine; Morrison, W Ivan; Toye, Philip

2018-05-06

The extent of sequence diversity among the genes encoding 10 antigens (Tp1-10) known to be recognized by CD8+ T lymphocytes from cattle immune to Theileria parva was analysed. The sequences were derived from parasites in 23 buffalo-derived cell lines, three cattle-derived isolates and one cloned cell line obtained from a buffalo-derived stabilate. The results revealed substantial differences in sequence diversity among the antigens. The greatest nucleotide and amino acid diversity was observed in Tp1, Tp2 and Tp9. Tp5 and Tp7 showed the least allelic diversity, and Tp5, Tp6 and Tp7 had the lowest levels of protein diversity. Tp6 was the most conserved protein: only a single non-synonymous substitution was found across all sequences obtained. The ratio of non-synonymous to synonymous substitutions varied from 0.84 (Tp1) to 0.04 (Tp6). Apart from Tp2 and Tp9, we observed no variation in the other defined CD8+ T cell epitopes (Tp4, 5, 7 and 8), indicating that epitope variation is not a universal feature of T. parva antigens. In addition to providing markers that can be used to examine the diversity in T. parva populations, the results highlight the potential for using conserved antigens to develop vaccines that provide broad protection against T. parva.
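A non-synonymous:synonymous ratio starts from classifying each codon difference by whether it changes the encoded amino acid. This is a minimal sketch using the standard genetic code. It assumes two in-frame, gap-free coding sequences of equal length. It returns raw counts of differing codons, not the per-site-normalized dN/dS that estimators such as Nei-Gojobori compute and that the study presumably reported. Codons differing at more than one position are classified as a whole rather than split across mutational pathways. The function name and sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def count_codon_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal-length, in-frame coding sequences")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        # Same amino acid means a synonymous difference.
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# GCT->GCC keeps Ala (synonymous); AAA->AGA changes Lys to Arg (nonsynonymous).
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```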
Clustering evolving proteins into homologous families.

PubMed

Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

2013-04-08

Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
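MCL works on a column-stochastic similarity matrix and alternates two operations until the matrix stops changing. Expansion raises the matrix to a power, which spreads flow along paths. Inflation raises each entry to a power and renormalizes, which strengthens strong flows and weakens weak ones. Clusters are then read off the rows that retain nonzero flow. This is a minimal NumPy sketch assuming a small, dense, symmetric similarity matrix, for example one derived from pairwise alignment scores. Production tools use sparse matrices and pruning, and the function name and parameter defaults here are illustrative assumptions.

```python
import numpy as np

def markov_cluster(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Cluster nodes of a similarity graph with the Markov Cluster algorithm."""
    m = np.asarray(similarity, dtype=float)
    m = m + np.eye(m.shape[0])            # self-loops stabilize the iteration
    m = m / m.sum(axis=0, keepdims=True)  # make every column sum to 1
    for _ in range(max_iter):
        expanded = np.linalg.matrix_power(m, expansion)          # expansion step
        inflated = expanded ** inflation                         # inflation step
        inflated = inflated / inflated.sum(axis=0, keepdims=True)
        converged = np.allclose(inflated, m, atol=tol)
        m = inflated
        if converged:
            break
    # Each row with remaining flow defines one cluster (its nonzero columns).
    clusters = set()
    for row in m:
        members = np.nonzero(row > 1e-6)[0]
        if members.size:
            clusters.add(tuple(int(i) for i in members))
    return sorted(clusters)

# Hypothetical graph: proteins 0-2 are mutually similar, as are 3-4.
sim = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
print(markov_cluster(sim))  # [(0, 1, 2), (3, 4)]
```

Raising `inflation` generally yields finer (more, smaller) clusters, which is the main knob corresponding to the parameter settings the study varied.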
[Genetic variation analysis of canine parvovirus VP2 gene in China].

PubMed

Yi, Li; Cheng, Shi-Peng; Yan, Xi-Jun; Wang, Jian-Ke; Luo, Bin

2009-11-01

To characterize the molecular biology, phylogenetic relationships and current prevalence of canine parvovirus (CPV), faecal samples from pet dogs with acute enteritis in the cities of Beijing, Wuhan, and Nanjing were collected between 2006 and 2008 and tested for CPV by PCR and other assays. PCR-RFLP analysis detected no CPV-to-FPV (MEV) variation in any sample. The complete ORFs of the VP2 genes were obtained by PCR from 15 clinical CPVs and 2 CPV vaccine strains, and all amplicons were cloned and sequenced. Analysis of the VP2 sequences showed that all clinical CPVs belong to the CPV-2a subtype. Amino acid comparisons placed them in a new cluster containing a Tyr→Ile mutation at residue 324. The 2 CPV vaccine strains belong to the CPV-2 subtype, and both show scattered amino acid variation in the VP2 protein. A phylogenetic tree based on CPV VP2 sequences showed that the 15 clinical strains were more closely related to the Korean strain K001 than to earlier CPV-2a isolates from other countries. This indicates that CPV genetic variation is associated to some degree with location and time. This survey of the CPV capsid protein VP2 gene provides useful information for identifying CPV types and understanding their genetic relationships.
Retroviral DNA Integration Directed by HIV Integration Protein in Vitro

NASA Astrophysics Data System (ADS)

Bushman, Frederic D.; Fujiwara, Tamio; Craigie, Robert

1990-09-01

Efficient retroviral growth requires integration of a DNA copy of the viral RNA genome into a chromosome of the host. As a first step in analyzing the mechanism of integration of human immunodeficiency virus (HIV) DNA, a cell-free system was established that models the integration reaction. The in vitro system depends on the HIV integration (IN) protein, which was partially purified from insect cells engineered to express IN protein in large quantities. Integration was detected in a biological assay that scores the insertion of a linear DNA containing HIV terminal sequences into a λ DNA target. Some integration products generated in this assay contained five-base pair duplications of the target DNA at the recombination junctions, a characteristic of HIV integration in vivo; the remaining products contained aberrant junctional sequences that may have been produced in a variation of the normal reaction. These results indicate that HIV IN protein is the only viral protein required to insert model HIV DNA sequences into a target DNA in vitro.
An experimental phylogeny to benchmark ancestral sequence reconstruction

PubMed Central

Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

2016-01-01

Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms. We benchmark the results against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations showing that all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had a minor effect on the inference of ancestral sequences. PMID:27628687
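Maximum parsimony, one of the criteria compared in this study, can be illustrated with Fitch's algorithm for a single alignment column on a rooted binary tree. Each internal node receives the intersection of its children's state sets when that intersection is nonempty. Otherwise it receives their union, at the cost of one change. This is a minimal sketch assuming the tree is given as nested pairs of leaf states. It performs only the bottom-up pass; a top-down pass is needed to pick one state per node. It is not the Bayesian, rate-variation-aware reconstruction that the study found more accurate, and the tree and states are hypothetical.

```python
def fitch(tree):
    """Bottom-up Fitch pass for one site.

    `tree` is either a leaf state (a one-character string) or a pair
    (left_subtree, right_subtree). Returns (candidate_states, changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # No shared state: keep both possibilities and count one change.
    return left_states | right_states, left_cost + right_cost + 1

# Hypothetical column: leaves A, C, A, A on the tree ((1,2),(3,4)).
print(fitch((("A", "C"), ("A", "A"))))  # ({'A'}, 1)
```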
Intrastrain heterogeneity of the mgpB gene in Mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences.

PubMed

Iverson-Cabral, Stefanie L; Astete, Sabina G; Cohen, Craig R; Rocha, Eduardo P C; Totten, Patricia A

2006-07-01

Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite the exceptionally small size of its genome (580 kb), approximately 4% of it is composed of repeated elements known as MgPar sequences (MgPa repeats). These elements are named for their homology to the mgpB gene, which encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37. We observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a woman persistently infected for 21 months and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.
[Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

2009-08-01

Based on the full-length genome sequences of two rabies viruses isolated from Chinese ferret-badgers, we analyzed rabies virus genetic variation at the molecular level. The aims were to obtain information on the prevalence and variation of rabies viruses in Zhejiang and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled to analyze nucleotide and deduced amino acid similarities. Phylogenetic analyses were performed on the N genes from Chinese ferret-badger, sika deer, vole, and dog isolates. Vaccine strains were then determined. Both genomes were completely sequenced and shared the same genetic structure. Each was 11,923 nt long, including a 58-nt leader, the NP (1353 nt), PP (894 nt), MP (609 nt), GP (1575 nt) and LP (6386 nt) genes, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (Psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that both genomes are consistent with the properties of the genus Lyssavirus (family Rhabdoviridae). Nucleotide and amino acid similarity was highest among Chinese strains, especially among strains from the same host species. For both genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, suggesting that most nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the five protein-coding regions showed no length changes or recombination, only a few point mutations, indicating that the five proteins are stable. The variable sites and types of variation in the two ferret-badger genomes were similar to those of the reference vaccine and street strains. Multiple sequence alignment and phylogenetic analyses placed both strains in genotype 1, with distinct geographic characteristics of China. Taken together, the evidence suggests that these two ferret-badger rabies viruses are likely street viruses already circulating in wildlife.
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

PubMed

Keel, B N; Nonneman, D J; Rohrer, G A

2017-08-01

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean coverage of 6.1× per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
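The "over 99%" figure follows from the reported counts: roughly 139 000 functional SNPs out of 22 342 915 is about 0.62%. This is a minimal sketch of that kind of filtering. It assumes each variant has already been annotated with a consequence category by a variant-effect tool. The category labels, function names and variant records are hypothetical.

```python
# Hypothetical consequence labels treated as "functional".
FUNCTIONAL_CLASSES = {"loss_of_function", "nonsynonymous", "regulatory"}

def functional_variants(variants):
    """Keep variants whose annotated consequence is in a functional class."""
    return [v for v in variants if v["consequence"] in FUNCTIONAL_CLASSES]

def retained_fraction(n_kept, n_total):
    """Fraction of all variants that survive the functional filter."""
    return n_kept / n_total

variants = [
    {"id": "snp1", "consequence": "intergenic"},
    {"id": "snp2", "consequence": "nonsynonymous"},
    {"id": "snp3", "consequence": "synonymous"},
    {"id": "snp4", "consequence": "loss_of_function"},
]
print([v["id"] for v in functional_variants(variants)])  # ['snp2', 'snp4']

# Counts reported in the abstract: ~139 000 functional SNPs out of 22 342 915.
frac = retained_fraction(139_000, 22_342_915)
print(f"retained {frac:.2%}, set aside {1 - frac:.2%}")  # retained 0.62%, set aside 99.38%
```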
Understanding Neurodevelopmental Disorders: The Promise of Regulatory Variation in the 3'UTRome.

PubMed

Wanke, Kai A; Devanna, Paolo; Vernes, Sonja C

2018-04-01

Neurodevelopmental disorders have a strong genetic component, but despite widespread efforts, the specific genetic factors underlying these disorders remain undefined for a large proportion of affected individuals. Given the accessibility of exome sequencing, this problem has thus far been addressed from a protein-centric standpoint; however, protein-coding regions only make up ∼1% to 2% of the human genome. With the advent of whole genome sequencing we are in the midst of a paradigm shift as it is now possible to interrogate the entire sequence of the human genome (coding and noncoding) to fill in the missing heritability of complex disorders. These new technologies bring new challenges, as the number of noncoding variants identified per individual can be overwhelming, making it prudent to focus on noncoding regions of known function, for which the effects of variation can be predicted and directly tested to assess pathogenicity. The 3'UTRome is a region of the noncoding genome that perfectly fulfills these criteria and is of high interest when searching for pathogenic variation related to complex neurodevelopmental disorders. Herein, we review the regulatory roles of the 3'UTRome as binding sites for microRNAs or RNA binding proteins, or during alternative polyadenylation. We detail existing evidence that these regions contribute to neurodevelopmental disorders and outline strategies for identification and validation of novel putatively pathogenic variation in these regions. This evidence suggests that studying the 3'UTRome will lead to the identification of new risk factors, new candidate disease genes, and a better understanding of the molecular mechanisms contributing to neurodevelopmental disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

PubMed Central

Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

2008-01-01

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction prohibits its application. However, when additional data are available to indicate which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
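The binding-site prediction step can be illustrated with a position weight matrix (PWM). Each motif position contributes a log-odds score, log2(p(base)/background). A variant that lowers the best window score within a candidate element then becomes a candidate regulatory SNP. This is a minimal sketch that assumes a uniform background, a small pseudocount, A/C/G/T-only input, and scanning of the forward strand only. The motif counts, sequences and function names are hypothetical, and RAVEN's own scoring and phylogenetic-footprinting steps are not reproduced here.

```python
import math

BASES = "ACGT"

def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def best_score(pwm, seq):
    """Highest PWM score over all windows of `seq` (forward strand only)."""
    width = len(pwm)
    seq = seq.upper()
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )

# Hypothetical motif with consensus TGA, observed 10 times at each position.
pwm = build_pwm([{"T": 10}, {"G": 10}, {"A": 10}])

ref_seq = "CCTGACC"  # reference sequence around a SNP
alt_seq = "CCTCACC"  # same region carrying a G>C substitution
delta = best_score(pwm, alt_seq) - best_score(pwm, ref_seq)
print(f"change in best site score: {delta:.2f}")  # negative: the SNP weakens the site
```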
Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

PubMed

Rennick, Linda J; Duprex, W Paul; Rima, Bert K

2007-10-01

Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of the F 5' UTR, govern this decreased expression from TU2.
The Genome of the Obligately Intracellular Bacterium Ehrlichia canis Reveals Themes of Complex Membrane Structure and Immune Evasion Strategies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mavromatis, K; Doyle, C Kuyler; Lykidis, A

2006-01-01

Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species and 17 putative pseudogenes, with a substantial proportion of noncoding sequence (27%). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly(G-C) tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Genes associated with pathogen-host interactions were identified, including a small group encoding proteins (n = 12) with tandem repeats and another group encoding proteins with eukaryote-like ankyrin domains (n = 7).
The genome of obligately intracellular Ehrlichia canis reveals themes of complex membrane structure and immune evasion strategies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.

2005-09-01

Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, with a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified, including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).
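Homopolymeric tracts such as the poly(G-C) or poly G:C runs mentioned in these records can change length through slipped-strand mispairing. Such changes can switch a gene in or out of frame, which is the usual link to phase variation. Candidate tracts are typically found by scanning for runs of a single G or C. This is a minimal sketch assuming that "tract" means a run of one base (G or C) of at least a chosen length. The threshold of 8, the function name and the example sequence are hypothetical.

```python
import re

def homopolymer_tracts(seq, bases="GC", min_len=8):
    """Return (start, base, length) for runs of a single base from `bases`."""
    # Builds a pattern such as "G{8,}|C{8,}".
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.group()[0], len(m.group())) for m in pattern.finditer(seq.upper())]

# Hypothetical sequence with a 9-nt poly(G) run and an 8-nt poly(C) run.
seq = "AT" + "G" * 9 + "CAT" + "C" * 8 + "TA"
print(homopolymer_tracts(seq))  # [(2, 'G', 9), (14, 'C', 8)]
```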
Molecular cloning of a cDNA coding for GTP cyclohydrolase I from Dictyostelium discoideum.

PubMed Central

Witter, K; Cahill, D J; Werner, T; Ziegler, I; Rödl, W; Bacher, A; Gütlich, M

1996-01-01

The GTP cyclohydrolase I (GTP-CH) gene of the cellular slime mould Dictyostelium discoideum has been cloned and sequenced. The 855 bp cDNA of this gene contains the open reading frame (ORF) encoding 232 amino acids with a predicted molecular mass of approx. 26 kDa. Southern blot analysis indicated the presence of a single gene for GTP-CH in Dictyostelium. PCR amplification of the ORF from chromosomal DNA and sequencing showed the existence of a 101 bp intron in the GTP-CH gene of Dictyostelium discoideum. The amino acid sequence has 47% and 49% positional identity to those of the human and yeast enzymes respectively. Most of the sequence variation between species is located in the N-terminal part of the protein. The overall identity with the E. coli protein is markedly lower. The enzyme was expressed in E. coli and purified as a 68 kDa fusion protein with the maltose-binding protein of E. coli. GTP-CH of Dictyostelium is heat-stable and showed maximal activity at 60 °C. The Km value for GTP is 50 µM. PMID:8870645
Genetic variability among Schistosoma japonicum isolates from the Philippines, Japan and China revealed by sequence analysis of three mitochondrial genes.

PubMed

Chen, Fen; Li, Juan; Sugiyama, Hiromu; Zhou, Dong-Hui; Song, Hui-Qun; Zhao, Guang-Hui; Zhu, Xing-Quan

2015-02-01

The present study examined sequence variability in the mitochondrial (mt) protein-coding genes cytochrome b (cytb) and NADH dehydrogenase subunits 2 and 6 (nad2 and nad6) among 24 isolates of Schistosoma japonicum from different endemic regions in the Philippines, Japan and China. The complete cytb, nad2 and nad6 genes were amplified and sequenced separately from individual schistosomes. Sequence variation for isolates from the Philippines was 0-0.5% for cytb, 0-0.6% for nad2, and 0-0.9% for nad6. For schistosome samples from mainland China, variation was 0-0.5%, 0.1-0.8% and 0-0.7% for the corresponding genes. For worms from Japan, variation was 0-0.2%, 0.1-0.2% and 0 for the three genes, respectively. Among isolates from different geographical strains in the Philippines, Japan and China, sequence variation was 0-1.0%, 0-1.8% and 0-1.1% for cytb, nad2 and nad6, respectively. Across all three mtDNA genes, the lowest sequence variation was found between isolates from mainland China and the Philippines, and the highest between isolates from Japan and the Philippines. Phylogenetic analyses based on the combined cytb, nad2 and nad6 sequences revealed that all isolates from the Philippines clustered together as a sister group to samples from Yunnan and Zhejiang provinces in China, while isolates from Yamanashi in Japan formed a separate clade. These results demonstrate the usefulness of the combined three mtDNA sequences for studying genetic diversity and population structure among S. japonicum isolates from the Philippines, China and Japan.
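Percentage sequence variation of this kind is commonly computed as an uncorrected p-distance. This is the number of differing aligned sites divided by the number of sites compared, taken over every pair of isolates to give a min-max range. This is a minimal sketch assuming pre-aligned, gap-free sequences of equal length. The isolate names and sequences are hypothetical, and the study may have handled gaps or ambiguous bases differently.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def variation_range(seqs):
    """Return (min, max) pairwise p-distance across all pairs, as percentages."""
    d = [round(100 * p_distance(a, b), 2) for a, b in combinations(seqs.values(), 2)]
    return min(d), max(d)

# Hypothetical aligned fragments from four isolates.
isolates = {
    "iso1": "ATGCTAGCTA",
    "iso2": "ATGCTAGCTA",
    "iso3": "ATGCTGGCTA",
    "iso4": "ATACTGGCTT",
}
print(variation_range(isolates))  # (0.0, 30.0)
```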
Understanding the mechanisms of protein-DNA interactions

NASA Astrophysics Data System (ADS)

Lavery, Richard

2004-03-01

Structural, biochemical and thermodynamic data on protein-DNA interactions show that specific recognition cannot be reduced to a simple set of binary interactions between the partners (such as hydrogen bonds, ion pairs or steric contacts). The mechanical properties of the partners also play a role and, in the case of DNA, variations in both conformation and flexibility as a function of base sequence can be a significant factor in guiding a protein to the correct binding site. All-atom molecular modeling offers a means of analyzing the role of different binding mechanisms within protein-DNA complexes of known structure. This, however, requires estimating the binding strengths for the full range of sequences with which a given protein can interact. Since this number grows exponentially with the length of the binding site, it is necessary to find a method to accelerate the calculations. We have achieved this by using a multi-copy approach (ADAPT) which allows us to build a DNA fragment with a variable base sequence. The results obtained with this method correlate well with experimental consensus binding sequences. They enable us to show that indirect recognition mechanisms involving the sequence-dependent properties of DNA play a significant role in many complexes. This approach also offers a means of predicting protein binding sites on the basis of binding energies, which is complementary to conventional lexical techniques.
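The exponential growth noted in this abstract follows from simple counting. Each position in the binding site can carry any of the four bases, so the number of distinct sequences for a site of length L is:

```latex
N(L) = 4^{L}, \qquad \text{e.g.}\quad N(10) = 4^{10} = 1\,048\,576
```

Each additional base pair multiplies the number of sequences to evaluate by four. This is why enumerating and modeling each sequence separately quickly becomes impractical, and it motivates representing a variable base sequence within one model, as the multi-copy approach described here does.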
Structurally detailed coarse-grained model for Sec-facilitated co-translational protein translocation and membrane integration

PubMed Central

Miller, Thomas F.

2017-01-01

We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943
Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

PubMed Central

2011-01-01

Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole-genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of SNP detection by resequencing was assessed by combining SNPs identified in regions of identity by descent (IBD) or of copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were enriched for lengths that are multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, split-read and read-pair approaches proved complementary, each finding different signatures. CNVs were identified on the basis of sequencing read depth and by using SNP and CGH arrays. Conclusions: Our results provide high-resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved, with little loss of sensitivity, when algorithms that incorporate mapping quality were used. IBD regions were instrumental for calculating resequencing SNP accuracy, whereas SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. Overall, the data show that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize accurate detection of sequence and structural variants. PMID:22082336
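
Enrichment for indel lengths that are multiples of 3 reflects the fact that such indels leave the downstream reading frame intact. This is a minimal sketch of how that enrichment could be quantified. It assumes a uniform null in which one third of indel lengths would be in-frame by chance, which is a simplification. The indel lengths are hypothetical.

```python
def is_in_frame(indel_length: int) -> bool:
    """True if an insertion or deletion (negative length) preserves the reading frame."""
    return abs(indel_length) % 3 == 0


def in_frame_enrichment(indel_lengths: list[int]) -> float:
    """Observed in-frame fraction divided by the 1/3 expected under a uniform null."""
    if not indel_lengths:
        raise ValueError("no indels supplied")
    in_frame = sum(is_in_frame(length) for length in indel_lengths)
    return in_frame * 3 / len(indel_lengths)


# Hypothetical coding indel lengths (negative = deletion).
lengths = [3, -6, 1, -2, 9, 3]
print(in_frame_enrichment(lengths))
```

Four of the six toy indels are in-frame, so the function returns 2.0. In-frame indels would then be twice as common as the simple null predicts.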
Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates.

PubMed

Heunis, Tiaan; Dippenaar, Anzaan; Warren, Robin M; van Helden, Paul D; van der Merwe, Ruben G; Gey van Pittius, Nicolaas C; Pain, Arnab; Sampson, Samantha L; Tabb, David L

2017-10-06

Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.
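
Building a strain-specific search database, as described above, amounts to applying each nonsynonymous variant to the reference protein and digesting the result in silico so that variant peptides can be matched to spectra. This is a minimal sketch under several assumptions: a simplified trypsin rule (cleave after K/R but not before P, with no missed cleavages) and a 6-residue minimum peptide length. The variant notation, function names and protein sequence are hypothetical, and this is not the pipeline used in the study.

```python
import re


def apply_saav(protein: str, variant: str) -> tuple[str, int]:
    """Apply a single amino acid variant such as 'Y5V' (reference, 1-based position, alternate).

    Returns the variant protein and the 0-based index of the substituted residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    if not 0 <= index < len(protein) or protein[index] != ref:
        raise ValueError(f"reference residue {ref} not found at position {position}")
    return protein[:index] + alt + protein[index + 1:], index


def tryptic_digest(protein: str) -> list[tuple[int, str]]:
    """Cleave after K or R unless followed by P; returns (0-based start, peptide) pairs."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def variant_peptides(protein: str, variant: str, min_length: int = 6) -> list[str]:
    """Tryptic peptides of the variant protein that span the substituted residue."""
    mutated, index = apply_saav(protein, variant)
    return [
        peptide
        for start, peptide in tryptic_digest(mutated)
        if start <= index < start + len(peptide) and len(peptide) >= min_length
    ]


# Hypothetical reference protein and variants.
reference = "MKTAYIAKQRPLSGEFLKR"
print(variant_peptides(reference, "Y5V"))   # substitution inside an existing peptide
print(variant_peptides(reference, "P11L"))  # losing P exposes a new cleavage site after R10
```

Digesting the mutated sequence, rather than the reference, matters. A variant that creates or removes K, R or P changes which peptide carries it, as the P11L case shows.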
The evolution of transcriptional regulation in eukaryotes

NASA Technical Reports Server (NTRS)

Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.

2003-01-01

Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.

Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

PubMed

Garrido-Martín, Diego; Pazos, Florencio

2018-02-27

The exponential accumulation of new sequences in public databases is expected to improve the performance of all approaches for predicting protein structural and functional features. Nevertheless, this has never been assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as their only input, these approaches can detect fully conserved positions as well as positions with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites, so understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change over time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do so by recreating historical versions of the multiple sequence alignments that would have been obtained from the database contents at different time points over a period of 20 years. Applying the methods to these historical alignments allows us to quantify the temporal variation in their performance. Our results show that the number of families to which these methods can be applied increases sharply over time, while their ability to detect potentially functional residues remains almost constant. These results are informative for method developers and end users, and may have implications for the design of new sequencing initiatives.
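
The two kinds of position mentioned above can be illustrated on a toy alignment. This is a minimal sketch that assumes aligned sequences of equal length and known subfamily labels. It applies a plain presence/absence rule and is not any of the five scoring methods evaluated in the study. The alignment is hypothetical.

```python
from collections import defaultdict


def fully_conserved_columns(msa: list[str]) -> list[int]:
    """0-based columns where every sequence has the same, non-gap residue."""
    return [
        j for j in range(len(msa[0]))
        if len({seq[j] for seq in msa}) == 1 and msa[0][j] != "-"
    ]


def subfamily_specific_columns(msa: list[str], labels: list[str]) -> list[int]:
    """Columns conserved within every subfamily but differing between subfamilies."""
    groups = defaultdict(list)
    for seq, label in zip(msa, labels):
        groups[label].append(seq)
    columns = []
    for j in range(len(msa[0])):
        residues_per_group = []
        for members in groups.values():
            residues = {seq[j] for seq in members}
            if len(residues) != 1 or "-" in residues:
                break  # not conserved inside this subfamily
            residues_per_group.append(residues.pop())
        else:
            if len(set(residues_per_group)) > 1:
                columns.append(j)
    return columns


# Hypothetical alignment with two subfamilies, A and B.
alignment = ["MKDL", "MKDI", "MKEL", "MKEV"]
subfamilies = ["A", "A", "B", "B"]
print(fully_conserved_columns(alignment))                  # columns 0 and 1
print(subfamily_specific_columns(alignment, subfamilies))  # column 2 (D in A, E in B)
```

Real methods replace the all-or-nothing rule with scores that tolerate some noise. This is one reason their output can depend on how many sequences the alignment contains.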
Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome

PubMed Central

Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

2016-01-01

The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608
Inter-individual variation in expression: a missing link in biomarker biology?

PubMed

Little, Peter F R; Williams, Rohan B H; Wilkins, Marc R

2009-01-01

The past decade has seen an explosion of variation data demonstrating that diversity in both protein-coding sequences and the regulatory elements of protein-coding genes is common and functionally important. In this article, we argue that genetic diversity can no longer be ignored in studies of human biology, even in research projects without an explicit genetic design, and that this knowledge can, and must, inform research. By way of illustration, we focus on the potential role of genetic data in case-control studies to identify and validate cancer protein biomarkers. We argue that considering genetics in conjunction with proteomic biomarker discovery projects should improve the proportion of biomarkers that can accurately classify patients.
A novel cry2Ab gene from the indigenous isolate Bacillus thuringiensis subsp. kurstaki.

PubMed

Sevim, Ali; Eryüzlü, Emine; Demirbağ, Zihni; Demir, Ismail

2012-01-01

A novel cry2Ab gene was cloned and sequenced from an indigenous isolate of Bacillus thuringiensis subsp. kurstaki. The gene was designated cry2Ab25; its sequence revealed an open reading frame of 1,902 bp encoding a 633-aa protein with a calculated molecular mass of 70 kDa and a pI of 8.98. The amino acid sequence of the Cry2Ab25 protein was compared with previously known Cry2Ab toxins, and the phylogenetic relationships among them were determined. The deduced amino acid sequence of Cry2Ab25 showed 99% identity to the known Cry2Ab proteins (97% to Cry2Ab10 and Cry2Ab12) and differed by one amino acid residue from all known Cry2Ab proteins. The cry2Ab25 gene was expressed in Escherichia coli BL21(DE3) cells, and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) showed the Cry2Ab25 protein to be about 70 kDa. The toxin expressed in BL21(DE3) was highly toxic to Malacosoma neustria and Rhagoletis cerasi, causing 73% and 75% mortality, respectively, after 5 days of treatment.
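
A quick check shows that the reported figures are internally consistent. A 1,902-bp open reading frame is 634 codons, one of which is the stop codon, leaving 633 amino acids. A commonly used average is about 110 Da per residue; this is a rule of thumb, not the method the authors used to calculate mass. At that average, 633 residues come to roughly 70 kDa.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Rough protein mass using an assumed average residue mass (~110 Da)."""
    return n_residues * mean_residue_da / 1000


length = orf_protein_length(1902)
print(length, round(approx_mass_kda(length), 1))
```

This prints 633 and 69.6. An exact mass would instead sum the residue masses of the deduced sequence.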
Genetic variation of coat protein gene among the isolates of Rice tungro spherical virus from tungro-endemic states of India.

PubMed

Mangrauthia, Satendra K; Malathi, P; Agarwal, Surekha; Ramkumar, G; Krishnaveni, D; Neeraja, C N; Madhav, M Sheshu; Ladhalakshmi, D; Balachandran, S M; Viraktamath, B C

2012-06-01

Rice tungro disease, one of the major constraints on rice production in South and Southeast Asia, is caused by a combination of two viruses: Rice tungro spherical virus (RTSV) and Rice tungro bacilliform virus (RTBV). The present study was undertaken to determine the genetic variation of the RTSV population in tungro-endemic states of the Indian subcontinent. Phylogenetic analysis based on coat protein sequences showed a distinct divergence of Indian RTSV isolates into two groups: one consisting of isolates from Hyderabad (Andhra Pradesh), Cuttack (Orissa) and Puducherry, and the other of isolates from West Bengal, Coimbatore (Tamil Nadu) and Kanyakumari (Tamil Nadu). The phylogenetic results were further supported by single nucleotide polymorphism (SNP), insertion and deletion (indel) and evolutionary distance analyses. In addition, a sequence difference count matrix revealed 2-68 nucleotide differences among the Indian RTSV isolates in this study. However, Ka/Ks ratio calculations indicated that these differences were not significant at the protein level. Sequence identity among Indian RTSV isolates was 92-100% at the nucleotide level and 97-100% at the amino acid level. Understanding the population structure of RTSV in tungro-endemic regions of India could provide insights into the molecular diversification of this virus.
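
Nucleotide identity (92-100%) is lower than amino acid identity (97-100%) because many nucleotide differences are synonymous. This is a minimal sketch of a difference count matrix and percent identity at both levels. It assumes aligned, in-frame, gap-free coding sequences and the standard genetic code. The sequences are hypothetical, and Ka/Ks estimation (for example by the Nei-Gojobori method) is not implemented here.

```python
from itertools import combinations

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_differences(a: str, b: str) -> int:
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    return 100 * (len(a) - count_differences(a, b)) / len(a)


def difference_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise difference counts, keyed by isolate-name pairs."""
    return {(m, n): count_differences(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


# Hypothetical coat protein fragments differing by two synonymous changes.
isolates = {"isoA": "ATGGCTAAA", "isoB": "ATGGCCAAG"}
print(difference_matrix(isolates))
print(percent_identity(*isolates.values()))                            # nucleotide level
print(percent_identity(*(translate(s) for s in isolates.values())))  # amino acid level
```

In the toy pair, two synonymous changes lower nucleotide identity to about 77.8%. Amino acid identity stays at 100%.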
Hierarchy and extremes in selections from pools of randomized proteins

PubMed Central

Boyer, Sébastien; Biswas, Dipanwita; Kumar Soshee, Ananda; Scaramozzino, Natale; Nizak, Clément; Rivoire, Olivier

2016-01-01

Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different “frameworks” typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution). PMID:26969726
Hierarchy and extremes in selections from pools of randomized proteins.

PubMed

Boyer, Sébastien; Biswas, Dipanwita; Kumar Soshee, Ananda; Scaramozzino, Natale; Nizak, Clément; Rivoire, Olivier

2016-03-29

Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different "frameworks" typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution).
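
This is a generic illustration of extreme value behaviour, not the authors' model. Suppose binding scores in a library were independent draws from an exponential distribution. The expected score of the best binder then grows only logarithmically with library size, approaching ln N + γ (Euler's constant). That is the kind of regularity extreme value theory describes. The exponential score distribution and the function names are assumptions made for this sketch.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def expected_max_exponential(n: int) -> float:
    """Exact mean of the largest of n independent Exp(1) draws: the harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n + 1))


def simulated_max_mean(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(1.0) for _ in range(n))
    return total / trials


for library_size in (10, 1_000, 100_000):
    exact = expected_max_exponential(library_size)
    print(library_size, round(exact, 3), round(math.log(library_size) + EULER_GAMMA, 3))

print(simulated_max_mean(100), expected_max_exponential(100))
```

A hundredfold larger library raises the expected best score by only about ln 100 ≈ 4.6 units. This is why the top of a response distribution can follow a simple law even when libraries differ greatly in size.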
Genome sequence variation in the constricta strain dramatically alters the protein interaction and localization map of Potato yellow dwarf virus

USDA-ARS's Scientific Manuscript database

The genome sequence of the constricta strain of Potato yellow dwarf virus (CYDV) was determined to be 12,792 nucleotides long and organized into seven open reading frames with the gene order 3’-N-X-P-Y-M-G-L-5’, which encodes the nucleocapsid, phosphoprotein, movement, matrix, glycoprotein and RNA-d...
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle

PubMed Central

Choi, Sangho

2012-01-01

Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs), which are caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE have been reported in humans and sheep, but not in bovine strains, including Holstein, Jersey and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed of Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequencies and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had lower frequencies of the deletion alleles and of the 23-bp del/12-bp del haplotype than healthy and BSE-affected animals of other strains. Thus, Hanwoo cattle may be less susceptible to BSE than other strains because of their 23-bp and 12-bp indel polymorphisms. PMID:22705734
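
Allele frequencies like those compared above are obtained by counting allele copies across genotyped animals. This minimal sketch assumes unphased diploid genotype counts. Estimating the 23-bp/12-bp haplotype frequency would additionally require phasing (for example an expectation-maximization step), which is not shown. The counts are hypothetical and are not the study's data.

```python
def allele_frequency(genotype_counts: dict[str, int], allele: str) -> float:
    """Frequency of `allele` from counts of diploid genotypes written as 'a/b'."""
    allele_copies = total_copies = 0
    for genotype, count in genotype_counts.items():
        alleles = genotype.split("/")
        if len(alleles) != 2:
            raise ValueError(f"expected a diploid genotype like 'ins/del', got {genotype}")
        allele_copies += alleles.count(allele) * count
        total_copies += 2 * count
    return allele_copies / total_copies


# Hypothetical genotype counts for a 23-bp indel in two groups of animals.
group_1 = {"ins/ins": 30, "ins/del": 10, "del/del": 0}
group_2 = {"ins/ins": 10, "ins/del": 20, "del/del": 10}
print(allele_frequency(group_1, "del"), allele_frequency(group_2, "del"))
```

Each heterozygote contributes one copy of the deletion allele and each del/del homozygote two. The hypothetical groups therefore give deletion frequencies of 0.125 and 0.5.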
The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host–parasite interaction

PubMed Central

Jackson, Andrew P.; Otto, Thomas D.; Darby, Alistair; Ramaprasad, Abhinay; Xia, Dong; Echaide, Ignacio Eduardo; Farber, Marisa; Gahlot, Sunayna; Gamble, John; Gupta, Dinesh; Gupta, Yask; Jackson, Louise; Malandrin, Laurence; Malas, Tareq B.; Moussa, Ehab; Nair, Mridul; Reid, Adam J.; Sanders, Mandy; Sharma, Jyotsna; Tracey, Alan; Quail, Mike A.; Weir, William; Wastling, Jonathan M.; Hall, Neil; Willadsen, Peter; Lingelbach, Klaus; Shiels, Brian; Tait, Andy; Berriman, Matt; Allred, David R.; Pain, Arnab

2014-01-01

Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from the 5′ ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are routinely transposed between loci, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so in ves2, indicating that the two families have adopted different mechanisms of variation. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different from those of VESA1, suggesting that their functions are distinct. PMID:24799432
Specificity determinants for the abscisic acid response element.

PubMed

Sarkar, Aditya Kumar; Lahiri, Ansuman

2013-01-01

Abscisic acid (ABA) response elements (ABREs) are a group of cis-acting DNA elements that have been identified from promoter analysis of many ABA-regulated genes in plants. We are interested in understanding the mechanism of binding specificity between ABREs and a class of bZIP transcription factors known as ABRE binding factors (ABFs). In this work, we have modeled the homodimeric structure of the bZIP domain of ABRE binding factor 1 from Arabidopsis thaliana (AtABF1) and studied its interaction with ACGT core motif-containing ABRE sequences. We have also examined the variation in the stability of the protein-DNA complex upon mutating ABRE sequences using the protein design algorithm FoldX. The high throughput free energy calculations successfully predicted the ability of ABF1 to bind to alternative core motifs like GCGT or AAGT and also rationalized the role of the flanking sequences in determining the specificity of the protein-DNA interaction.
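
The mutational scan described above starts from a set of candidate core motifs. This minimal sketch enumerates every single-base substitution of the ACGT core, the kind of list that would then be passed to a stability calculation such as FoldX. The FoldX step itself is not shown, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_substitutions(motif: str) -> list[str]:
    """All sequences differing from `motif` at exactly one position."""
    variants = []
    for i, base in enumerate(motif):
        for alt in "ACGT":
            if alt != base:
                variants.append(motif[:i] + alt + motif[i + 1:])
    return variants


core = "ACGT"
print(reverse_complement(core) == core)   # the ACGT core is palindromic
variants = single_substitutions(core)
print(len(variants), "GCGT" in variants, "AAGT" in variants)
```

A 4-bp core has 4 × 3 = 12 single substitutions, and the alternative cores GCGT and AAGT mentioned above are among them. Because ACGT is its own reverse complement, a bZIP homodimer can read it identically from either strand. Most of its single mutants lose that symmetry.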
Dynamics of Agglutinin-Like Sequence (ALS) Protein Localization on the Surface of Candida Albicans

ERIC Educational Resources Information Center

Coleman, David Andrew

2009-01-01

The ALS gene family encodes large cell-surface glycoproteins associated with C. albicans pathogenesis. Als proteins are thought to act as adhesin molecules binding to host tissues. Wide variation in expression levels among the ALS genes exists and is related to cell morphology and environmental conditions. ALS1, ALS3, and ALS4 are three of…
Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

PubMed

Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

2012-05-01

Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.
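
In outline, the SFVT idea reduces each aligned protein sequence to the residues at a sequence feature's positions, its variant type, and then tabulates variant types against a phenotype. This minimal sketch assumes aligned sequences and 1-based feature positions. The sequences, host labels and feature positions are hypothetical, and the code is not the Influenza Research Database implementation. The statistical association test that would follow, such as Fisher's exact test on the table, is not shown.

```python
from collections import Counter, defaultdict


def variant_type(sequence: str, positions: list[int]) -> str:
    """Residues of a sequence feature, read at 1-based alignment positions."""
    return "".join(sequence[p - 1] for p in positions)


def vt_phenotype_table(records, positions):
    """Count phenotypes per variant type; records are (aligned_sequence, phenotype) pairs."""
    table = defaultdict(Counter)
    for sequence, phenotype in records:
        table[variant_type(sequence, positions)][phenotype] += 1
    return table


# Hypothetical aligned protein fragments, host labels and sequence feature.
records = [("MDSNTV", "avian"), ("MDSNTV", "avian"), ("MDPNTI", "human"), ("MDSNTI", "human")]
feature_positions = [3, 6]
for vt, counts in vt_phenotype_table(records, feature_positions).items():
    print(vt, dict(counts))
```

Grouping by variant type rather than by whole sequence lets every strain that shares a feature's residues pool its counts. This is what gives the later association test its power.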
SnipViz: a compact and lightweight web site widget for display and dissemination of multiple versions of gene and protein sequences.

PubMed

Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

2014-07-23

As high-throughput sequencing becomes more commonplace, the need to disseminate the resulting data via web applications continues to grow. In particular, there is a need to disseminate multiple versions of related gene and protein sequences simultaneously, whether they represent alleles present in a single species, variations of the same gene among different strains, or homologs among separate species. Often this is accomplished by displaying all versions of the sequence at once in a manner that is neither intuitive nor space-efficient and does not facilitate human understanding of the data. Web-based applications that need to disseminate multiple versions of sequences would benefit from a drop-in module designed for this purpose. SnipViz is a client-side software tool designed to disseminate multiple versions of related gene and protein sequences on web sites. It has a space-efficient, interactive and dynamic interface for navigating, analyzing and visualizing sequence data. It is written using standard World Wide Web technologies (HTML, JavaScript and CSS) and is compatible with most web browsers. As a modular client-side web component, SnipViz may be incorporated into virtually any web site without any programming. SnipViz is open source and is freely available at https://github.com/yeastrc/snipviz.
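
One widely used way to show several versions of a sequence compactly, in the spirit described above, is to print the reference once and mark only the residues where each variant differs. This is a minimal sketch of that display convention in Python, assuming pre-aligned sequences of equal length. It is not SnipViz code (SnipViz itself is written in JavaScript), and the strain names and sequences are hypothetical.

```python
def dot_view(reference_name: str, reference: str, variants: dict[str, str]) -> list[str]:
    """Render aligned variants against a reference, showing '.' where residues match."""
    width = max(len(name) for name in [reference_name, *variants])
    lines = [f"{reference_name:<{width}} {reference}"]
    for name, sequence in variants.items():
        if len(sequence) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")
        rendered = "".join("." if r == s else s for r, s in zip(reference, sequence))
        lines.append(f"{name:<{width}} {rendered}")
    return lines


for line in dot_view("ref", "MKTAYIAK", {"strainB": "MKSAYIAK", "strainC": "MKTAYLAR"}):
    print(line)
```

Only the differing residues remain visible (S in strainB; L and R in strainC), so many related versions fit in the space of a few lines.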
Spin models inferred from patient-derived viral sequence data faithfully describe HIV fitness landscapes

NASA Astrophysics Data System (ADS)

Shekhar, Karthik; Ruberman, Claire F.; Ferguson, Andrew L.; Barton, John P.; Kardar, Mehran; Chakraborty, Arup K.

2013-12-01

Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus's fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses on the mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
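
Consider a binary encoding of a protein, with s_i = 1 where site i differs from consensus and 0 otherwise. A spin model assigns each such sequence an energy from site fields h and pairwise couplings J, and sequences are ranked by that energy. This minimal sketch assumes one common sign convention (prevalence proportional to exp(-E)) and small made-up parameters. Inferring h and J from patient sequences, the hard part discussed above, is not shown.

```python
from itertools import combinations


def ising_energy(s: tuple[int, ...], h: list[float], J: list[list[float]]) -> float:
    """E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j, with s_i = 1 for a mutation at site i.

    Sign conventions differ between papers; here lower energy means a sequence is
    predicted to be more prevalent, which is read as higher intrinsic fitness.
    """
    fields = sum(h_i * s_i for h_i, s_i in zip(h, s))
    couplings = sum(J[i][j] * s[i] * s[j] for i, j in combinations(range(len(s)), 2))
    return fields + couplings


def rank_by_predicted_fitness(sequences, h, J):
    """Order binary mutation patterns from fittest (lowest energy) to least fit."""
    return sorted(sequences, key=lambda s: ising_energy(s, h, J))


# Hypothetical 3-site model: every single mutation is costly, but sites 0 and 2 are
# coupled strongly enough that the double mutant is predicted fitter than either single.
h = [1.0, 2.0, 0.5]
J = [[0.0, 0.0, -1.2],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
strains = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0)]
print(rank_by_predicted_fitness(strains, h, J))
```

What the abstract evaluates is the ranking, not the absolute energy values. The question is whether parameters inferred from biased patient data still put strains in the right order.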
Detection of nucleic acids by multiple sequential invasive cleavages

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

1999-01-01

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Nucleic acid detection kits

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.

2005-03-29

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages 02

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

2002-01-01

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages

DOEpatents

Hall, Jeff G; Lyamichev, Victor I; Mast, Andrea L; Brow, Mary Ann D

2012-10-16

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
The maize stripe virus major noncapsid protein messenger RNA transcripts contain heterogeneous leader sequences at their 5' termini.

PubMed

Huiet, L; Feldstein, P A; Tsai, J H; Falk, B W

1993-12-01

Primer extension analyses and a PCR-based cloning strategy were used to identify and characterize 5' nucleotide sequences of the maize stripe virus (MStV) RNA4 mRNA transcripts encoding the major noncapsid protein (NCP). Direct RNA sequence analysis by primer extension showed that the NCP mRNA transcripts had 10-15 nucleotides beyond the 5' terminus of the MStV RNA4 nucleotide sequence. MStV genomic RNAs isolated from ribonucleoprotein particles (RNPs) lacked the additional 5' nucleotides. cDNA clones representing the 5' region of the mRNA transcripts were constructed, and the nucleotide sequences of the 5' regions were determined for 16 clones. Each was found to have a distinct 10-15 nucleotide sequence immediately 5' of the MStV RNA4 sequence. Eleven of 16 clones had the correct MStV RNA4 5' nucleotide sequence, while five showed minor variations at or near the 5'-most MStV RNA4 nucleotide. These characteristics show strong similarities to other viral mRNA transcripts that are synthesized by cap snatching.

Tertiary alphabet for the observable protein structural universe.

PubMed

Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg

2016-11-22

Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
Genomic Analysis of Storage Protein Deficiency in Genetically Related Lines of Common Bean (Phaseolus vulgaris)

PubMed Central

Pandurangan, Sudhakar; Diapari, Marwan; Yin, Fuqiang; Munholland, Seth; Perry, Gregory E.; Chapman, B. Patrick; Huang, Shangzhi; Sparvoli, Francesca; Bollini, Roberto; Crosby, William L.; Pauls, Karl P.; Marsolais, Frédéric

2016-01-01

A series of genetically related lines of common bean (Phaseolus vulgaris L.) integrate a progressive deficiency in the major storage proteins: the 7S globulin phaseolin and lectins. SARC1 integrates a lectin-like protein, arcelin-1, from a wild common bean accession. SMARC1N-PN1 is deficient in major lectins, including erythroagglutinating phytohemagglutinin (PHA-E) but not α-amylase inhibitor, and also incorporates a deficiency in phaseolin. SMARC1-PN1 is intermediate and shares the phaseolin deficiency. Sanilac is the parental background. To understand the genomic basis for variations in protein profiles previously determined by proteomics, the genotypes were submitted to short-fragment genome sequencing using an Illumina HiSeq 2000/2500 platform. Reads were aligned to reference sequences and subjected to de novo assembly. The results of the analyses identified polymorphisms responsible for the lack of specific storage proteins, as well as those associated with large differences in storage protein expression. SMARC1N-PN1 lacks the lectin genes pha-E and lec4-B17, and has the pseudogene pdlec1 in place of the functional pha-L gene. While the α-phaseolin gene appears absent, an approximately 20-fold decrease in β-phaseolin accumulation is associated with a single nucleotide polymorphism converting a G-box to an ACGT motif in the proximal promoter. Among residual lectins compensating for storage protein deficiency, the mannose lectin FRIL and α-amylase inhibitor 1 genes are uniquely present in SMARC1N-PN1. An approximately 50-fold increase in α-amylase inhibitor-like protein accumulation is associated with multiple polymorphisms introducing up to eight potential positive cis-regulatory elements in the proximal promoter specific to SMARC1N-PN1.
An approximately 7-fold increase in accumulation of 11S globulin legumin is not associated with variation in proximal promoter sequence, suggesting that the identity of individual proteins involved in proteome rebalancing might also be determined at the translational level. PMID:27066039
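The abstract links an approximately 20-fold drop in β-phaseolin to a SNP that converts a G-box into an ACGT motif in the proximal promoter. Below is a minimal sketch of how such a change could be flagged in a promoter sequence. It assumes the canonical plant G-box consensus CACGTG and its ACGT core. The function names and both promoter fragments are hypothetical and are not taken from the study.

```python
import re

# Assumed motif definitions (not taken from the paper's own analysis):
G_BOX = "CACGTG"    # canonical plant G-box; palindromic, so one strand suffices
ACGT_CORE = "ACGT"  # core shared by the G-box and other ACGT elements (also palindromic)


def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def classify_acgt_elements(seq: str) -> dict[str, list[int]]:
    """Separate complete G-boxes from ACGT cores that are not part of a G-box."""
    g_boxes = find_sites(seq, G_BOX)
    # Within CACGTG, the ACGT core starts one base after the G-box start.
    cores_in_g_boxes = {start + 1 for start in g_boxes}
    acgt_only = [s for s in find_sites(seq, ACGT_CORE) if s not in cores_in_g_boxes]
    return {"g_box": g_boxes, "acgt_only": acgt_only}


# Hypothetical promoter fragments differing by one SNP (G -> A just after the core).
reference = "TTCACGTGAA"
variant = "TTCACGTAAA"
print(classify_acgt_elements(reference))  # {'g_box': [2], 'acgt_only': []}
print(classify_acgt_elements(variant))    # {'g_box': [], 'acgt_only': [3]}
```

Both motifs are their own reverse complements, so a single-strand scan is sufficient here. A scan for non-palindromic cis-elements would also need to search the reverse-complement strand.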
Variation analysis of the severe acute respiratory syndrome coronavirus putative non-structural protein 2 gene and construction of three-dimensional model.

PubMed

Lu, Jia-hai; Zhang, Ding-mei; Wang, Guo-ling; Guo, Zhong-min; Zhang, Chuan-hai; Tan, Bing-yan; Ouyang, Li-ping; Lin, Li; Liu, Yi-min; Chen, Wei-qing; Ling, Wen-hua; Yu, Xin-bing; Zhong, Nan-shan

2005-05-05

The rapid transmission and high mortality rate made severe acute respiratory syndrome (SARS) a global threat for which no efficacious therapy is currently available. Without sufficient knowledge about the SARS coronavirus (SARS-CoV), it is impossible to define candidate anti-SARS targets. The putative non-structural protein 2 (nsp2) of SARS-CoV (3CL(pro), following the nomenclature of Gao et al.; also known as nsp5 in Snijder et al.) plays an important role in viral transcription and replication and is an attractive target for anti-SARS drug development. We therefore carried out this study to gain insight into the putative polymerase nsp2 of the SARS-CoV Guangdong (GD) strain. The SARS-CoV strain was isolated from a SARS patient in Guangdong, China, and cultured in Vero E6 cells. The nsp2 gene was amplified by reverse transcription-polymerase chain reaction (RT-PCR) and cloned into the eukaryotic expression vector pCI-neo (pCI-neo/nsp2). The recombinant vector pCI-neo/nsp2 was then transfected into COS-7 cells using lipofectin reagent to express the nsp2 protein. The expressed SARS-CoV nsp2 protein was analyzed by 7% sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The nucleotide and protein sequences of GD nsp2 were compared with those of other SARS-CoV strains using the nucleotide-nucleotide basic local alignment search tool (BLASTN) and the protein-protein basic local alignment search tool (BLASTP) to investigate how nsp2 varied during transmission. The secondary structures of the GD strain and other strains were predicted by Garnier-Osguthorpe-Robson (GOR) secondary structure prediction. The 3D-PSSM protein fold recognition (threading) server was employed to construct a three-dimensional model of the nsp2 protein. The putative polymerase nsp2 gene of the GD strain was amplified by RT-PCR, and the eukaryotic expression vector (pCI-neo/nsp2) was successfully constructed and expressed the protein in COS-7 cells.
Sequencing and comparison with other SARS-CoV strains showed that the nsp2 gene was relatively conserved during transmission. Across the approximately 100 strains investigated, a total of five base sites were mutated. Three of these, arising in the early and middle phases, were synonymous mutations. The other two, arising in the late phase, resulted in amino acid substitutions and secondary structure changes. The three-dimensional structure of the nsp2 protein was successfully constructed. The results suggest that polymerase nsp2 remained relatively stable during the epidemic. The amino acid and secondary structure changes may be important for viral infection. Most single nucleotide variations (SNVs) are predicted to be synonymous, and the mutation rate of the nsp2 gene remained low throughout the epidemic. Together, these findings indicate that nsp2 is conserved and could be a target for anti-SARS drugs. The three-dimensional structure indicates that the nsp2 protein of the GD strain is highly homologous to 3CL(pro) of the SARS-CoV Urbani strain, 3CL(pro) of transmissible gastroenteritis virus, and 3CL(pro) of human coronavirus 229E, which further suggests that the nsp2 protein of the GD strain possesses 3CL(pro) activity.
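The distinction drawn above between synonymous mutations and those causing amino acid substitutions depends on translating the affected codon before and after the change. Below is a minimal sketch of that classification. It assumes the standard genetic code, an in-frame coding sequence, and 0-based positions. `classify_snv` and the example sequence are hypothetical and were not used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-nucleotide change at 0-based position `pos` of an in-frame CDS."""
    cds = cds.upper()
    offset = pos % 3  # position of the change within its codon
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return f"missense {ref_aa}->{alt_aa}"


cds = "ATGGCTTGG"  # hypothetical: Met-Ala-Trp
print(classify_snv(cds, 5, "C"))  # synonymous (GCT -> GCC, Ala)
print(classify_snv(cds, 3, "A"))  # missense A->T (GCT -> ACT)
print(classify_snv(cds, 8, "A"))  # nonsense (TGG -> TGA)
```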
Identification of a specific intronic PEAR1 gene variant associated with greater platelet aggregability and protein expression

PubMed Central

Yanek, Lisa R.; Yang, Xiao Ping; Mathias, Rasika; Herrera-Galeano, J. Enrique; Suktitipat, Bhoom; Qayyum, Rehan; Johnson, Andrew D.; Chen, Ming-Huei; Tofler, Geoffrey H.; Ruczinski, Ingo; Friedman, Alan D.; Gylfason, Arnaldur; Thorsteinsdottir, Unnur; Bray, Paul F.; O'Donnell, Christopher J.; Becker, Diane M.; Becker, Lewis C.

2011-01-01

Genetic variation is thought to contribute to variability in platelet function; however, the specific variants and mechanisms that contribute to altered platelet function are poorly defined. With the use of a combination of fine mapping and sequencing of the platelet endothelial aggregation receptor 1 (PEAR1) gene we identified a common variant (rs12041331) in intron 1 that accounts for ≤ 15% of total phenotypic variation in platelet function. Association findings were robust in 1241 persons of European ancestry (P = 2.22 × 10⁻⁸) and were replicated down to the variant and nucleotide level in 835 persons of African ancestry (P = 2.31 × 10⁻²⁷) and in an independent sample of 2755 persons of European descent (P = 1.64 × 10⁻⁵). Sequencing confirmed that variation at rs12041331 accounted most strongly (P = 2.07 × 10⁻⁶) for the relation between the PEAR1 gene and platelet function phenotype. A dose-response relation between the number of G alleles at rs12041331 and expression of PEAR1 protein in human platelets was confirmed by Western blotting and ELISA. Similarly, the G allele was associated with greater protein expression in a luciferase reporter assay. These experiments identify the precise genetic variant in PEAR1 associated with altered platelet function and provide a plausible biologic mechanism to explain the association between variation in the PEAR1 gene and platelet function phenotype. PMID:21791418
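A dose-response relation with "the number of G alleles" is commonly modelled by coding each genotype as an allele dosage (0, 1 or 2) and regressing the phenotype on that dosage. Below is a minimal sketch of the slope under this additive-coding assumption. It is not necessarily the statistical model used in the study. The genotype strings and expression values are hypothetical.

```python
from statistics import mean


def allele_dosage(genotype: str, allele: str = "G") -> int:
    """Count copies of `allele` in a two-letter genotype such as 'AG'."""
    return genotype.upper().count(allele.upper())


def additive_slope(genotypes: list[str], phenotype: list[float], allele: str = "G") -> float:
    """Ordinary least-squares slope of phenotype on allele dosage (0, 1, 2)."""
    x = [allele_dosage(g, allele) for g in genotypes]
    mx, my = mean(x), mean(phenotype)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all samples share the same dosage; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotype))
    return sxy / sxx


# Hypothetical expression values, not data from the study.
genotypes = ["AA", "AG", "GG", "AG", "GG", "AA"]
expression = [1.0, 1.5, 2.1, 1.4, 1.9, 0.9]
print(round(additive_slope(genotypes, expression), 3))
```

A positive slope means that each additional G allele is associated with higher expression, on average, in these made-up numbers.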
Targeted Quantitation of Proteins by Mass Spectrometry

PubMed Central

2013-01-01

Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is enabled by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement. PMID:23517332
Targeted quantitation of proteins by mass spectrometry.

PubMed

Liebler, Daniel C; Zimmerman, Lisa J

2013-06-04

Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is enabled by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement.
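One widely used standardization option for MRM is stable-isotope dilution. A known amount of a heavy-labelled copy of the target peptide is spiked into the sample, and the analyte amount is estimated from the light-to-heavy peak-area ratio. Below is a minimal sketch of that arithmetic. It assumes that per-transition areas are summed for each isotope form, which is one convention among several. The function name and peak areas are hypothetical.

```python
def isotope_dilution_amount(light_areas, heavy_areas, spiked_amount):
    """Estimate analyte amount from the light/heavy peak-area ratio and the spiked amount.

    Areas for each MRM transition are summed per isotope form (an assumed convention).
    """
    light, heavy = sum(light_areas), sum(heavy_areas)
    if heavy <= 0:
        raise ValueError("heavy-standard signal must be positive")
    return light / heavy * spiked_amount


# Hypothetical integrated areas for three transitions; 50 fmol heavy peptide spiked in.
light = [12000.0, 8000.0, 5000.0]
heavy = [6000.0, 4000.0, 2500.0]
print(isotope_dilution_amount(light, heavy, 50.0))  # 100.0 (fmol)
```

The heavy standard co-elutes and fragments like the analyte, so the ratio corrects for much of the run-to-run signal variation. This is what makes the approach more precise than absolute peak areas alone.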
Structural details (kinks and non-α conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

PubMed Central

Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri

2003-01-01

One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3₁₀-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

PubMed

Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri

2003-08-01

One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.
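Because the descriptors are short local patterns of about 7-9 residues, applying them amounts to sliding every pattern along a sequence and reporting each match. Below is a minimal sketch that assumes the descriptors can be written as regular expressions. The published descriptors are not reproduced here. The `demo_kink` pattern and the test sequence are hypothetical placeholders.

```python
import re


def scan_motifs(seq: str, patterns: dict[str, str]) -> list[tuple[str, int, str]]:
    """Report every (possibly overlapping) match of each regex pattern in `seq`.

    Returns (pattern name, 0-based start, matched residues), sorted by position.
    """
    hits = []
    for name, pattern in patterns.items():
        # A lookahead with a capture group lets overlapping windows all be reported.
        for m in re.finditer(f"(?=({pattern}))", seq.upper()):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 7-residue descriptor ('.' = any residue); not one of the published patterns.
patterns = {"demo_kink": "L..P..L"}
print(scan_motifs("AALAAPAALAA", patterns))  # [('demo_kink', 2, 'LAAPAAL')]
```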
Identification and characterization of transcript polymorphisms in soybean lines varying in oil composition and content.

PubMed

Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles

2014-04-23

Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
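The reported totals are internally consistent: 48,792 SNPs plus 1,693 small indels gives the 50,485 transcript sequence polymorphisms. Separating the two classes only requires comparing the reference and alternative alleles of each call. Below is a minimal sketch that assumes VCF-style REF/ALT strings and applies no size cutoff for "small" indels. The function name and example calls are hypothetical.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Label a REF/ALT allele pair as SNP, indel, or multi-nucleotide substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT alleles are identical")
    if len(ref) == len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "MNP"  # same length, more than one base changed


# Hypothetical calls, not data from the study.
calls = [("A", "G"), ("C", "T"), ("AT", "A"), ("G", "GTT"), ("AC", "GT")]
print(Counter(classify_variant(r, a) for r, a in calls))
```

Predicting each SNP's effect on the encoded protein would then require translating the affected codon, as in a synonymous/non-synonymous check.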
A Novel Locomotion-based Validation Assay for Candidate Drugs Using Drosophila DYT1 Disease Model

DTIC Science & Technology

2013-11-01

the genome using the same parental fly line, minimizing the effect of surrounding sequences and genetic variations on the ...locomotion and GTP cyclohydrolase protein levels; (3) supplementation of dopamine can partially rescue the locomotion defects of Drosophila larvae...(5’-GCGAACAACCAAAAAATCATTGAGATAATAAACTCCTCCATTAG-3’) to make dtorsin cDNA that lacks GAC (D307) (Fig. 1), respectively. After confirming the mutated sequences, the insert was again
Insights into the Performance of SD Bioline Malaria Ag P.f/Pan Rapid Diagnostic Test and Plasmodium falciparum Histidine-Rich Protein 2 Gene Variation in Madagascar.

PubMed

Willie, Nigani; Mehlotra, Rajeev K; Howes, Rosalind E; Rakotomanga, Tovonahary A; Ramboarina, Stephanie; Ratsimbasoa, Arsène C; Zimmerman, Peter A

2018-06-01

Plasmodium falciparum histidine-rich protein 2 (PfHRP2) forms the basis of many current malaria rapid diagnostic tests (RDTs). However, parasites lacking part or all of the pfhrp2 gene do not express the PfHRP2 protein and are therefore not identifiable by PfHRP2-detecting RDTs. We evaluated the performance of the SD Bioline Malaria Ag P.f/Pan RDT together with pfhrp2 variation in Madagascar. Genomic DNA isolated from 260 patient blood samples was polymerase chain reaction (PCR)-amplified for the parasite 18S rRNA and pfhrp2 genes. A post-PCR ligation detection reaction-fluorescent microsphere assay (LDR-FMA) was performed to identify parasite species. The pfhrp2 amplicons were sequenced. PCR diagnosis of patient samples showed that 29% (75/260) were infected and that P. falciparum was present in 95% (71/75) of these PCR-positive samples. Comparing RDT results with P. falciparum detection by LDR-FMA, eight samples were RDT negative but P. falciparum positive (false negatives), all of which were pfhrp2 positive. The sensitivity and specificity of the RDT were 87% and 90%, respectively. Seventy-three samples were amplified for pfhrp2, of which nine randomly selected amplicons were sequenced, yielding 13 sequences. Amplification of pfhrp2, combined with RDT analysis and P. falciparum detection by LDR-FMA, gave no indication of pfhrp2 deletion. Sequence analysis of pfhrp2 showed that the correlation between pfhrp2 sequence structure and RDT detection rates was unclear. Although the observed absence of pfhrp2 deletion from the samples screened here is encouraging, continued monitoring of the efficacy of the SD Bioline Malaria Ag P.f/Pan RDT for malaria diagnosis in Madagascar is warranted.
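Sensitivity and specificity compare the RDT against a reference method, which in this study is P. falciparum detection by LDR-FMA. A false negative is a sample that is RDT negative but P. falciparum positive. Below is a minimal sketch of the two ratios. The counts are hypothetical and are not the Madagascar data, and the function name is illustrative.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) of a test against a reference method."""
    sensitivity = tp / (tp + fn)  # share of reference-positive samples the test detects
    specificity = tn / (tn + fp)  # share of reference-negative samples the test calls negative
    return sensitivity, specificity


# Hypothetical 2x2 counts, not the study's data.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")  # sensitivity=90%, specificity=90%
```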
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

PubMed

Tatusova, Tatiana

2016-01-01

The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with analytical, search, and retrieval resources through the NCBI website. Entrez, the text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for information management systems and visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.
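The abstract does not describe programmatic access. NCBI does, however, expose its Entrez databases through the E-utilities web service, and ESearch returns the record IDs that match a query. Below is a minimal sketch using only the Python standard library. The helper names and the example query are illustrative. The network call is left commented out, and NCBI asks clients to limit their request rate.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch query URL that asks for JSON output."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run the query over the network and return the matching record IDs."""
    with urlopen(esearch_url(db, term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


print(esearch_url("gene", "PEAR1[gene] AND human[orgn]"))
# Requires network access:
# print(esearch_ids("gene", "PEAR1[gene] AND human[orgn]"))
```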
Species Identification of Bovine, Ovine and Porcine Type 1 Collagen; Comparing Peptide Mass Fingerprinting and LC-Based Proteomics Methods.

PubMed

Buckley, Mike

2016-03-24

Collagen is one of the most ubiquitous proteins in the animal kingdom and the dominant protein in extracellular tissues such as bone, skin and other connective tissues, in which it acts primarily as a supporting scaffold. It has been widely investigated scientifically, not only as a biomedical material for regenerative medicine but also for its role as a food source for both humans and livestock. Owing to the long-term stability of collagen, as well as its abundance in bone, it has been proposed as a source of biomarkers for species identification. Such identification applies not only to heat- and pressure-rendered animal feed but also to ancient archaeological and palaeontological specimens, and it is typically carried out by peptide mass fingerprinting (PMF) or by in-depth liquid chromatography (LC)-based tandem mass spectrometric methods. Through the analysis of the three most common domesticated species, cow, sheep, and pig, this research compares the advantages of each approach and examines sites of sequence variation in relation to known functional properties of the collagen molecule. Results indicate that the species biomarkers previously identified through PMF analysis are not among the most variable type 1 collagen peptides present in these tissues; those more variable peptides can be detected by LC-based methods. However, it is clear that the highly repetitive sequence motif of collagen throughout the molecule, combined with the variability of the sites and relative abundance levels of hydroxylation, can result in high-scoring false positive peptide matches using these LC-based methods. Additionally, the greater alpha 2(I) chain sequence variation, in comparison to the alpha 1(I) chain, did not appear to be specific to any particular functional properties, implying that intra-chain functional constraints on sequence variation are not as great as inter-chain constraints.
However, although some of the most variable peptides were observed only with LC-based methods, the simplicity of the PMF approach and the suitable range of peptide sequence variation observed make PMF the ideal method for initial taxonomic identification until the range of publicly available collagen sequences improves, with further analysis by LC-based methods only when required.
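Both PMF and LC-based matching compare observed peptide masses with masses calculated from candidate sequences. Variable hydroxylation shifts each candidate by multiples of an oxygen atom, which is one reason collagen matching is error-prone. Below is a minimal sketch of the calculation using standard monoisotopic residue masses. The function name and the example tripeptide are hypothetical, and other modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide termini
PROTON = 1.007276   # charge carrier for the singly protonated [M+H]+ ion
OXYGEN = 15.994915  # mass shift of one hydroxylation (e.g. Pro -> Hyp, Lys -> Hyl)


def peptide_mh(sequence: str, hydroxylations: int = 0) -> float:
    """Monoisotopic [M+H]+ m/z of a peptide carrying n hydroxylations."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence.upper())
    return residues + WATER + hydroxylations * OXYGEN + PROTON


# Hypothetical Gly-Pro-Ala tripeptide with 0 and 1 hydroxylated prolines.
for n in (0, 1):
    print(n, round(peptide_mh("GPA", n), 4))  # 0 244.1292, then 1 260.1241
```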
Hermes Transposon Distribution and Structure in Musca domestica

PubMed Central

Subramanian, Ramanand A.; Cathcart, Laura A.; Krafsur, Elliot S.; Atkinson, Peter W.

2009-01-01

Hermes elements are hAT transposons from Musca domestica that are very closely related to the hobo transposons from Drosophila melanogaster and are useful as gene vectors in a wide variety of organisms including insects, planaria, and yeast. hobo elements show distinct length variations in a rapidly evolving part of the transposase-coding region as a result of expansions and contractions of a simple repeat sequence encoding the three amino acids threonine, proline, and glutamic acid (TPE). These variations in length may influence the function of the protein and the movement of hobo transposons in natural populations. Here, we determine the distribution of Hermes in populations of M. domestica as well as whether Hermes transposase has undergone similar sequence expansions and contractions during its evolution in this species. Hermes transposons were found in all M. domestica individuals sampled from 14 populations collected from 4 continents. All individuals with Hermes transposons had evidence for the presence of intact transposase open reading frames, and little sequence variation was observed among Hermes elements. A systematic analysis of the TPE-homologous region of the Hermes transposase-coding region revealed no evidence for length variation. The simple sequence repeat found in hobo elements is a feature of this transposon that evolved since the divergence of hobo and Hermes. PMID:19366812
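The length variation described for hobo can be quantified as the number of consecutive TPE units in the translated transposase. Below is a minimal sketch that counts exact tandem copies only. Real repeats may be degenerate, and the published analysis may have handled them differently. The function name and protein fragments are hypothetical.

```python
import re


def longest_tandem_run(protein: str, unit: str = "TPE") -> int:
    """Return the largest number of consecutive exact copies of `unit` in `protein`."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", protein.upper())
    return max((len(m.group(0)) // len(unit) for m in runs), default=0)


# Hypothetical fragments: an expanded repeat and a contracted one.
print(longest_tandem_run("MKTPETPETPEAS"))  # 3
print(longest_tandem_run("MKTPEAS"))        # 1
```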
Differential sequence diversity at merozoite surface protein-1 locus of Plasmodium knowlesi from humans and macaques in Thailand.

PubMed

Putaporntip, Chaturong; Thongaree, Siriporn; Jongwutiwes, Somchai

2013-08-01

To determine the genetic diversity and potential transmission routes of Plasmodium knowlesi, we analyzed the complete nucleotide sequence of the gene encoding merozoite surface protein-1 of this simian malaria parasite (Pkmsp-1), an asexual blood-stage vaccine candidate, from naturally infected humans and macaques in Thailand. Analysis of Pkmsp-1 sequences from humans (n=12) and monkeys (n=12) revealed five conserved and four variable domains. Most nucleotide substitutions in conserved domains were dimorphic, whereas three of the four variable domains contained complex repeats with extensive sequence and size variation. Besides purifying selection in conserved domains, evidence of intragenic recombination scattered across Pkmsp-1 was detected. The number of haplotypes, haplotype diversity, nucleotide diversity and recombination sites of human-derived sequences exceeded those of monkey-derived sequences. Phylogenetic networks based on concatenated conserved sequences of Pkmsp-1 displayed a character pattern that could have arisen from the sampling process or from the presence of two independent routes of P. knowlesi transmission, i.e. from macaques to humans and from humans to humans in Thailand. Copyright © 2013 Elsevier B.V. All rights reserved.
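Two of the statistics compared between human- and monkey-derived sequences have short standard definitions. Nucleotide diversity (π) is the average number of pairwise differences per site. Haplotype diversity is Nei's unbiased estimator, n/(n-1) × (1 - Σpᵢ²). Below is a minimal sketch that assumes aligned, gap-free sequences of equal length. The alignment is hypothetical and is not Pkmsp-1 data.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site (pi) for aligned, gap-free sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical four-sequence alignment.
alignment = ["AAAA", "AAAT", "AAAT", "AATT"]
print(nucleotide_diversity(alignment))           # 0.25
print(round(haplotype_diversity(alignment), 4))  # 0.8333
```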
Strong positive selection and recombination drive the antigenic variation of the PilE protein of the human pathogen Neisseria meningitidis.

PubMed

Andrews, T Daniel; Gojobori, Takashi

2004-01-01

The PilE protein is the major component of the Neisseria meningitidis pilus, which is encoded by the pilE/pilS locus that includes an expressed gene and eight homologous silent fragments. The silent gene fragments have been shown to recombine through gene conversion with the expressed gene and thereby provide a means by which novel antigenic variants of the PilE protein can be generated. We have analyzed the evolutionary rate of the pilE gene using the nucleotide sequence of two complete pilE/pilS loci. The very high rate of evolution displayed by the PilE protein appears driven by both recombination and positive selection. Within the semivariable region of the pilE and pilS genes, recombination appears to occur within multiple small sequence blocks that lie between conserved sequence elements. Within the hypervariable region, positive selection was identified from comparison of the silent and expressed genes. The unusual gene conversion mechanism that operates at the pilE/pilS locus is a strategy employed by N. meningitidis to enhance mutation of certain regions of the PilE protein. The silent copies of the gene effectively allow "parallelized" evolution of pilE, thus enabling the encoded protein to rapidly explore a large area of sequence space in an effort to find novel antigenic variants.
Natural Variation of Epstein-Barr Virus Genes, Proteins, and Primary MicroRNA.

PubMed

Correia, Samantha; Palser, Anne; Elgueta Karstegl, Claudio; Middeldorp, Jaap M; Ramayanti, Octavia; Cohen, Jeffrey I; Hildesheim, Allan; Fellner, Maria Dolores; Wiels, Joelle; White, Robert E; Kellam, Paul; Farrell, Paul J

2017-08-01

Viral gene sequences from an enlarged set of about 200 Epstein-Barr virus (EBV) strains, including many primary isolates, have been used to investigate variation in key viral genetic regions, particularly LMP1, Zp, gp350, EBNA1, and the BART microRNA (miRNA) cluster 2. Determination of type 1 and type 2 EBV in saliva samples from people from a wide range of geographic and ethnic backgrounds demonstrates a small percentage of healthy white Caucasian British people carrying predominantly type 2 EBV. Linkage of Zp and gp350 variants to type 2 EBV is likely to be due to their genes being adjacent to the EBNA3 locus, which is one of the major determinants of the type 1/type 2 distinction. A novel classification of EBNA1 DNA binding domains, named QCIGP, results from phylogeny analysis of their protein sequences but is not linked to the type 1/type 2 classification. The BART cluster 2 miRNA region is classified into three major variants through single-nucleotide polymorphisms (SNPs) in the primary miRNA outside the mature miRNA sequences. These SNPs can result in altered levels of expression of some miRNAs from the BART variant frequently present in Chinese and Indonesian nasopharyngeal carcinoma (NPC) samples. The EBV genetic variants identified here provide a basis for future, more directed analysis of association of specific EBV variations with EBV biology and EBV-associated diseases. IMPORTANCE Incidence of diseases associated with EBV varies greatly in different parts of the world. Thus, relationships between EBV genome sequence variation and health, disease, geography, and ethnicity of the host may be important for understanding the role of EBV in diseases and for development of an effective EBV vaccine. This paper provides the most comprehensive analysis so far of variation in specific EBV genes relevant to these diseases and proposed EBV vaccines. 
By focusing on variation in LMP1, Zp, gp350, EBNA1, and the BART miRNA cluster 2, new relationships with the known type 1/type 2 strains are demonstrated, and a novel classification of EBNA1 and the BART miRNAs is proposed. Copyright © 2017 Correia et al.
Group A Human Rotavirus Genomics: Evidence that Gene Constellations Are Influenced by Viral Protein Interactions

PubMed Central

Heiman, Erica M.; McDonald, Sarah M.; Barro, Mario; Taraporewala, Zenobia F.; Bar-Magen, Tamara; Patton, John T.

2008-01-01

Group A human rotaviruses (HRVs) are the major cause of severe viral gastroenteritis in infants and young children. To gain insight into the level of genetic variation among HRVs, we determined the genome sequences for 10 strains belonging to different VP7 serotypes (G types). The HRVs chosen for this study, D, DS-1, P, ST3, IAL28, Se584, 69M, WI61, A64, and L26, were isolated from infected persons and adapted to cell culture to use as serotype references. Our sequencing results revealed that most of the individual proteins from each HRV belong to one of three genotypes (1, 2, or 3) based on their similarities to proteins of genogroup strains (Wa, DS-1, or AU-1, respectively). Strains D, P, ST3, IAL28, and WI61 encode genotype 1 (Wa-like) proteins, whereas strains DS-1 and 69M encode genotype 2 (DS-1-like) proteins. Of the 10 HRVs sequenced, 3 of them (Se584, A64, and L26) encode proteins belonging to more than one genotype, indicating that they are intergenogroup reassortants. We used amino acid sequence alignments to identify residues that distinguish proteins belonging to HRV genotype 1, 2, or 3. These genotype-specific changes cluster in definitive regions within each viral protein, many of which are sites of known protein-protein interactions. For the intermediate viral capsid protein (VP6), the changes map onto the atomic structure at the VP2-VP6, VP4-VP6, and VP7-VP6 interfaces. The results of this study provide evidence that group A HRV gene constellations exist and may be influenced by interactions among viral proteins during replication. PMID:18786998
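The genotype calls described above rest on similarity to the genogroup reference strains (Wa, DS-1, AU-1). A minimal sketch of that idea, assuming pre-aligned protein sequences and a simple nearest-reference rule (the function names and the rule itself are illustrative, not the authors' published procedure), assigns each protein to the genotype of its most similar reference and flags a strain as a reassortant when its proteins span more than one genotype.

```python
# Illustrative sketch: nearest-reference genotype assignment for rotavirus proteins.
# Assumes sequences are already aligned to equal length; names are hypothetical.

REFERENCE_GENOTYPE = {"Wa": 1, "DS-1": 2, "AU-1": 3}


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned, non-gap columns of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign_genotype(query: str, references: dict[str, str]) -> int:
    """Return the genotype of the reference strain most similar to the query."""
    best = max(references, key=lambda name: percent_identity(query, references[name]))
    return REFERENCE_GENOTYPE[best]


def is_reassortant(genotypes: dict[str, int]) -> bool:
    """A strain whose proteins fall into more than one genotype is a reassortant."""
    return len(set(genotypes.values())) > 1
```

With toy references {"Wa": "MKLVA", "DS-1": "MRLIA", "AU-1": "QRFIS"}, the query "MKLVS" is 80% identical to Wa and is assigned genotype 1. Real genotyping relies on curated, gene-specific identity cutoffs rather than a bare nearest-reference rule.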
Predicting the binding preference of transcription factors to individual DNA k-mers.

PubMed

Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R

2009-04-15

Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty of determining their specificity experimentally and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest-neighbour prediction based on functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
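A nearest-neighbour predictor of the kind described above can be sketched in a few lines: the query homeodomain inherits the 8-mer preference profile of the training homeodomain that differs from it at the fewest selected residue positions. The residue positions, sequences and profiles in this sketch are hypothetical placeholders, and the abstract does not specify the distance the authors used.

```python
# Illustrative nearest-neighbour sketch: predict a homeodomain's 8-mer binding
# profile from the closest training homeodomain, comparing only a chosen set of
# (hypothetical) functionally important residue positions.


def residue_distance(a: str, b: str, positions: list[int]) -> int:
    """Mismatches between two aligned protein sequences at 0-based positions."""
    return sum(a[i] != b[i] for i in positions)


def predict_profile(
    query: str,
    training: dict[str, dict[str, float]],
    positions: list[int],
) -> dict[str, float]:
    """training maps an aligned protein sequence to its {8-mer: preference} profile."""
    nearest = min(training, key=lambda seq: residue_distance(query, seq, positions))
    return training[nearest]
```

Restricting the distance to a subset of positions is the design choice that matters: residues that never contact DNA add noise to a whole-sequence comparison.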
Genome-Wide Analysis of ZmDREB Genes and Their Association with Natural Variation in Drought Tolerance at Seedling Stage of Zea mays L

PubMed Central

Wang, Hongwei; Xin, Haibo; Yang, Xiaohong; Yan, Jianbing; Li, Jiansheng; Tran, Lam-Son Phan; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko; Qin, Feng

2013-01-01

The worldwide production of maize (Zea mays L.) is frequently impacted by water scarcity; as a result, increased drought tolerance is a priority target in maize breeding programs. While DREB transcription factors have been demonstrated to play a central role in desiccation tolerance, whether natural sequence variations in these genes are associated with phenotypic variability in this trait is largely unknown. In the present study, eighteen ZmDREB genes present in the maize B73 genome were cloned and systematically analyzed to determine their phylogenetic relationships, synteny with rice, maize and sorghum genomes, patterns of drought-responsive gene expression, and protein transactivation activity. Importantly, the association between the nucleic acid variation of each ZmDREB gene and drought tolerance was evaluated using a diverse maize population consisting of 368 varieties from tropical and temperate regions. A significant association between the genetic variation of ZmDREB2.7 and drought tolerance at the seedling stage was identified. Further analysis found that DNA polymorphisms in the promoter region of ZmDREB2.7, but not in the protein coding region itself, were associated with different levels of drought tolerance among maize varieties, likely due to distinct patterns of gene expression in response to drought stress. An in vitro protein-DNA binding assay demonstrated that the ZmDREB2.7 protein could specifically interact with the target DNA sequences. Transgenic Arabidopsis overexpressing ZmDREB2.7 displayed enhanced tolerance to drought stress. Moreover, a favorable allele of ZmDREB2.7, identified in the drought-tolerant maize varieties, was effective in imparting plant tolerance to drought stress. 
Based upon these findings, we conclude that natural variation in the promoter of ZmDREB2.7 contributes to maize drought tolerance, and that the gene and its favorable allele may be an important genetic resource for the genetic improvement of drought tolerance in maize. PMID:24086146
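The gene-trait association described above can be illustrated with a deliberately simplified single-marker test: split the lines by their allele at one polymorphism and compare a drought-tolerance phenotype (for example, seedling survival rate, an assumed measure) with Welch's t statistic. This is only a sketch. The abstract does not state the model used, and association panels of this kind are usually analysed with mixed linear models that correct for population structure and kinship, which this test ignores.

```python
# Simplified single-marker association sketch (no population-structure correction).
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic for a difference in mean phenotype between two groups.
    Each group needs at least two observations for the sample variance."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def single_marker_test(
    alleles: list[str], phenotypes: list[float], allele_a: str, allele_b: str
) -> float:
    """Group lines by their allele at one polymorphism and compare phenotypes."""
    group_a = [p for g, p in zip(alleles, phenotypes) if g == allele_a]
    group_b = [p for g, p in zip(alleles, phenotypes) if g == allele_b]
    return welch_t(group_a, group_b)
```

With six hypothetical lines, alleles ["A", "A", "A", "B", "B", "B"] and survival rates [0.80, 0.90, 0.85, 0.40, 0.50, 0.45], the statistic is about 9.8, favouring allele A.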

Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th-century pandemics.

PubMed

Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K

2013-07-01

PB1F2 is the 11th protein of influenza A virus, translated from the +1 alternate reading frame of the PB1 gene. Since its discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. The selection of the PB1 gene segment in the pandemics and the variable size and pleiotropic effects of PB1F2 prompted us to analyze the amino acid sequences of this protein in various influenza A viruses. Amino acid sequences of the PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from the Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of these subtypes were used to determine the size and the variable and conserved domains, and to perform mutational analysis. Analysis showed that 96.4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein. Over the years, the PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. The global database of PB1F2 proteins revealed that the N66S mutation was present in only 3.8% of the H5N1 strains. We found a novel mutation, N84S, in the PB1F2 protein of 9.35% of the highly pathogenic avian influenza H5N1 viruses. Varying sizes and mutations of the PB1F2 protein were observed in different influenza A virus subtypes with pandemic potential. There was genetic divergence of the protein in various hosts, which highlighted the host-specific evolution of the virus. However, further studies are required to correlate this sequence variability with virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.
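Percentages such as "N66S in 3.8% of strains" or "96.4% full-length" come down to counting over a set of sequences. The sketch below shows that arithmetic under stated assumptions: positions are 1-based, only sequences long enough to reach a position are counted for that position, and the 90-residue full-length cutoff is an assumed parameter, not a value taken from the abstract.

```python
# Illustrative counting over PB1F2-like protein sequences (hypothetical helpers).


def residue_percentage(sequences: list[str], position: int, residue: str) -> float:
    """Percent of sequences carrying `residue` at 1-based `position`,
    counting only sequences long enough to cover that position."""
    covering = [s for s in sequences if len(s) >= position]
    if not covering:
        return 0.0
    hits = sum(s[position - 1] == residue for s in covering)
    return 100.0 * hits / len(covering)


def full_length_percentage(sequences: list[str], full_length: int = 90) -> float:
    """Percent of sequences at least `full_length` residues long (assumed cutoff)."""
    if not sequences:
        return 0.0
    return 100.0 * sum(len(s) >= full_length for s in sequences) / len(sequences)
```

Whether truncated proteins belong in the denominator changes the reported percentage, so the choice made in the first function should be stated alongside any figure it produces.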
Amyloid β-sheet mimics that antagonize protein aggregation and reduce amyloid toxicity

NASA Astrophysics Data System (ADS)

Cheng, Pin-Nan; Liu, Cong; Zhao, Minglei; Eisenberg, David; Nowick, James S.

2012-11-01

The amyloid protein aggregation associated with diseases such as Alzheimer's, Parkinson's and type II diabetes (among many others) features a bewildering variety of β-sheet-rich structures in transition from native proteins to ordered oligomers and fibres. The variation in the amino-acid sequences of the β-structures presents a challenge to developing a model system of β-sheets for the study of various amyloid aggregates. Here, we introduce a family of robust β-sheet macrocycles that can serve as a platform to display a variety of heptapeptide sequences from different amyloid proteins. We have tailored these amyloid β-sheet mimics (ABSMs) to antagonize the aggregation of various amyloid proteins, thereby reducing the toxicity of amyloid aggregates. We describe the structures and inhibitory properties of ABSMs containing amyloidogenic peptides from the amyloid-β peptide associated with Alzheimer's disease, β2-microglobulin associated with dialysis-related amyloidosis, α-synuclein associated with Parkinson's disease, islet amyloid polypeptide associated with type II diabetes, human and yeast prion proteins, and Tau, which forms neurofibrillary tangles.
Genome wide re-sequencing of newly developed Rice Lines from common wild rice (Oryza rufipogon Griff.) for the identification of NBS-LRR genes.

PubMed

Liu, Wen; Ghouri, Fozia; Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim; Liu, Xiangdong

2017-01-01

Common wild rice (Oryza rufipogon Griff.) is an important germplasm for rice breeding that contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore these useful genes at the whole-genome level. Here, we identified nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, and uniquely covered more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed high SNP (single-nucleotide polymorphism) variation rates across all 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93-11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in both lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. In total, 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared with the Nipponbare and 93-11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with heat shock protein, proteins with DNA-binding activity, phosphoinositide 3-kinase and proteins containing a coiled-coil region. 
We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice.
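The signed InDel lengths quoted above (-5 to 5 bp) read naturally as "length of the alternative allele minus length of the reference allele", negative for deletions. A minimal sketch, assuming VCF-style REF/ALT pairs in which InDel alleles share a leading anchor base, classifies variants and tallies that count-length distribution. Large structural variants would also count as InDels here; in practice they are usually called by separate tools.

```python
# Illustrative classification of REF/ALT pairs and InDel length tally.
from collections import Counter


def classify_variant(ref: str, alt: str) -> tuple[str, int]:
    """Return ('SNP' | 'InDel' | 'other', signed length change ALT - REF).
    Positive lengths are insertions, negative lengths are deletions."""
    delta = len(alt) - len(ref)
    if len(ref) == 1 and len(alt) == 1:
        return "SNP", 0
    if delta != 0:
        return "InDel", delta
    return "other", 0  # e.g. a multi-nucleotide substitution


def indel_length_distribution(variants: list[tuple[str, str]]) -> Counter:
    """Count InDels by signed length, i.e. the count-length distribution."""
    counts: Counter = Counter()
    for ref, alt in variants:
        kind, delta = classify_variant(ref, alt)
        if kind == "InDel":
            counts[delta] += 1
    return counts
```

For the pairs ("A", "G"), ("A", "AT"), ("ATG", "A"), ("C", "CA") and ("GG", "TT"), the tally is two 1-bp insertions and one 2-bp deletion; the SNP and the two-base substitution are excluded.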
Genome wide re-sequencing of newly developed Rice Lines from common wild rice (Oryza rufipogon Griff.) for the identification of NBS-LRR genes

PubMed Central

Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim

2017-01-01

Common wild rice (Oryza rufipogon Griff.) is an important germplasm for rice breeding that contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore these useful genes at the whole-genome level. Here, we identified nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, and uniquely covered more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed high SNP (single-nucleotide polymorphism) variation rates across all 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93–11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in both lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. In total, 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared with the Nipponbare and 93–11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with heat shock protein, proteins with DNA-binding activity, phosphoinositide 3-kinase and proteins containing a coiled-coil region. 
We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice. PMID:28700714
Lack of Detectable Allergenicity in Genetically Modified Maize Containing “Cry” Proteins as Compared to Native Maize Based on In Silico & In Vitro Analysis

PubMed Central

Mathur, Chandni; Kathuria, Pooran C.; Dahiya, Pushpa; Singh, Anand B.

2015-01-01

Background Genetically modified (GM) crops with potential allergens must be evaluated for safety and for their endogenous IgE binding pattern compared with the native variety prior to market release. Objective To compare the endogenous IgE binding proteins of three GM maize seeds containing Cry1Ab, Cry1Ac and Cry1C transgenic proteins with those of non-GM maize. Methods An integrated approach of in silico and in vitro methods was employed. Cry proteins were tested for the presence of allergen sequences by FASTA searches of allergen databases. Biochemical assays of maize extracts were performed. Specific IgE (sIgE) assays and immunoblots against non-GM and GM maize antigens were performed using sera from food-sensitized patients (n = 39). Results In silico approaches confirmed that the stated transgenic proteins lack sequence similarity to entries in allergen databases. A non-significant (p > 0.05) variation in protein content between GM and non-GM maize was observed. Simulated gastric fluid (SGF) digestion revealed fewer stable protein fractions in GM than in non-GM maize, which might be due to a shift in constituent protein expression. Specific IgE values from patients showed no significant difference between non-GM and GM maize extracts. Five maize-sensitized patients recognized the same 7 IgE-binding protein fractions of 28–88 kD in both GM and non-GM maize, indicating an absence of variation. Four of the reported IgE binding proteins were also found to be stable in SGF. Conclusion Cry proteins did not show any significant similarity (>35%) to entries in allergen databases. Immunoassays also did not identify appreciable differences in endogenous IgE binding between GM and non-GM maize. PMID:25706412
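The ">35%" threshold above matches the widely used Codex Alimentarius allergenicity criterion of more than 35% identity over a window of 80 amino acids; the abstract itself does not state the window, so treat it as an assumption here. The study used FASTA, which produces gapped alignments. The sketch below is a simpler, ungapped approximation that slides 80-residue windows of the query against every 80-residue window of an allergen sequence.

```python
# Ungapped 80-residue sliding-window identity check (simplification of a
# FASTA-based allergen screen; window and threshold are assumed parameters).


def window_identity_hits(
    query: str, allergen: str, window: int = 80, threshold: float = 35.0
) -> list[tuple[int, int, float]]:
    """Return (query_start, allergen_start, percent_identity) for every pair of
    ungapped windows whose identity exceeds `threshold`. Sequences shorter
    than `window` yield no hits."""
    hits = []
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a = allergen[j:j + window]
            identity = 100.0 * sum(x == y for x, y in zip(q, a)) / window
            if identity > threshold:
                hits.append((i, j, identity))
    return hits
```

Because it ignores gaps, this check can miss similarities that an alignment-based search would report, so it suits illustration rather than regulatory screening.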
Intra-domain phage display (ID-PhD) of peptides and protein mini-domains censored from canonical pIII phage display.

PubMed

Tjhung, Katrina F; Deiss, Frédérique; Tran, Jessica; Chou, Ying; Derda, Ratmir

2015-01-01

In this paper, we describe multivalent display of peptide and protein sequences that are typically censored from traditional N-terminal display on protein pIII of filamentous bacteriophage M13. Using site-directed mutagenesis of the commercially available M13KE phage cloning vector, we introduced sites that permit efficient cloning using restriction enzymes between domains N1 and N2 of the pIII protein. As phage infectivity is directly linked to the integrity of the connection between the N1 and N2 domains, intra-domain phage display (ID-PhD) allows for simple quality control of the display and of natural variations in the displayed sequences. Additionally, direct linkage to phage propagation allows efficient monitoring of sequence cleavage, providing a convenient system for the selection and evolution of protease-susceptible or protease-resistant sequences. As an example of the benefits of such an ID-PhD system, we displayed a negatively charged FLAG sequence, which is known to be post-translationally excised from pIII when displayed on the N-terminus, as well as positively charged sequences, which suppress phage production when displayed on the N-terminus. ID-PhD of FLAG exhibited a sub-nanomolar apparent Kd, suggesting the multivalent nature of the display. A TEV-protease recognition sequence (TEVrs) co-expressed in tandem with FLAG allowed us to demonstrate that 99.9997% of the phage displayed the FLAG-TEVrs tandem and could be recognized and cleaved by TEV protease. The residual 0.0003% consisted of phage clones that had excised the insert from their genome. ID-PhD is also amenable to display of protein mini-domains, such as the 33-residue minimized Z-domain of protein A. We thus show that it is possible to use ID-PhD for multivalent display and selection of mini-domain proteins (Affibodies, scFv, etc.).
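Because cleavage between N1 and N2 destroys infectivity, the fraction of phage left uncleaved after TEV treatment can be read from infectious titers. A minimal sketch of that arithmetic, assuming titers measured before and after digestion: the numbers in the example are hypothetical, picked only so that the output reproduces the 99.9997% / 0.0003% split reported above. They are not the authors' measurements.

```python
# Cleavage percentages from infectious titers (hypothetical example values).


def cleavage_percentages(titer_before: float, titer_after: float) -> tuple[float, float]:
    """Return (percent cleaved, percent residual), assuming that cleavage between
    pIII domains N1 and N2 abolishes infectivity, so only uncleaved phage
    are counted in the post-digestion titer."""
    if titer_before <= 0:
        raise ValueError("titer_before must be positive")
    residual = 100.0 * titer_after / titer_before
    return 100.0 - residual, residual


# Hypothetical titers (plaque-forming units per mL) before and after TEV digestion.
cleaved, residual = cleavage_percentages(1e10, 3e4)
print(f"{cleaved:.4f}% cleaved, {residual:.4f}% residual")
```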
The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host-parasite interaction.

PubMed

Jackson, Andrew P; Otto, Thomas D; Darby, Alistair; Ramaprasad, Abhinay; Xia, Dong; Echaide, Ignacio Eduardo; Farber, Marisa; Gahlot, Sunayna; Gamble, John; Gupta, Dinesh; Gupta, Yask; Jackson, Louise; Malandrin, Laurence; Malas, Tareq B; Moussa, Ehab; Nair, Mridul; Reid, Adam J; Sanders, Mandy; Sharma, Jyotsna; Tracey, Alan; Quail, Mike A; Weir, William; Wastling, Jonathan M; Hall, Neil; Willadsen, Peter; Lingelbach, Klaus; Shiels, Brian; Tait, Andy; Berriman, Matt; Allred, David R; Pain, Arnab

2014-06-01

Babesia spp. are tick-borne, intraerythrocytic hemoparasites that use antigenic variation to resist host immunity, through sequential modification of the parasite-derived variant erythrocyte surface antigen (VESA) expressed on the infected red blood cell surface. We identified the genomic processes driving antigenic diversity in genes encoding VESA (ves1) through comparative analysis within and between three Babesia species (B. bigemina, B. divergens and B. bovis). Ves1 structure diverges rapidly after speciation, notably through the evolution of shortened forms (ves2) from the 5' ends of canonical ves1 genes. Phylogenetic analyses show that ves1 genes are routinely transposed between loci, whereas ves2 genes are not. Similarly, analysis of sequence mosaicism shows that recombination drives variation in ves1 sequences, but less so for ves2, indicating the adoption of different mechanisms for variation of the two families. Proteomic analysis of the B. bigemina PR isolate shows that two dominant VESA1 proteins are expressed in the population, whereas numerous VESA2 proteins are co-expressed, consistent with differential transcriptional regulation of each family. Hence, VESA2 proteins are abundant and previously unrecognized elements of Babesia biology, with evolutionary dynamics consistently different from those of VESA1, suggesting that their functions are distinct. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Position specific variation in the rate of evolution in transcription factor binding sites

PubMed Central

Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B

2003-01-01

Background The binding sites of sequence-specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve more slowly than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. 
The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
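The positive correlation between position-specific evolutionary rate and site degeneracy described above can be illustrated in a simplified form. Degeneracy is measured here as the per-position Shannon entropy of aligned S. cerevisiae sites, and divergence as the raw fraction of site/ortholog pairs that differ at each position. Both measures are assumptions made for illustration; the paper's own rate estimates are model-based and may be defined differently.

```python
# Per-position degeneracy (entropy) vs. divergence in aligned binding sites.
import math
from collections import Counter


def position_entropy(sites: list[str]) -> list[float]:
    """Shannon entropy (bits) of the base composition at each aligned motif position."""
    entropies = []
    for column in zip(*sites):
        n = len(column)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in Counter(column).values()))
    return entropies


def position_divergence(sites: list[str], orthologs: list[str]) -> list[float]:
    """Fraction of site/ortholog pairs that differ at each position (raw p-distance)."""
    length = len(sites[0])
    return [sum(s[i] != o[i] for s, o in zip(sites, orthologs)) / len(sites)
            for i in range(length)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; undefined if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

With toy sites ["ACGT", "ACGA", "ACTT", "ACGT"] and orthologs ["ACGT", "ACGT", "ACGA", "ACTT"], the two conserved positions have zero entropy and zero divergence, and the correlation comes out at 1.0.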
The nucleotide sequence and genome organization of Plasmopara halstedii virus.

PubMed

Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar

2011-03-17

Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive of its 3' poly(A) tract and contained a single open reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to the RdRp of Sclerophthora macrospora virus A (SmV A) and of viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive of its 3' poly(A) tract and contained a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence mapped to a region within ORF2, suggesting proteolytic processing of the CP in vivo. The CP showed similarities to the CP of SmV A and of viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less, regardless of the host's pathotype, geographical origin, or sensitivity towards the fungicide metalaxyl. The results showed the presence of a single, new virus type in different P. halstedii isolates. 
The negligible viral sequence variation indicated that the virus does not account for differences in pathogenicity of the oomycete P. halstedii.
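The segment lengths reported above can be cross-checked with simple arithmetic: for each RNA, 5' UTR + ORF + 3' UTR should equal the segment length excluding poly(A), and each ORF should be a whole number of codons. The sketch below runs that check and also gives a raw p-distance helper of the kind that underlies a figure like "0.3% or less". The helper names are illustrative, and the p-distance ignores gap columns.

```python
# Consistency check of reported segment layouts plus a raw p-distance helper.


def check_segment_layout(utr5: int, orf: int, utr3: int, total: int) -> bool:
    """True if 5' UTR + ORF + 3' UTR equals the segment length (poly(A) excluded)
    and the ORF length is a whole number of codons."""
    return utr5 + orf + utr3 == total and orf % 3 == 0


def p_distance_percent(a: str, b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)


# Lengths (nt) taken from the abstract above.
print(check_segment_layout(18, 2745, 30, 2793), 2745 // 3)    # RNA1: True 915
print(check_segment_layout(164, 1128, 234, 1526), 1128 // 3)  # RNA2: True 376
```

Both segments pass: the reported UTR and ORF lengths sum exactly to the segment lengths, and ORF1 and ORF2 span 915 and 376 codons, respectively.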
Tertiary alphabet for the observable protein structural universe

PubMed Central

Mackenzie, Craig O.; Zhou, Jianfu; Grigoryan, Gevorg

2016-01-01

Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure. PMID:27810958
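Choosing a small set of motifs that together describe a target fraction of all residues is a set-cover problem. One plausible way to approach it is the classic greedy heuristic, which repeatedly picks the motif that covers the most not-yet-covered residues. This is not necessarily the procedure the authors used. In the sketch, each motif is reduced to the set of residue indices it matches, and the structural matching itself (sub-Angstrom RMSD) is assumed to have been done beforehand.

```python
# Greedy set-cover sketch for selecting motifs (hypothetical data model:
# motif id -> set of residue indices it describes).


def greedy_cover(
    motif_hits: dict[str, set[int]], universe_size: int, target: float = 0.5
) -> tuple[list[str], float]:
    """Greedily pick motifs until at least `target` of the residues is covered.
    Returns the chosen motif ids and the fraction of residues covered."""
    covered: set[int] = set()
    chosen: list[str] = []
    remaining = dict(motif_hits)
    while len(covered) < target * universe_size and remaining:
        best = max(remaining, key=lambda m: len(remaining[m] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining motif adds new coverage
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / universe_size
```

Diminishing marginal gains in the greedy order are one way the slow, roughly logarithmic growth of coverage with the number of motifs can show up.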
A comprehensive analysis of the Omp85/TpsB protein superfamily structural diversity, taxonomic occurrence, and evolution

PubMed Central

Heinz, Eva; Lithgow, Trevor

2014-01-01

Members of the Omp85/TpsB protein superfamily are ubiquitously distributed in Gram-negative bacteria and function in protein translocation (e.g., FhaC) or the assembly of outer membrane proteins (e.g., BamA). Several recent findings, including the identification of the novel membrane protein assembly factor TamA and the protein translocase PlpD, suggest a further level of variation in the superfamily. To investigate this diversity and the causal evolutionary events, we undertook a comprehensive comparative sequence analysis of the Omp85/TpsB proteins. A total of 10 protein subfamilies were apparent, distinguished by their domain structure and sequence signatures. In addition to FhaC, BamA, and TamA, for which structural and functional information is available, there are families of proteins with as yet undescribed domain architectures linked to the Omp85 β-barrel domain. This study brings a classification structure to a dynamic protein superfamily of high interest, given its essential function in Gram-negative bacteria as well as its diverse domain architecture, and we discuss several scenarios of putative functions of these so far undescribed proteins. PMID:25101071
Structure of human POFUT2: insights into thrombospondin type 1 repeat fold and O-fucosylation

PubMed Central

Chen, Chun-I; Keusch, Jeremy J; Klein, Dominique; Hess, Daniel; Hofsteenge, Jan; Gut, Heinz

2012-01-01

Protein O-fucosylation is a post-translational modification found on serine/threonine residues of thrombospondin type 1 repeats (TSR). The fucose transfer is catalysed by the enzyme protein O-fucosyltransferase 2 (POFUT2), and >40 human proteins contain the TSR consensus sequence for POFUT2-dependent fucosylation. To better understand O-fucosylation on TSR, we carried out a structural and functional analysis of human POFUT2 and its TSR substrate. Crystal structures of POFUT2 reveal a variation of the classical GT-B fold and identify sugar donor and TSR acceptor binding sites. Structural findings are correlated with steady-state kinetic measurements of wild-type and mutant POFUT2 and TSR and give insight into the catalytic mechanism and substrate specificity. By using an artificial mini-TSR substrate, we show that specificity is not primarily encoded in the TSR protein sequence but rather in the unusual 3D structure of a small part of the TSR. Our findings reveal that recognition of distinct conserved 3D fold motifs can be used as a mechanism to achieve substrate specificity by enzymes modifying completely folded proteins of very wide sequence diversity and biological function. PMID:22588082
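To list candidate O-fucosylation sites, a protein sequence can be scanned for the TSR consensus. The abstract does not spell the consensus out, so the commonly cited C-X-X-(S/T)-C motif is used here as an assumption. As the authors show, matching the sequence motif is not what primarily determines specificity, so such a scan only yields candidates that still depend on the correctly folded TSR.

```python
# Scan a protein sequence for the (assumed) TSR consensus C-X-X-(S/T)-C.
import re

# Lookahead so that overlapping motif occurrences are all reported.
TSR_MOTIF = re.compile(r"(?=(C..[ST]C))")


def find_tsr_sites(seq: str) -> list[int]:
    """Return 1-based positions of the S/T acceptor residue in each motif match."""
    # The acceptor is the 4th residue of the match: 0-based start + 3, plus 1.
    return [m.start() + 4 for m in TSR_MOTIF.finditer(seq.upper())]
```

For example, in the toy sequence "WSPCSVTCGAG" the motif CSVTC begins at residue 4, which places the threonine acceptor at residue 7.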
Identification and characterization of transcript polymorphisms in soybean lines varying in oil composition and content

PubMed Central

2014-01-01

Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for the discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
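A first step in predicting the effect of a coding SNP, such as the non-synonymous FAB2C change described above, is to check whether it alters the encoded amino acid. The sketch below does this with the standard genetic code for a SNP in an in-frame coding sequence. The helper name and toy sequence are illustrative, and the study's actual annotation pipeline is not described in the abstract.

```python
# Classify a coding SNP as synonymous, non-synonymous or nonsense
# using the standard genetic code (codons ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds: str, position: int, alt: str) -> str:
    """Classify a SNP at 0-based `position` of an in-frame coding sequence."""
    cds = cds.upper()
    offset = position % 3
    ref_codon = cds[position - offset:position - offset + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "non-synonymous"
```

In the toy coding sequence "ATGGCTTGG" (Met-Ala-Trp), GCT>GCC is synonymous, GCT>ACT (Ala>Thr) is non-synonymous, and TGG>TAG creates a stop codon.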
Polymorphisms and variants in the prion protein sequence of European moose (Alces alces), reindeer (Rangifer tarandus), roe deer (Capreolus capreolus) and fallow deer (Dama dama) in Scandinavia

PubMed Central

Wik, Lotta; Mikko, Sofia; Klingeborn, Mikael; Stéen, Margareta; Simonsson, Magnus; Linné, Tommy

2012-01-01

The prion protein (PrP) sequence of European moose, reindeer, roe deer and fallow deer in Scandinavia has high homology to the PrP sequence of North American cervids. In European moose, the PrP sequence varied at amino acid position 109, which was either K or Q. The 109Q variant is unique among vertebrate PrP sequences. During the 1980s a wasting syndrome in Swedish moose, Moose Wasting Syndrome (MWS), was described. SNP analysis demonstrated a difference in the observed proportions of the heterozygous Q/K and homozygous Q/Q genotypes between MWS animals and healthy animals. In MWS moose the allele frequencies for 109K and 109Q were 0.73 and 0.27, respectively, and in healthy animals 0.69 and 0.31. Both alleles occurred in heterozygous and homozygous states. In reindeer, PrP sequence variation was found at codon 176 (D or N) and codon 225 (S or Y). The PrP sequences of roe deer and fallow deer were identical to published GenBank sequences. PMID:22441661
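Allele frequencies like those above follow directly from genotype counts, because each diploid animal carries two alleles at codon 109. The sketch computes them and, for comparison with observed genotype proportions, the counts expected under Hardy-Weinberg equilibrium. The genotype counts in the example are hypothetical and do not come from the study.

```python
# Allele frequencies at PrP codon 109 from genotype counts, plus
# Hardy-Weinberg expected genotype counts (illustrative helpers).


def allele_frequencies(n_kk: int, n_kq: int, n_qq: int) -> tuple[float, float]:
    """Frequencies of the 109K and 109Q alleles (two alleles per animal)."""
    total_alleles = 2 * (n_kk + n_kq + n_qq)
    freq_k = (2 * n_kk + n_kq) / total_alleles
    return freq_k, 1.0 - freq_k


def hardy_weinberg_expected(freq_k: float, n_animals: int) -> tuple[float, float, float]:
    """Expected K/K, K/Q and Q/Q counts under Hardy-Weinberg equilibrium."""
    freq_q = 1.0 - freq_k
    return (
        freq_k ** 2 * n_animals,
        2 * freq_k * freq_q * n_animals,
        freq_q ** 2 * n_animals,
    )


# Hypothetical counts: 10 K/K, 6 K/Q and 4 Q/Q animals.
p_k, p_q = allele_frequencies(10, 6, 4)
print(p_k, p_q, hardy_weinberg_expected(p_k, 20))
```

Here K = (2 × 10 + 6) / 40 = 0.65, and the expected counts under equilibrium are 8.45, 9.1 and 2.45. A large gap between observed and expected genotype counts is what a formal test, for example a chi-square test, would then assess.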
Using structure to explore the sequence alignment space of remote homologs.

PubMed

Kuziemko, Andrew; Honig, Barry; Petrey, Donald

2011-10-01

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
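The dynamic programming (DP) alignment that the abstract contrasts with can be illustrated by a minimal global alignment of the Needleman-Wunsch type. It returns the DP-optimal score and one optimal alignment. This is a sketch only. It uses toy match/mismatch scores and a linear gap penalty instead of a substitution matrix such as BLOSUM62 with affine gaps. It does not reproduce the structure-guided fragment method described above.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The traceback recovers only one of possibly several alignments with the top score. The "suboptimal" variants mentioned in the abstract explore alignments with lower DP scores, because the structurally correct alignment is often not the DP optimum.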
Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.

PubMed

Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini

2016-02-01

Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs in bacterial genomes and their physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA, likely originating from heterologous DNA, and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes of natural isolates that encode identical TAs are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderating plasmid maintenance. © FEMS 2015. All rights reserved.
Sequences in Glycoprotein gp41, the CD4 Binding Site, and the V2 Domain Regulate Sensitivity and Resistance of HIV-1 to Broadly Neutralizing Antibodies

PubMed Central

O'Rourke, Sara M.; Schweighardt, Becky; Phung, Pham; Mesa, Kathryn A.; Vollrath, Aaron L.; Tatsuno, Gwen P.; To, Briana; Sinangil, Faruk; Limoli, Kay; Wrin, Terri

2012-01-01

The swarm of quasispecies that evolves in each HIV-1-infected individual represents a source of closely related Env protein variants that can be used to explore various aspects of HIV-1 biology. In this study, we made use of these variants to identify mutations that confer sensitivity and resistance to the broadly neutralizing antibodies found in the sera of selected HIV-1-infected individuals. For these studies, libraries of Env proteins were cloned from infected subjects and screened for infectivity and neutralization sensitivity. The nucleotide sequences encoding the Env proteins were then compared between pairs of neutralization-sensitive and -resistant viruses. In vitro mutagenesis was used to identify the specific amino acids responsible for the neutralization phenotype. All of the mutations altering neutralization sensitivity/resistance appeared to induce conformational changes that simultaneously enhanced the exposure of two or more epitopes located in different regions of gp160. These mutations appeared to occur at unique positions required to maintain the quaternary structure of the gp160 trimer, as well as conformational masking of epitopes targeted by neutralizing antibodies. Our results show that sequences in gp41, the CD4 binding site, and the V2 domain all have the ability to act as global regulators of neutralization sensitivity. Our results also suggest that neutralization assays designed to support the development of vaccines and therapeutics targeting the HIV-1 Env protein should consider virus variation within individuals as well as virus variation between individuals. PMID:22933284
Protein conformation and disease : pathological consequences of analogous mutations in homologous proteins.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stevens, F. J.; Pokkuluri, P. R.; Schiffer, M.

2000-12-19

The antibody light chain variable domain (VL) and myelin protein zero (MPZ) are representatives of the functionally diverse immunoglobulin superfamily. The VL is a subunit of the antigen-binding component of antibodies, while MPZ is the major membrane-linked constituent of the myelin sheaths that coat peripheral nerves. Despite limited amino acid sequence homology, the conformations of the core structures of the two proteins are largely superimposable. Amino acid variations in VL account for various conformational disease outcomes, including amyloidosis. However, the specific amino acid changes in VL that are responsible for disease have been obscured by multiple concurrent primary structure alterations. Recently, certain demyelination disorders have been linked to point mutations and single amino acid polymorphisms in MPZ. We demonstrate here that some pathogenic variations in MPZ correspond to changes suspected of determining amyloidosis in VL. This unanticipated observation suggests that studies of the biophysical origin of conformational disease in one member of a superfamily of homologous proteins may have implications throughout the superfamily. In some cases, findings may account for overt disease; in other cases, due to the natural repertoire of inherited polymorphisms, variations in a representative protein may predict subclinical impairment of homologous proteins.
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project

PubMed Central

Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John

2008-01-01

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors in protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of the identified GDSL sequences were novel. Moreover, our scanning approach detected protein sequences annotated as GDSL by Pfam profile search (PfamA) that lack at least one of the essential motifs (171/820). Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis gave us better insight into the evolutionary relationships of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte Selaginella moellendorffii.
Finally, we provide a general motif-HMM scanner, which is easily accessible through a graphical user interface (http://compbio.math.hr/). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
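Viterbi decoding (VD) finds the single most probable hidden-state path through an HMM for a given sequence. The sketch below runs it on a toy two-state model with a background state "B" and a motif state "M". The model, its parameters and the choice of G, D, S and L as "motif-favoured" residues are all hypothetical. This is not the authors' motif-HMM. Posterior decoding (PD) would instead use the forward-backward algorithm to pick the most probable state at each position, and is not shown.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_RESIDUES = "GDSL"  # hypothetical: residues favoured by the toy motif state

STATES = ("B", "M")      # B = background, M = motif
START = {"B": 0.9, "M": 0.1}
TRANS = {"B": {"B": 0.9, "M": 0.1}, "M": {"B": 0.2, "M": 0.8}}
EMIT = {
    "B": {aa: 1 / 20 for aa in AMINO},
    "M": {aa: (0.2 if aa in MOTIF_RESIDUES else 0.2 / 16) for aa in AMINO},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for `seq`; works in log space to avoid underflow.
    All probabilities must be > 0 (math.log(0) would raise)."""
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            # best predecessor state r for being in state s at position t
            prev, best = max(
                ((r, v[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            v[t][s] = best + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[t][path[-1]])
    return list(reversed(path)), v[-1][last]


path, log_p = viterbi("AAAAGDSLAAAA", STATES, START, TRANS, EMIT)
print("".join(path))  # BBBBMMMMBBBB
```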

Topology of membrane proteins: predictions, limitations and variations.

PubMed

Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

2017-10-26

Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of cells. Membrane proteins are built up of transmembrane segments that span the lipid bilayer. The segments can be either hydrophobic alpha-helices or beta-sheets that form a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Accordingly, many algorithms exist for predicting the topology of alpha-helical and beta-barrel transmembrane proteins. The newest algorithms reach an accuracy close to 80% for both alpha-helical and beta-barrel transmembrane proteins. However, it has recently been shown that the simplified picture obtained by describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved within a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.
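The oldest form of topology prediction scans a sequence for hydrophobic stretches long enough to span the bilayer. The sketch below uses the classic Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 threshold. It is deliberately far simpler than the modern predictors discussed above. It flags only candidate alpha-helical segments, says nothing about orientation, and cannot detect beta-barrels. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping above-threshold windows into (start, end) spans, 0-based, end exclusive."""
    segments: list[tuple[int, int]] = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            if segments and i <= segments[-1][1]:      # overlaps the previous span
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


toy = "K" * 10 + "L" * 25 + "K" * 10   # hydrophobic core flanked by lysines
print(candidate_tm_segments(toy))       # [(5, 40)]
```

The reported span (5, 40) is wider than the leucine stretch at positions 10 to 34. Any window that is mostly hydrophobic passes the threshold, which illustrates how coarse this approach is at locating segment boundaries.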
Multidimensional structure-function relationships in human β-cardiac myosin from population-scale genetic variation

PubMed Central

Homburger, Julian R.; Green, Eric M.; Caleshu, Colleen; Sunitha, Margaret S.; Taylor, Rebecca E.; Ruppel, Kathleen M.; Metpally, Raghu Prasad Rao; Colan, Steven D.; Michels, Michelle; Day, Sharlene M.; Olivotto, Iacopo; Bustamante, Carlos D.; Dewey, Frederick E.; Ho, Carolyn Y.; Spudich, James A.; Ashley, Euan A.

2016-01-01

Myosin motors are the fundamental force-generating elements of muscle contraction. Variation in the human β-cardiac myosin heavy chain gene (MYH7) can lead to hypertrophic cardiomyopathy (HCM), a heritable disease characterized by cardiac hypertrophy, heart failure, and sudden cardiac death. How specific myosin variants alter motor function or clinical expression of disease remains incompletely understood. Here, we combine structural models of myosin from multiple stages of its chemomechanical cycle, exome sequencing data from two population cohorts of 60,706 and 42,930 individuals, and genetic and phenotypic data from 2,913 patients with HCM to identify regions of disease enrichment within β-cardiac myosin. We first developed computational models of the human β-cardiac myosin protein before and after the myosin power stroke. Then, using a spatial scan statistic modified to analyze genetic variation in protein 3D space, we found significant enrichment of disease-associated variants in the converter, a kinetic domain that transduces force from the catalytic domain to the lever arm to accomplish the power stroke. Focusing our analysis on surface-exposed residues, we identified a larger region significantly enriched for disease-associated variants that contains both the converter domain and residues on a single flat surface on the myosin head described as the myosin mesa. Notably, patients with HCM with variants in the enriched regions have earlier disease onset than patients who have HCM with variants elsewhere. Our study provides a model for integrating protein structure, large-scale genetic sequencing, and detailed phenotypic data to reveal insight into time-shifted protein structures and genetic disease. PMID:27247418
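A spatial scan statistic searches for a region of 3D space where disease-associated variants are over-represented relative to population variants. The sketch below implements the standard Kulldorff Bernoulli log-likelihood ratio over fixed-radius spheres centred on each variant, with a permutation p-value. The following are all assumptions and not necessarily the authors' modified statistic:

- each variant is mapped to a single residue coordinate (for example its C-alpha atom);
- variants are labelled case (HCM) or population;
- a single radius is used.

```python
import math
import random


def _xlogy(x: float, y: float) -> float:
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c: int, n: int, C: int, N: int) -> float:
    """Log-likelihood ratio for c cases among n variants inside a sphere,
    given C cases among N variants overall; 0 unless the inside rate is higher."""
    if n == 0 or n == N or c / n <= (C - c) / (N - n):
        return 0.0
    inside = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    rest_c, rest_n = C - c, N - n
    outside = _xlogy(rest_c, rest_c / rest_n) + _xlogy(rest_n - rest_c, (rest_n - rest_c) / rest_n)
    null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return inside + outside - null


def scan(variants, radius):
    """variants: list of ((x, y, z), is_case). Returns (best LLR, index of best centre)."""
    N = len(variants)
    C = sum(1 for _, case in variants if case)
    best = (0.0, None)
    for i, (centre, _) in enumerate(variants):
        inside = [case for xyz, case in variants if math.dist(centre, xyz) <= radius]
        llr = bernoulli_llr(sum(inside), len(inside), C, N)
        if llr > best[0]:
            best = (llr, i)
    return best


def permutation_p_value(variants, radius, n_perm=999, seed=0):
    """Shuffle case labels over fixed coordinates and compare maximum LLRs."""
    observed, _ = scan(variants, radius)
    coords = [xyz for xyz, _ in variants]
    labels = [case for _, case in variants]
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if scan(list(zip(coords, labels)), radius)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: five clustered cases, eight dispersed population variants.
cases = [((0, 0, 0), True), ((1, 0, 0), True), ((0, 1, 0), True), ((0, 0, 1), True), ((1, 1, 0), True)]
controls = [((20, 0, 0), False), ((0, 20, 0), False), ((0, 0, 20), False), ((20, 20, 0), False),
            ((20, 0, 20), False), ((0, 20, 20), False), ((20, 20, 20), False), ((40, 0, 0), False)]
toy_variants = cases + controls
print(scan(toy_variants, 3.0))
print(permutation_p_value(toy_variants, 3.0, n_perm=99))
```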
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

PubMed Central

Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

2015-01-01

Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
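Intolerance scores in the RVIS family compare observed standing variation with the amount predicted for a region. The score is the residual from a regression across genes. The sketch below is a minimal version of that idea, assuming an ordinary least-squares fit of observed common variant counts on a per-gene "expected" quantity. That quantity is hypothetical here; total variant count or number of sites are possible choices. The sketch divides residuals by their standard deviation, whereas the published RVIS uses studentized residuals. It is not the exact ncRVIS definition.

```python
import statistics


def residual_intolerance_scores(expected: list[float], observed: list[float]) -> list[float]:
    """Fit observed ~ intercept + slope * expected by least squares and return
    residuals scaled by their standard deviation. Negative = less variation than predicted."""
    mean_x, mean_y = statistics.fmean(expected), statistics.fmean(observed)
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(expected, observed)]
    sd = statistics.stdev(residuals)
    return [r / sd for r in residuals]


# Hypothetical per-gene values: the last gene has far less variation than its peers.
scores = residual_intolerance_scores([10, 20, 30, 40, 50], [5, 10, 15, 20, 5])
print([round(s, 3) for s in scores])  # [-0.632, 0.0, 0.632, 1.265, -1.265]
```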
Proteogenomic characterization of human colon and rectal cancer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Bing; Wang, Jing; Wang, Xiaojing

2014-09-18

We analyzed proteomes of colon and rectal tumors previously characterized by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Protein sequence variants encoded by somatic genomic variations displayed reduced expression compared with protein variants encoded by germline variations. mRNA transcript abundance did not reliably predict protein expression differences between tumors. Proteomics identified five protein expression subtypes, two of which were associated with the TCGA "MSI/CIMP" transcriptional subtype but had distinct mutation and methylation patterns and were associated with different clinical outcomes. Although copy number alterations (CNAs) showed strong cis- and trans-effects on mRNA expression, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. Our analyses identified HNF4A as a novel candidate driver gene in tumors with chromosome 20q amplifications. Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords novel insights into cancer biology.
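One common way to ask whether transcript abundance tracks protein abundance is a per-gene rank correlation across tumors. The sketch below computes Spearman's rho with average ranks for ties. It assumes paired mRNA and protein measurements for one gene across the same samples. It is not necessarily the statistic used in the study. Constant input vectors would cause a division by zero.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(mrna: list[float], protein: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(mrna), average_ranks(protein))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```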
Helicobacter pylori Heat Shock Protein A: Serologic Responses and Genetic Diversity

PubMed Central

Ng, Enders K. W.; Thompson, Stuart A.; Pérez-Pérez, Guillermo I.; Kansau, Imad; van der Ende, Arie; Labigne, Agnès; Sung, Joseph J. Y.; Chung, S. C. Sydney; Blaser, Martin J.

1999-01-01

Helicobacter pylori synthesizes an unusual GroES homolog, heat shock protein A (HspA). The present study was aimed at an assessment of the serological response to HspA in a group of Chinese patients with defined gastroduodenal pathologies and determination of whether diversity is present in the nucleotide sequences encoding HspA in isolates from these patients. Serum samples collected from 154 patients who had an upper gastrointestinal pathology and the presence of H. pylori defined by biopsy were tested for an immunoglobulin G (IgG) serologic response to H. pylori HspA by an enzyme-linked immunosorbent assay (ELISA). HspA-encoding nucleotide sequences in H. pylori isolates from 14 patients (7 seropositive and 7 seronegative for HspA) were analyzed by PCR and direct sequencing of the PCR products. The sequencing results were compared to those of 48 isolates from other parts of the world. Of the 154 known H. pylori-positive patients, 54 (35.1%) were seropositive for HspA. The A domain (GroES homology) of HspA was highly conserved in the 14 isolates tested. Although the B domain (metal-binding site unique to H. pylori) resembled that in the known major variant, particular amino acid substitutions allowed definition of an HspA variant associated with isolates from East Asia. There were no associations between patient characteristics and HspA seropositivity or amino acid sequences. We confirmed in this study that the clinical outcomes of H. pylori infection are not related to HspA antigenicity or to sequence variation. However, B-domain sequence variation may be a marker for the study of the genetic diversity of H. pylori strains of different geographic origins. PMID:10225839
Imperfect duplicate insertions type of mutations in plasmepsin V modulates binding properties of PEXEL motifs of export proteins in Indian Plasmodium vivax.

PubMed

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

2013-01-01

Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus, whose binding and processing of the PEXEL motifs used to export about 200-300 essential proteins are important for the virulence and viability of the causative Plasmodium species. This study was undertaken to characterize the PEXEL motif export pathway of P. vivax plasmepsin V Ind (PvPM-V-Ind) for the export of pathogenicity-related proteins/antigens, and thereby its effect on the Plasmodium exportome during erythrocytic stages. We identify and characterize the Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols, and structural modeling predictions based on docking studies of its binding to PEXEL motifs, in terms of the binding and accessibility of export proteins. Cloning and sequence analysis of genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion-type mutations (SVSE at amino acids 246-249 and SLSE at amino acids 266-269) were identified in all Indian isolates in comparison with the P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico interaction studies of the PEXEL peptide and the active enzyme reveal that PvPM-V-Ind (mutant) is active only in the endoplasmic reticulum lumen and that membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking with the PEXEL motif show significant variation in the binding of data-mined PEXEL substrate sequences by these imperfect insertion mutants. The predicted variation in docking scores and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation of PvPM-V activity in terms of binding and accessibility at these sites.
Our modeling-based functional validation of the PvPM-V-Ind (mutant) imperfect duplicate insertions against data-mined PEXEL sequences, which indicates altered binding and substrate accessibility of the enzyme, makes it a plausible target for investigating export mechanisms and for in silico virtual screening and novel pharmacophore design.
Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax

PubMed Central

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

2013-01-01

Introduction Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus, whose binding and processing of the PEXEL motifs used to export about 200–300 essential proteins are important for the virulence and viability of the causative Plasmodium species. This study was undertaken to characterize the PEXEL motif export pathway of P. vivax plasmepsin V Ind (PvPM-V-Ind) for the export of pathogenicity-related proteins/antigens, and thereby its effect on the Plasmodium exportome during erythrocytic stages. Method We identify and characterize the Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols, and structural modeling predictions based on docking studies of its binding to PEXEL motifs, in terms of the binding and accessibility of export proteins. Results Cloning and sequence analysis of genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion-type mutations (SVSE at amino acids 246–249 and SLSE at amino acids 266–269) were identified in all Indian isolates in comparison with the P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico interaction studies of the PEXEL peptide and the active enzyme reveal that PvPM-V-Ind (mutant) is active only in the endoplasmic reticulum lumen and that membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking with the PEXEL motif show significant variation in the binding of data-mined PEXEL substrate sequences by these imperfect insertion mutants. The predicted variation in docking scores and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation of PvPM-V activity in terms of binding and accessibility at these sites.
Conclusion/Significance Our modeling-based functional validation of the PvPM-V-Ind (mutant) imperfect duplicate insertions against data-mined PEXEL sequences, which indicates altered binding and substrate accessibility of the enzyme, makes it a plausible target for investigating export mechanisms and for in silico virtual screening and novel pharmacophore design. PMID:23555891
Top-down Proteomics in Health and Disease: Challenges and Opportunities

PubMed Central

Gregorich, Zachery R.; Ge, Ying

2014-01-01

Proteomics is essential for deciphering how molecules interact as a system and for understanding the functions of cellular systems in human disease; however, the unique characteristics of the human proteome, which include a high dynamic range of protein expression and extreme complexity due to a plethora of post-translational modifications (PTMs) and sequence variations, make such analyses challenging. An emerging “top-down” mass spectrometry (MS)-based proteomics approach, which provides a “bird’s eye” view of all proteoforms, has unique advantages for the assessment of PTMs and sequence variations. Recently, a number of studies have showcased the potential of top-down proteomics for unraveling disease mechanisms and discovering new biomarkers. Nevertheless, the top-down approach still faces significant challenges in terms of protein solubility, separation, and the detection of large intact proteins, as well as underdeveloped data analysis tools. Consequently, new technological developments are urgently needed to advance the field of top-down proteomics. Herein, we provide an overview of the recent applications of top-down proteomics in biomedical research. Moreover, we outline the challenges and opportunities facing top-down proteomics strategies aimed at understanding and diagnosing human diseases. PMID:24723472
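In top-down MS, sequence variants and PTMs appear as shifts in the intact proteoform mass. The sketch below computes a neutral monoisotopic mass from standard residue masses plus a few common PTM mass deltas. It assumes a linear, uncharged proteoform with no disulfide bonds, so it gives neutral masses rather than m/z values. The small dictionary of modifications is illustrative, not exhaustive. One example shows why high resolving power matters: a Gln for Lys substitution changes the mass by only about 0.036 Da.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini
PTM_DELTA = {"phospho": 79.96633, "acetyl": 42.01057, "oxidation": 15.99491}


def proteoform_mass(seq: str, ptms: tuple[str, ...] = ()) -> float:
    """Neutral monoisotopic mass of a linear proteoform plus the listed PTMs."""
    residues = sum(RESIDUE_MONO[aa] for aa in seq.upper())
    return residues + WATER + sum(PTM_DELTA[p] for p in ptms)


print(round(proteoform_mass("GA"), 5))                          # 146.06913
print(round(proteoform_mass("Q") - proteoform_mass("K"), 5))    # -0.03638
print(round(proteoform_mass("S", ("phospho",)) - proteoform_mass("S"), 5))  # 79.96633
```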
Partitioning of genetic variation between regulatory and coding gene segments: the predominance of software variation in genes encoding introvert proteins.

PubMed

Mitchison, A

1997-01-01

In considering genetic variation in eukaryotes, a fundamental distinction can be made between variation in regulatory (software) and coding (hardware) gene segments. For quantitative traits the bulk of variation, particularly that near the population mean, appears to reside in regulatory segments. The main exceptions to this rule concern proteins which handle extrinsic substances, here termed extrovert proteins. The immune system includes an unusually large proportion of this exceptional category, but even so its chief source of variation may well be polymorphism in regulatory gene segments. The main evidence for this view emerges from genome scanning for quantitative trait loci (QTL), which in the case of the immune system points to a major contribution of pro-inflammatory cytokine genes. Further support comes from sequencing of major histocompatibility complex (Mhc) class II promoters, where a high level of polymorphism has been detected. These Mhc promoters appear to act, in part at least, by gating the back-signal from T cells into antigen-presenting cells. Both these forms of polymorphism are likely to be sustained by the need for flexibility in the immune response. Future work on promoter polymorphism is likely to benefit from the input from genome informatics.
Rapid PCR Assays That Specifically Identify Anthrax and Anthrax Surrogate Chromosomal Signatures

DTIC Science & Technology

2002-08-30

The genetic variation among a set of 175 full-length sspE DNA sequences obtained from representative members of the B. anthracis clade has been...examined. Thirty-six sspE genotypes and seventeen protein phylotypes were identified among the B. cereus, B. thuringiensis, B. anthracis and B. mycoides...the sspE DNA sequence data sets suggests that the B. anthracis clade is more phylogenetically complex than has been inferred by traditional taxonomic methods.
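Distinct DNA genotypes outnumber protein phylotypes because synonymous substitutions change the DNA without changing the encoded protein. The sketch below counts both by collapsing identical DNA sequences and then identical translations. It assumes full-length, in-frame coding sequences without gaps or ambiguous bases, and uses the standard genetic code. The function names and toy sequences are hypothetical.

```python
# Standard genetic code in the compact "TCAG" ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of an in-frame sequence (a trailing partial codon is ignored)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def count_genotypes_and_phylotypes(seqs: list[str]) -> tuple[int, int]:
    """Number of distinct DNA sequences and of distinct encoded protein sequences."""
    genotypes = {s.upper() for s in seqs}
    phylotypes = {translate(s) for s in genotypes}
    return len(genotypes), len(phylotypes)


toy = ["ATGGCT", "ATGGCC", "ATGACT", "ATGGCT"]  # GCT/GCC are synonymous (Ala)
print(count_genotypes_and_phylotypes(toy))     # (3, 2)
```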
Natural Variation in the Pto Pathogen Resistance Gene Within Species of Wild Tomato (Lycopersicon). I. Functional Analysis of Pto Alleles

PubMed Central

Rose, Laura E.; Langley, Charles H.; Bernal, Adriana J.; Michelmore, Richard W.

2005-01-01

Disease resistance to the bacterial pathogen Pseudomonas syringae pv. tomato (Pst) in the cultivated tomato, Lycopersicon esculentum, and the closely related L. pimpinellifolium is triggered by the physical interaction between plant disease resistance protein, Pto, and the pathogen avirulence protein, AvrPto. To investigate the extent to which variation in the Pto gene is responsible for naturally occurring variation in resistance to Pst, we determined the resistance phenotype of 51 accessions from seven species of Lycopersicon to isogenic strains of Pst differing in the presence of avrPto. One-third of the plants displayed resistance specifically when the pathogen expressed AvrPto, consistent with a gene-for-gene interaction. To test whether this resistance in these species was conferred specifically by the Pto gene, alleles of Pto were amplified and sequenced from 49 individuals and a subset (16) of these alleles was tested in planta using Agrobacterium-mediated transient assays. Eleven alleles conferred a hypersensitive resistance response (HR) in the presence of AvrPto, while 5 did not. Ten amino acid substitutions associated with the absence of AvrPto recognition and HR were identified, none of which had been identified in previous structure-function studies. Additionally, 3 alleles encoding putative pseudogenes of Pto were isolated from two species of Lycopersicon. Therefore, a large proportion, but not all, of the natural variation in the reaction to strains of Pst expressing AvrPto can be attributed to sequence variation in the Pto gene. PMID:15944360
Evaluation of a functional variant assay for selecting beef cattle

USDA-ARS's Scientific Manuscript database

A commercially available genotyping assay for functional variants was chosen to obtain genotypes needed for a selection experiment in populations of pedigreed cattle that have not been extensively genotyped. The assay design included probes for coding sequence variation in 88% of annotated protein c...
Ovine Reference Materials and Assays for Prion Genetic Testing

USDA-ARS's Scientific Manuscript database

Background: Genetic predisposition to scrapie in sheep is associated with variation in the peptide sequence of the ovine prion protein encoded by Prnp. Codon variants implicated in scrapie susceptibility or disease progression include those at amino acid positions 112, 136, 141, 154, and 171. Nin...
Rapid functional diversification in the structurally conserved ELAV family of neuronal RNA binding proteins

PubMed Central

Samson, Marie-Laure

2008-01-01

Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM), yet they paradoxically act at diverse levels of post-transcriptional regulation. To understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav and found evidence of gain and loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions lie both close to structural elements (involved, respectively, in protein-RNA interactions and in the regulation of subcellular localization) and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504
Intrinsic Kinetics Fluctuations as Cause of Growth Inhomogeneity in Protein Crystals

NASA Technical Reports Server (NTRS)

Vekilov, Peter G.; Rosenberger, Franz

1998-01-01

Intrinsic kinetics instabilities in the form of growth step bunching during the crystallization of the protein lysozyme from solution were characterized by in situ high-resolution optical interferometry. Compositional variations (striations) in the crystal, which potentially decrease its utility, e.g., for molecular structure studies by diffraction methods, were visualized by polarized light reflection microscopy. A spatiotemporal correlation was established between the sequence of moving step bunches and the striations.
Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins.

PubMed

Firman, Taylor; Ghosh, Kingshuk

2018-03-28

We present an analytical theory to compute conformations of heteropolymers (applicable to disordered proteins) as a function of temperature and charge sequence. The theory describes the coil-globule transition for a given protein sequence when temperature is varied and has been benchmarked against all-atom Monte Carlo simulations (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence, while maintaining the same charge composition, can lead to significant changes in conformation, even as drastic as a transition from coil (swelled above a purely random coil) to globule (collapsed below a random coil) and vice versa. The theory provides insights into how to control (enhance or suppress) these changes by tuning the temperature (or solution conditions) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database and observe significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Phosphorylation at as few as two sites can significantly alter the size of an IDP with everything else held constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; there are certain "hot spots" that can cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified using a sequence charge decoration metric originally introduced by Sawle and Ghosh. 
The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at a high-throughput level can provide valuable insights into the different mechanisms by which phosphorylation/charge mutation controls IDP function.
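The "hot spot" analysis relies on the sequence charge decoration (SCD) metric of Sawle and Ghosh. As we understand that definition, SCD = (1/N) Σ_{m=2..N} Σ_{n=1..m-1} q_m q_n (m - n)^(1/2), where q_i is the charge of residue i and N is the chain length. The sketch below assumes K and R carry +1, D and E carry -1, and every other residue (including histidine) is neutral; a different charge model, for example one representing phosphorylated residues, can be passed in as a custom map. Sequences with the same charge composition but more segregated (blocky) charges give more negative SCD values.

```python
import math
from typing import Dict, Optional

# Assumed charge model: only K/R (+1) and D/E (-1) are charged.
DEFAULT_CHARGES: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def scd(sequence: str, charges: Optional[Dict[str, float]] = None) -> float:
    """Sequence charge decoration: (1/N) * sum over pairs m > n of q_m q_n sqrt(m - n)."""
    q_map = DEFAULT_CHARGES if charges is None else charges
    q = [q_map.get(aa, 0.0) for aa in sequence.upper()]
    n = len(q)
    if n == 0:
        raise ValueError("empty sequence")
    total = 0.0
    for m in range(1, n):
        for k in range(m):
            total += q[m] * q[k] * math.sqrt(m - k)
    return total / n


# Same composition (4 E, 4 K); the blocky sequence gives the more negative SCD.
print(scd("EEEEKKKK"), scd("EKEKEKEK"))
```

The double loop is O(N^2), which is adequate for single IDP sequences; a database-wide scan might vectorize it.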
Copy number variation in CEP57L1 predisposes to congenital absence of bilateral ACL and PCL ligaments.

PubMed

Liu, Yichuan; Li, Yun; March, Michael E; Nguyen, Kenny; Kenny, Nguyen; Xu, Kexiang; Wang, Fengxiang; Guo, Yiran; Keating, Brendan; Glessner, Joseph; Li, Jiankang; Ganley, Theodore J; Zhang, Jianguo; Deardorff, Matthew A; Xu, Xun; Hakonarson, Hakon

2015-11-11

Absence of the anterior (ACL) or posterior cruciate ligament (PCL) is a rare congenital malformation that results in knee joint instability; it has a prevalence of 1.7 per 100,000 live births and can be associated with other lower-limb abnormalities such as ACL agenesis and absence of the menisci of the knee. While only a few cases of absence of the ACL/PCL are reported in the literature, a number of large familial case series of related conditions such as ACL agenesis suggest a potential underlying monogenic etiology. We performed whole exome sequencing of a family with two individuals affected by absence of the ACL/PCL. Based on the exome sequencing data, we identified a copy number variation (CNV) deletion affecting exon sequences of CEP57L1 that was present in the affected mother and her affected daughter. The deletion was validated by quantitative PCR (qPCR), and the gene was confirmed to be expressed in ACL tissue. Interestingly, we detected reduced expression of CEP57L1 in Epstein-Barr virus (EBV)-transformed cells from the two patients compared with healthy controls. Evaluation of the 3D protein structure showed that the helix-binding sites of the protein remain intact with the deletion, but other functional binding sites related to microtubule attachment are missing. The specificity of the CNV deletion was confirmed by showing that it was absent from ~700 exome sequencing samples as well as from the Database of Genomic Variants (DGV), which contains large numbers of annotated CNVs from previous scientific reports. We identified a novel CNV deletion that was inherited through autosomal dominant transmission from an affected mother to her affected daughter, both of whom had absence of the anterior and posterior cruciate ligaments of the knees.
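The abstract states that the CEP57L1 deletion was validated by qPCR but does not give the calculation. A common approach, assumed here rather than taken from the paper, is the comparative Ct (2^-ΔΔCt) method: normalize the target amplicon to a two-copy reference gene, compare with a calibrator sample of known copy number, and scale. It assumes near-100% amplification efficiency for both amplicons; a heterozygous deletion should then give roughly one copy. The Ct values below are hypothetical.

```python
def relative_copy_number(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
    calibrator_copies: float = 2.0,
) -> float:
    """Estimate target copy number with the comparative Ct (2^-ddCt) method.

    Assumes ~100% PCR efficiency for target and reference amplicons and a
    calibrator sample carrying `calibrator_copies` copies of the target.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta)


# Hypothetical Ct values: relative to the reference gene, the patient's target
# amplifies one cycle later than in a two-copy control.
print(relative_copy_number(26.0, 24.0, 25.0, 24.0))  # 1.0
```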
Isolation and molecular characterization of partial FSH and LH receptor genes in Arabian camels (Camelus dromedarius)

PubMed Central

Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza

2015-01-01

Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples for each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two species revealed an amino acid variation in the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions, at positions 5, 67, and 105, in the FSHR coding region. These substitutions are found exclusively in camels and can be considered species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as for studies of reproduction-linked polymorphisms and breeding programs. PMID:27844002
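The abstract reports genetic distance values between the two camel species without naming a substitution model. As one common illustration, and not necessarily the authors' choice, the sketch below computes the uncorrected p-distance between two aligned nucleotide sequences (skipping gapped columns) and its Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sequences are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined once p reaches 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p)                # 0.1
print(jukes_cantor(p))  # slightly larger than p, correcting for multiple hits
```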
Isolation and molecular characterization of partial FSH and LH receptor genes in Arabian camels (Camelus dromedarius).

PubMed

Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza

2015-06-01

Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples for each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two species revealed an amino acid variation in the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions, at positions 5, 67, and 105, in the FSHR coding region. These substitutions are found exclusively in camels and can be considered species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as for studies of reproduction-linked polymorphisms and breeding programs.
Frame-Insensitive Expression Cloning of Fluorescent Protein from Scolionema suvaense.

PubMed

Horiuchi, Yuki; Laskaratou, Danai; Sliwa, Michel; Ruckebusch, Cyril; Hatori, Kuniyuki; Mizuno, Hideaki; Hotta, Jun-Ichi

2018-01-26

Expression cloning from cDNA is an important technique for acquiring genes encoding novel fluorescent proteins. However, the probability of in-frame cDNA insertion following the first start codon of the vector is normally only 1/3, which is a cause of low cloning efficiency. To overcome this issue, we developed a new expression plasmid vector, pRSET-TriEX, in which transcriptional slippage was induced by introducing a (dT)14 sequence next to the first start codon of pRSET. The effectiveness of frame-insensitive cloning was validated by inserting the gene encoding eGFP into the vector in all three possible frames. After transformation with each of these plasmids, E. coli cells expressed eGFP with no significant difference in expression level. The pRSET-TriEX vector was then used for expression cloning of a novel fluorescent protein from Scolionema suvaense. We screened 3658 E. coli colonies transformed with pRSET-TriEX containing Scolionema suvaense cDNA and found one colony expressing a novel green fluorescent protein, ScSuFP. The highest protein sequence similarity (42%) was to chain c of the multi-domain green fluorescent protein-like protein "ember" from Anthoathecata sp. Variations in the N- and/or C-terminal sequence of ScSuFP compared with other fluorescent proteins indicate that expression cloning, rather than sequence similarity-based methods, was crucial for acquiring the gene encoding ScSuFP. The absorption maximum was at 498 nm, with a molar extinction coefficient of 1.17 × 10⁵ M⁻¹·cm⁻¹. The emission maximum was at 511 nm, and the fluorescence quantum yield was determined to be 0.6. Pseudo-native gel electrophoresis showed that the protein forms obligatory homodimers.
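The reported photophysical values support two routine calculations: molecular brightness, commonly expressed as the product of the extinction coefficient and the quantum yield, and the Beer-Lambert relation A = εcl for estimating concentration from absorbance. The sketch below uses the abstract's ε (1.17 × 10⁵ M⁻¹·cm⁻¹ at 498 nm) and quantum yield (0.6). The absorbance reading is hypothetical, and the sketch assumes ε refers to the same species (chain or dimer) being quantified, which the abstract does not specify.

```python
EPSILON_498 = 1.17e5  # molar extinction coefficient, M^-1 cm^-1 (from the abstract)
QUANTUM_YIELD = 0.6   # fluorescence quantum yield (from the abstract)


def molecular_brightness(epsilon: float, quantum_yield: float) -> float:
    """Brightness as epsilon x quantum yield, in M^-1 cm^-1."""
    return epsilon * quantum_yield


def concentration_from_absorbance(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


print(f"{molecular_brightness(EPSILON_498, QUANTUM_YIELD):.0f}")  # 70200
# Hypothetical absorbance of 0.117 at 498 nm in a 1 cm cuvette:
print(concentration_from_absorbance(0.117, EPSILON_498))  # roughly 1e-6 M
```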

Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine.

PubMed

Salleh, Mohd Zaki; Teh, Lay Kek; Lee, Lian Shien; Ismet, Rose Iszati; Patowary, Ashok; Joshi, Kandarp; Pasha, Ayesha; Ahmed, Azni Zain; Janor, Roziah Mohd; Hamzah, Ahmad Sazali; Adam, Aishah; Yusoff, Khalid; Hoh, Boon Peng; Hatta, Fazleen Haslinda Mohd; Ismail, Mohamad Izwan; Scaria, Vinod; Sivasubbu, Sridhar

2013-01-01

With higher throughput and lower sequencing cost, second-generation sequencing technology has immense potential for translation into clinical practice and for the realization of pharmacogenomics-based patient care. The systematic analysis of whole genome sequences to assess patient-to-patient variability in pharmacokinetic and pharmacodynamic responses to drugs would be the next step in future medicine, in line with the vision of personalized medicine. Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer, and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating custom algorithms with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomic impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetic analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug-metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. The current study demonstrates the potential of personal genome sequencing for understanding functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of comprehensive computational pipelines for systematic data mining and analysis.
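The workflow described here maps non-synonymous variants onto pathway gene sets for drug transport, metabolism and action. The sketch below shows only that final filtering step, under stated assumptions: each variant already carries a consequence label and a deleteriousness call from some upstream predictor, and the gene names (`GENE_A` and so on) are hypothetical placeholders. It is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str   # e.g. "missense", "stop_gained", "synonymous"
    deleterious: bool  # assumed output of an upstream predictor


# Consequence labels treated as protein-altering (an assumption for this sketch).
NONSYNONYMOUS: FrozenSet[str] = frozenset({"missense", "stop_gained"})


def pathway_hits(
    variants: Iterable[Variant], pathways: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return, per pathway category, the genes carrying a deleterious
    non-synonymous variant."""
    hits: Dict[str, Set[str]] = {category: set() for category in pathways}
    for v in variants:
        if v.consequence not in NONSYNONYMOUS or not v.deleterious:
            continue  # keep only predicted-deleterious protein-altering changes
        for category, genes in pathways.items():
            if v.gene in genes:
                hits[category].add(v.gene)
    return hits


pathways = {"transport": {"GENE_A"}, "metabolism": {"GENE_B"}, "target": {"GENE_C"}}
variants = [
    Variant("GENE_A", "missense", True),
    Variant("GENE_B", "synonymous", True),
    Variant("GENE_C", "missense", False),
    Variant("GENE_D", "missense", True),
]
print(pathway_hits(variants, pathways))
# {'transport': {'GENE_A'}, 'metabolism': set(), 'target': set()}
```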
Complete mitochondrial genome of Xingguo red carp (Cyprinus carpio var. singuonensis) and purse red carp (Cyprinus carpio var. wuyuanensis).

PubMed

Hu, Guang-Fu; Liu, Xiang-Jiang; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na; Zou, Gui-Wei

2016-01-01

The complete mitochondrial genomes of Xingguo red carp (Cyprinus carpio var. singuonensis) and purse red carp (Cyprinus carpio var. wuyuanensis) were sequenced. Comparison of these two mitochondrial genomes revealed that the mtDNAs of these two common carp varieties were remarkably similar in genome length, gene order and content, and AT content. However, the two mitochondrial genomes differed at 39 sites over their full length: 2 in rRNAs, 3 in tRNAs, 3 in the control region, and 31 in protein-coding genes. The 31 variable bases in the protein-coding regions led to three amino acid differences between the two varieties, located mainly in the ND5 and ND4 proteins.
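The gap between 31 variable bases and only three amino acid differences reflects synonymous substitutions. The sketch below classifies codon differences between two aligned, in-frame coding sequences as synonymous or non-synonymous using the vertebrate mitochondrial genetic code (NCBI translation table 2), in which ATA encodes Met, TGA encodes Trp, and AGA/AGG are stop codons. It assumes gap-free, equal-length sequences containing only A, C, G and T, and it counts codons rather than bases; the sequences are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in TCAG order.
MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), MITO_AA)
}


def classify_changes(seq_a: str, seq_b: str) -> Tuple[int, int, int]:
    """Return (variable bases, synonymous codons, non-synonymous codons)."""
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("need equal-length, in-frame coding sequences")
    variable = sum(1 for x, y in zip(a, b) if x != y)
    syn = nonsyn = 0
    for i in range(0, len(a), 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        if codon_a == codon_b:
            continue
        if MITO_CODE[codon_a] == MITO_CODE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return variable, syn, nonsyn


# ATG->ATA is synonymous (both Met) under this code; CTT->CCT is Leu->Pro.
print(classify_changes("ATGAAACTT", "ATAAAGCCT"))  # (3, 2, 1)
```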
The wheat cytochrome oxidase subunit II gene has an intron insert and three radical amino acid changes relative to maize

PubMed Central

Bonen, Linda; Boer, Poppo H.; Gray, Michael W.

1984-01-01

We have determined the sequence of the wheat mitochondrial gene for cytochrome oxidase subunit II (COII) and find that its derived protein sequence differs from that of maize at only three amino acid positions. Unexpectedly, all three replacements are non-conservative. The wheat COII gene has a highly conserved intron at the same position as in maize, but the wheat intron is 1.5 times longer because of an insert relative to its maize counterpart. Hybridization analysis of mitochondrial DNA from rye, pea, broad bean and cucumber indicates strong sequence conservation of COII coding sequences among all these higher plants. However, only rye and maize mitochondrial DNA show homology with wheat COII intron sequences, and rye alone with intron-insert sequences. We find that a sequence identical to the region of the 5' exon corresponding to the transmembrane domain of the COII protein is present at a second genomic location in wheat mitochondria. These variations in COII gene structure and size, as well as the presence of repeated COII sequences, illustrate at the DNA sequence level factors that contribute to higher plant mitochondrial DNA diversity and complexity. PMID:16453565
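Whether a replacement counts as conservative depends on the residue grouping used. The sketch below assumes the Clustal-style "strong" groups (STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW) and treats a substitution as conservative when both residues share at least one group. It is an illustration on hypothetical aligned fragments and not necessarily the criterion used in this paper.

```python
from typing import List, Tuple

# Assumed grouping: Clustal-style "strong" conservation groups (groups overlap).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b are identical or share a strong group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in STRONG_GROUPS)


def classify_replacements(seq1: str, seq2: str) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, residue1, residue2, conservative?) for each
    ungapped mismatch between two aligned protein sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for pos, (x, y) in enumerate(zip(seq1.upper(), seq2.upper()), start=1):
        if x != y and x != "-" and y != "-":
            out.append((pos, x, y, is_conservative(x, y)))
    return out


print(classify_replacements("MKLSD", "MRPSW"))
# [(2, 'K', 'R', True), (3, 'L', 'P', False), (5, 'D', 'W', False)]
```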
Genetic variation in potential Giardia vaccine candidates cyst wall protein 2 and α1-giardin.

PubMed

Radunovic, Matej; Klotz, Christian; Saghaug, Christina Skår; Brattbakk, Hans-Richard; Aebischer, Toni; Langeland, Nina; Hanevik, Kurt

2017-08-01

Giardia is a prevalent cause of intestinal parasitic infection. The trophozoite structural protein α1-giardin (α1-g) and the cyst protein cyst wall protein 2 (CWP2) have shown promise as Giardia vaccine antigen candidates in murine models. The present study assesses the genetic diversity of α1-g and CWP2 between and within assemblages A and B in human clinical isolates. α1-g and CWP2 sequences were acquired from 15 Norwegian isolates by PCR amplification, and 20 sequences were acquired from German cultured isolates by whole genome sequencing. Sequences were aligned to reference genomes from assemblages A2 and B to identify genetic variance. Genetic diversity was found between assemblage A and B reference sequences for both α1-g (90.8% nucleotide identity) and CWP2 (82.5% nucleotide identity). However, for α1-g this translated into only 3 amino acid (aa) substitutions, while for CWP2 there were 41 aa substitutions and one aa deletion. Genetic diversity was larger within assemblage B (nucleotide identity 92.0% for α1-g and 94.3% for CWP2) than within assemblage A (nucleotide identity 99.0% for α1-g and 99.7% for CWP2). For CWP2, the diversity at both the nucleotide and protein levels was higher at the C-terminal end. Predicted antigenic epitopes were not affected for α1-g, but were partially affected for CWP2. Despite the genetic diversity in α1-g, we found its aa sequence, characteristics, and antigenicity to be well preserved. CWP2 showed more aa variance and potential antigenic differences. Several CWP2 antigens might be necessary in a future Giardia vaccine to provide cross-protection against both Giardia assemblages infecting humans.
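Checking whether sequence differences fall inside predicted epitopes reduces to an interval lookup once the sequences are aligned. The sketch below assumes a gap-free reference, so that alignment columns equal reference residue numbers, and treats a deletion in the query (written as "-") as a difference. The sequences and epitope ranges are hypothetical, not real α1-g or CWP2 data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) in reference numbering


def affected_epitopes(ref: str, query: str, epitopes: List[Interval]) -> Dict[Interval, List[int]]:
    """Map each epitope to the positions inside it where query differs from ref."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    if "-" in ref:
        raise ValueError("this sketch assumes a gap-free reference")
    diffs = [i for i, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1) if r != q]
    return {(start, end): [p for p in diffs if start <= p <= end] for start, end in epitopes}


print(affected_epitopes("ACDEFGHIKL", "ACDQFGHIKW", [(1, 3), (3, 6), (8, 10)]))
# {(1, 3): [], (3, 6): [4], (8, 10): [10]}
```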
C-Terminal DxD-Containing Sequences within Paramyxovirus Nucleocapsid Proteins Determine Matrix Protein Compatibility and Can Direct Foreign Proteins into Budding Particles

PubMed Central

Ray, Greeshma; Schmitt, Phuong Tieu

2016-01-01

ABSTRACT Paramyxovirus particles are formed by a budding process coordinated by viral matrix (M) proteins. M proteins coalesce at sites underlying infected cell membranes and induce other viral components, including viral glycoproteins and viral ribonucleoprotein complexes (vRNPs), to assemble at these locations from which particles bud. M proteins interact with the nucleocapsid (NP or N) components of vRNPs, and these interactions enable production of infectious, genome-containing virions. For the paramyxoviruses parainfluenza virus 5 (PIV5) and mumps virus, M-NP interaction also contributes to efficient production of virus-like particles (VLPs) in transfected cells. A DLD sequence near the C-terminal end of PIV5 NP protein was previously found to be necessary for M-NP interaction and efficient VLP production. Here, we demonstrate that 15-residue-long, DLD-containing sequences derived from either the PIV5 or Nipah virus nucleocapsid protein C-terminal ends are sufficient to direct packaging of a foreign protein, Renilla luciferase, into budding VLPs. Mumps virus NP protein harbors DWD in place of the DLD sequence found in PIV5 NP protein, and consequently, PIV5 NP protein is incompatible with mumps virus M protein. A single amino acid change converting DLD to DWD within PIV5 NP protein induced compatibility between these proteins and allowed efficient production of mumps VLPs. Our data suggest a model in which paramyxoviruses share an overall common strategy for directing M-NP interactions but with important variations contained within DLD-like sequences that play key roles in defining M/NP protein compatibilities. IMPORTANCE Paramyxoviruses are responsible for a wide range of diseases that affect both humans and animals. Paramyxovirus pathogens include measles virus, mumps virus, human respiratory syncytial virus, and the zoonotic paramyxoviruses Nipah virus and Hendra virus. 
Infectivity of paramyxovirus particles depends on matrix-nucleocapsid protein interactions which enable efficient packaging of encapsidated viral RNA genomes into budding virions. In this study, we have defined regions near the C-terminal ends of paramyxovirus nucleocapsid proteins that are important for matrix protein interaction and that are sufficient to direct a foreign protein into budding particles. These results advance our basic understanding of paramyxovirus genome packaging interactions and also have implications for the potential use of virus-like particles as protein delivery tools. PMID:26792745
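The C-terminal DLD sequence of PIV5 NP and the DWD sequence of mumps NP are both instances of a D-x-D pattern. A minimal scan for that pattern within the last 15 residues (the length of the transplanted sequences described in this abstract) can use a regular-expression lookahead so that overlapping matches are all reported. The example sequences are hypothetical, not real PIV5, Nipah or mumps NP sequences, and a motif match alone does not imply M protein compatibility.

```python
import re
from typing import List, Tuple

# Zero-width lookahead so overlapping motifs (e.g. "DLDWD") are all reported.
DXD = re.compile(r"(?=(D.D))")


def c_terminal_dxd(protein: str, window: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based start, motif) for each D-x-D fully inside the last `window` residues."""
    seq = protein.upper()
    offset = max(0, len(seq) - window)
    tail = seq[offset:]
    return [(offset + m.start() + 1, m.group(1)) for m in DXD.finditer(tail)]


print(c_terminal_dxd("AAAAADLDWD"))      # [(6, 'DLD'), (8, 'DWD')]
print(c_terminal_dxd("DLD" + "A" * 20))  # [] (motif lies outside the window)
```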
C-Terminal DxD-Containing Sequences within Paramyxovirus Nucleocapsid Proteins Determine Matrix Protein Compatibility and Can Direct Foreign Proteins into Budding Particles.

PubMed

Ray, Greeshma; Schmitt, Phuong Tieu; Schmitt, Anthony P

2016-01-20

Paramyxovirus particles are formed by a budding process coordinated by viral matrix (M) proteins. M proteins coalesce at sites underlying infected cell membranes and induce other viral components, including viral glycoproteins and viral ribonucleoprotein complexes (vRNPs), to assemble at these locations from which particles bud. M proteins interact with the nucleocapsid (NP or N) components of vRNPs, and these interactions enable production of infectious, genome-containing virions. For the paramyxoviruses parainfluenza virus 5 (PIV5) and mumps virus, M-NP interaction also contributes to efficient production of virus-like particles (VLPs) in transfected cells. A DLD sequence near the C-terminal end of PIV5 NP protein was previously found to be necessary for M-NP interaction and efficient VLP production. Here, we demonstrate that 15-residue-long, DLD-containing sequences derived from either the PIV5 or Nipah virus nucleocapsid protein C-terminal ends are sufficient to direct packaging of a foreign protein, Renilla luciferase, into budding VLPs. Mumps virus NP protein harbors DWD in place of the DLD sequence found in PIV5 NP protein, and consequently, PIV5 NP protein is incompatible with mumps virus M protein. A single amino acid change converting DLD to DWD within PIV5 NP protein induced compatibility between these proteins and allowed efficient production of mumps VLPs. Our data suggest a model in which paramyxoviruses share an overall common strategy for directing M-NP interactions but with important variations contained within DLD-like sequences that play key roles in defining M/NP protein compatibilities. Paramyxoviruses are responsible for a wide range of diseases that affect both humans and animals. Paramyxovirus pathogens include measles virus, mumps virus, human respiratory syncytial virus, and the zoonotic paramyxoviruses Nipah virus and Hendra virus. 
Infectivity of paramyxovirus particles depends on matrix-nucleocapsid protein interactions which enable efficient packaging of encapsidated viral RNA genomes into budding virions. In this study, we have defined regions near the C-terminal ends of paramyxovirus nucleocapsid proteins that are important for matrix protein interaction and that are sufficient to direct a foreign protein into budding particles. These results advance our basic understanding of paramyxovirus genome packaging interactions and also have implications for the potential use of virus-like particles as protein delivery tools. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Evolution and Diversity in Human Herpes Simplex Virus Genomes

PubMed Central

Gatherer, Derek; Ochoa, Alejandro; Greenbaum, Benjamin; Dolan, Aidan; Bowden, Rory J.; Enquist, Lynn W.; Legendre, Matthieu; Davison, Andrew J.

2014-01-01

Herpes simplex virus 1 (HSV-1) causes a chronic, lifelong infection in >60% of adults. Multiple recent vaccine trials have failed, with viral diversity likely contributing to these failures. To understand HSV-1 diversity better, we comprehensively compared 20 newly sequenced viral genomes from China, Japan, Kenya, and South Korea with six previously sequenced genomes from the United States, Europe, and Japan. In this diverse collection of passaged strains, we found that one-fifth of the newly sequenced members share a gene deletion and one-third exhibit homopolymeric frameshift mutations (HFMs). Individual strains exhibit genotypic and potential phenotypic variation via HFMs, deletions, short sequence repeats, and single-nucleotide polymorphisms, although the protein sequence identity between strains exceeds 90% on average. In the first genome-scale analysis of positive selection in HSV-1, we found signs of selection in specific proteins and residues, including the fusion protein glycoprotein H. We also confirmed previous results suggesting that recombination has occurred with high frequency throughout the HSV-1 genome. Despite this, the HSV-1 strains analyzed clustered by geographic origin during whole-genome distance analysis. These data shed light on likely routes of HSV-1 adaptation to changing environments and will aid in the selection of vaccine antigens that are invariant worldwide. PMID:24227835
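Homopolymeric frameshift mutations arise in runs of a single base: a change in run length that is not a multiple of three shifts the downstream reading frame. The sketch below locates homopolymer runs and applies that frame test. The minimum run length of 6 is an arbitrary assumption, and the sequence is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 6) -> List[Tuple[int, str, int]]:
    """Return (1-based start, base, length) for runs of at least `min_len` identical bases."""
    runs = []
    pos = 1
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def shifts_frame(run_len_a: int, run_len_b: int) -> bool:
    """True if the run-length difference between two strains is not a multiple of three."""
    return (run_len_a - run_len_b) % 3 != 0


print(homopolymer_runs("ACGGGGGGTAAC"))        # [(3, 'G', 6)]
print(shifts_frame(7, 6), shifts_frame(9, 6))  # True False
```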
An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

PubMed Central

Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

1999-01-01

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 polytene chromosome bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne, 1926) PMID:10471707
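The density figures in the abstract follow from simple division. The check below assumes the 2.9-Mb length given in the title and uses the 218 predicted protein-coding genes as the denominator for the EST and homology percentages; with these assumptions the reported 13 kb, 171 kb, 44% and 66% are reproduced.

```python
REGION_BP = 2.9e6           # region length from the title ("2.9-Mb region")
PROTEIN_CODING_GENES = 218  # predicted protein-coding genes
TRANSPOSABLE_ELEMENTS = 17
EST_MATCHES = 95
HOMOLOG_MATCHES = 144

kb_per_gene = REGION_BP / PROTEIN_CODING_GENES / 1e3
kb_per_te = REGION_BP / TRANSPOSABLE_ELEMENTS / 1e3
pct_est = 100 * EST_MATCHES / PROTEIN_CODING_GENES
pct_homolog = 100 * HOMOLOG_MATCHES / PROTEIN_CODING_GENES

print(f"{kb_per_gene:.0f} kb per gene")           # 13 kb per gene
print(f"{kb_per_te:.0f} kb per element")          # 171 kb per element
print(f"{pct_est:.0f}% match an EST")             # 44% match an EST
print(f"{pct_homolog:.0f}% have clear homologs")  # 66% have clear homologs
```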
The Anopheles stephensi odorant binding protein 1 (AsteObp1) gene: a new molecular marker for biological forms diagnosis.

PubMed

Gholizadeh, S; Firooziyan, S; Ladonni, H; Hajipirloo, H Mohammadzadeh; Djadid, N Dinparast; Hosseini, A; Raz, A

2015-06-01

Anopheles (Cellia) stephensi Liston 1901 is known as an Asian malaria vector. Three biological forms, namely "mysorensis", "intermediate", and "type", have previously been reported in this species. Nevertheless, the available morphological and molecular information is insufficient to diagnose these forms. In this investigation, An. stephensi biological forms were identified morphologically, and their odorant-binding protein 1 (Obp1) gene was sequenced. Intron I sequences were also used to construct phylogenetic trees. Despite nucleotide sequence variation in the exon sequences of AsteObp1, nearly 100% identity was observed at the amino acid level among the three biological forms. Because egg morphology characters are difficult to use for this purpose, intron I sequences of An. stephensi Obp1 offer a new molecular means of identifying the biological forms of this main Asian malaria vector. However, multidisciplinary studies are needed to establish the taxonomic status of An. stephensi. Copyright © 2015 Elsevier B.V. All rights reserved.
Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th‐century pandemics

PubMed Central

Pasricha, Gunisha; Mishra, Akhilesh C.; Chakrabarti, Alok K.

2012-01-01

Please cite this paper as: Pasricha et al. (2012) Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the Influenza A virus subtypes responsible for the 20th‐century pandemics. Influenza and Other Respiratory Viruses 7(4), 497–505. Background: PB1F2 is the 11th protein of influenza A virus and is translated from the +1 alternate reading frame of the PB1 gene. Since its discovery, PB1F2 proteins of varying sizes and functions have been reported in influenza A viruses. The selection of the PB1 gene segment in the pandemics, together with the variable size and pleiotropic effects of PB1F2, prompted us to analyze the amino acid sequences of this protein across influenza A viruses. Methods: PB1F2 amino acid sequences of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from the Influenza Research Database. Multiple sequence alignments of these sequences were used to determine protein size, identify variable and conserved domains, and perform mutational analysis. Results: Analysis showed that 96·4% of the H5N1 influenza viruses harbored full‐length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th‐century pandemic influenza viruses contained full‐length PB1F2 protein. Over the years, the PB1F2 protein of H1N1 and H3N2 viruses has undergone considerable variation. PB1F2 sequences of H5N1 viruses showed both human‐ and avian host‐specific conserved domains. In the global PB1F2 dataset, the N66S mutation was present in only 3·8% of H5N1 strains. We found a novel mutation, N84S, in the PB1F2 protein of 9·35% of highly pathogenic avian influenza H5N1 viruses. Conclusions: PB1F2 proteins of varying sizes and with different mutations were found in influenza A virus subtypes with pandemic potential. The protein diverged genetically across hosts, highlighting host‐specific evolution of the virus. 
However, further studies are required to correlate this sequence variability with virulence and pathogenicity. PMID:22788742
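Frequencies such as "N66S in 3·8% of H5N1 strains" come from counting residues at one column of a multiple sequence alignment. A minimal Python sketch of that count, assuming gapped or truncated sequences are simply left out of the denominator (the paper does not say how it handled them) and using a hypothetical toy alignment rather than real PB1F2 data:

```python
from typing import Sequence


def residue_frequency(aligned: Sequence[str], position: int, residue: str) -> float:
    """Fraction of aligned sequences that carry `residue` at a 1-based column.

    Sequences with a gap ('-') at that column, or that end before it, are left
    out of the denominator (a simplifying assumption for truncated proteins).
    """
    idx = position - 1
    observed = [s[idx] for s in aligned if len(s) > idx and s[idx] != "-"]
    if not observed:
        return 0.0
    return sum(aa == residue for aa in observed) / len(observed)


# Toy alignment (hypothetical sequences, not real PB1F2 data).
toy = ["MNS", "MSS", "MN-", "MSA"]
print(f"{residue_frequency(toy, 2, 'S'):.1%}")  # prints 50.0%
```

Whether truncated PB1F2 variants should count against a mutation's frequency is a design choice; excluding them, as here, reports the frequency among sequences that actually cover the site.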
Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

PubMed

Wei, Qiong; Xu, Qifang; Dunbrack, Roland L

2013-02-01

Single nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied extensively is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.
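The core/interface/surface distinction above can be sketched by comparing a residue's relative solvent accessibility (RSA, accessible surface area divided by the residue's maximal value) in the isolated chain and in the biological assembly. The cutoffs below (0.25 for "buried", a drop of at least 0.05 for "interface") are hypothetical choices for illustration, not the values used by the authors:

```python
def classify_residue(rsa_monomer: float, rsa_assembly: float,
                     buried_cutoff: float = 0.25, min_drop: float = 0.05) -> str:
    """Label a residue as 'core', 'interface', or 'surface'.

    rsa_monomer:  relative solvent accessibility in the isolated chain.
    rsa_assembly: relative solvent accessibility in the biological assembly.
    Both thresholds are hypothetical defaults for this sketch.
    """
    if rsa_monomer < buried_cutoff:
        return "core"        # buried even without partner chains
    if rsa_monomer - rsa_assembly >= min_drop:
        return "interface"   # becomes less accessible once partners are present
    return "surface"         # exposed in both the monomer and the assembly
```

Using the monomer alone would label every interface residue as "surface", which is the distinction the abstract reports as predictive.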
The Proteins API: accessing key integrated protein and genome information

PubMed Central

Antunes, Ricardo; Alpi, Emanuele; Gonzales, Leonardo; Liu, Wudong; Luo, Jie; Qi, Guoying; Turner, Edd

2017-01-01

The Proteins API provides searching and programmatic access to protein and associated genomics data, such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large-scale data sources (LSS). Using the coordinates service, researchers can retrieve the genomic sequence coordinates for proteins in UniProtKB. The LSS genomics and proteomics data for UniProt proteins are available programmatically only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience ‘talk’ to the services, quickly formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast, easy-to-use set of RESTful services that provides a broad protein information resource, allowing users to ask questions based on their field of expertise and to gain an integrated overview of the protein annotations available for understanding proteins in biological processes. The Proteins API is available at http://www.ebi.ac.uk/proteins/api/doc. PMID:28383659
The Proteins API: accessing key integrated protein and genome information.

PubMed

Nightingale, Andrew; Antunes, Ricardo; Alpi, Emanuele; Bursteinas, Borisas; Gonzales, Leonardo; Liu, Wudong; Luo, Jie; Qi, Guoying; Turner, Edd; Martin, Maria

2017-07-03

The Proteins API provides searching and programmatic access to protein and associated genomics data, such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large-scale data sources (LSS). Using the coordinates service, researchers can retrieve the genomic sequence coordinates for proteins in UniProtKB. The LSS genomics and proteomics data for UniProt proteins are available programmatically only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience 'talk' to the services, quickly formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast, easy-to-use set of RESTful services that provides a broad protein information resource, allowing users to ask questions based on their field of expertise and to gain an integrated overview of the protein annotations available for understanding proteins in biological processes. The Proteins API is available at http://www.ebi.ac.uk/proteins/api/doc. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
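Because the services are RESTful and can return JSON, a client needs only an HTTP request with an `Accept: application/json` header. The sketch below uses the Python standard library; the base URL is taken from the documentation link above, but the `variation/{accession}` path and the `features`/`type` fields in the response are assumptions to verify against the Swagger documentation before use:

```python
import json
import urllib.request
from collections import Counter

# Base URL from the abstract's documentation link; the path layout below is assumed.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"


def variation_url(accession: str) -> str:
    """Build the (assumed) variation endpoint URL for a UniProtKB accession."""
    return f"{BASE_URL}/variation/{accession}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (requires network access)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def count_feature_types(entry: dict) -> Counter:
    """Tally annotation types, assuming they are listed under 'features'."""
    return Counter(f.get("type", "UNKNOWN") for f in entry.get("features", []))


# Example (network call, so left commented out); P04637 is used only as an example accession:
# entry = fetch_json(variation_url("P04637"))
# print(count_feature_types(entry).most_common(5))
```

Keeping URL construction and response parsing separate from the network call makes the parsing logic testable offline.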
Unraveling patterns of site-to-site synonymous rates variation and associated gene properties of protein domains and families.

PubMed

Dimitrieva, Slavica; Anisimova, Maria

2014-01-01

In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore not to be subject to natural selection. Yet cases of non-neutral evolution at certain synonymous sites have been reported increasingly often over the last decade. To evaluate the extent and nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of SRV in real data. We show that SRV is widespread in the evolution of protein-coding sequences, casting doubt on the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, SRV appears to play an important role in optimizing domain function at the DNA level. In contrast, protein families are more likely to evolve by positive selection but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, in genes coding for proteins with more interactions or forming more structures, in genes whose products are located in intracellular components, and in genes involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented in metabolic pathways and among genes associated with several genetic diseases, particularly cancers and diabetes.
Analysis of human herpesvirus-6 IE1 sequence variation in clinical samples.

PubMed

Stanton, Richard; Wilkinson, Gavin W G; Fox, Julie D

2003-12-01

Herpesvirus immediate early (IE) proteins are known to play key roles in establishing productive infections, regulating reactivation from latency, and creating a cellular environment favourable to viral replication. Human herpesvirus-6 (HHV-6) IE genes have not been studied as intensively as their homologues in the prototype betaherpesvirus human cytomegalovirus (HCMV). Whilst the HCMV IE1 gene is relatively conserved, early studies indicated that HHV-6 IE1 exhibited a high level of sequence variation between HHV-6A and HHV-6B isolates, although the observation was based primarily on virus stocks that had been isolated and propagated in vitro. In this study, we investigated the level of HHV-6 IE1 sequence variation in vivo by direct sequencing of circulating virus in clinical samples without prior in vitro culture. Sequences exactly matching those reported for reference HHV-6 isolates were identified in clinical samples, thus the HHV-6 laboratory strains used in the majority of in vitro studies appear to be representative of virus circulating in vivo with respect to the IE1 gene. The HHV-6 IE1 sequence is also conserved in reference strains that had been passaged extensively in vitro. The high degree of divergence between variant A and B type IE1 sequences was confirmed, but interestingly HHV-6B IE1 sequences were observed to further segregate into two distinct subgroups, with the laboratory strains Z29 and HST representative of these two subgroups. Within each HHV-6B subgroup, a remarkably high level of homology was observed. Thus the HHV-6 IE1 sequence appears highly stable, underlining its potential importance to the viral life cycle. Copyright 2003 Wiley-Liss, Inc.
Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche

PubMed Central

Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.

2013-01-01

Background: The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to the TatA and TatC protein families and, in some systems, a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin-arginine leader peptide sequence. Methodology/Principal Findings: The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and that half of these possess only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily vertically inherited, but identified a small subset of taxonomically unrelated bacteria that exhibited evidence of lateral gene transfer within an ecological niche. Examination of Bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association with tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined, and its conserved and variable regions were highlighted within a three-dimensional model of the Escherichia coli TatC protein. Comparative analysis of the TatC consensus sequence and the Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA-specific contacts. Analyses of synonymous and non-synonymous nucleotide substitutions in bacterial tatC homologues showed that tatC sequence variation differs dramatically between bacterial classes, suggesting TatC specialization in these species. 
Conclusions/Significance: TatC proteins appear to be diversifying within particular bacterial classes, and their specialization may be driven by the substrates they transport and the environment of their host. PMID:24236045
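Synonymous versus non-synonymous analyses start by classifying codon differences between aligned coding sequences. The sketch below does only that first step: it counts codons that differ at exactly one position and asks whether the amino acid changes under the standard genetic code. It skips multi-hit codons and does not normalize by the number of synonymous and non-synonymous sites, so it is not a dN/dS estimator (methods such as Nei–Gojobori add that normalization):

```python
from itertools import product

# Standard genetic code, built in TCAG order (first codon position varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_single_site_changes(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts for two aligned, in-frame CDSs.

    Codons differing at more than one position, or containing non-ACGT
    characters, are skipped (a simplification for this sketch).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(ca, cb)) != 1:
            continue  # multi-hit codon: pathway ambiguous, skipped here
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# CTT->CTC is synonymous (Leu->Leu); AAA->AGA is nonsynonymous (Lys->Arg).
print(count_single_site_changes("CTTGCTAAA", "CTCGCTAGA"))  # prints (1, 1)
```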
Evolution-Based Functional Decomposition of Proteins

PubMed Central

Rivoire, Olivier; Reynolds, Kimberly A.; Ranganathan, Rama

2016-01-01

The essential biological properties of proteins—folding, biochemical activities, and the capacity to adapt—arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment—a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668
Design and application of a data-independent precursor and product ion repository.

PubMed

Thalassinos, Konstantinos; Vissers, Johannes P C; Tenzer, Stefan; Levin, Yishai; Thompson, J Will; Daniel, David; Mann, Darrin; DeLong, Mark R; Moseley, M Arthur; America, Antoine H; Ottens, Andrew K; Cavey, Greg S; Efstathiou, Georgios; Scrivens, James H; Langridge, James I; Geromanos, Scott J

2012-10-01

The functional design and application of a data-independent LC-MS precursor and product ion repository for protein identification, quantification, and validation is conceptually described. The ion repository was constructed from the sequence search results of a broad range of discovery experiments investigating various tissue types of two closely related mammalian species. The relatively high degree of similarity in protein complement, ion detection, and peptide and protein identification allows for the analysis of normalized precursor and product ion intensity values, as well as standardized retention times, creating a multidimensional/orthogonal queryable, qualitative, and quantitative space. Peptide ion map selection for identification and quantification is primarily based on replication and limited variation. The information is stored in a relational database and is used to create peptide- and protein-specific fragment ion maps that can be queried in a targeted fashion against the raw or time aligned ion detections. These queries can be conducted either individually or as groups, where the latter affords pathway and molecular machinery analysis of the protein complement. The presented results also suggest that peptide ionization and fragmentation efficiencies are highly conserved between experiments and practically independent of the analyzed biological sample when using similar instrumentation. Moreover, the data illustrate only minor variation in ionization efficiency with amino acid sequence substitutions occurring between species. Finally, the data and the presented results illustrate how LC-MS performance metrics can be extracted and utilized to ensure optimal performance of the employed analytical workflows.
Assessing fluctuating evolutionary pressure in yeast and mammal evolutionary rate covariation using bioinformatics of meiotic protein genetic sequences

NASA Astrophysics Data System (ADS)

Dehipawala, Sunil; Nguyen, A.; Tremberger, G.; Cheung, E.; Holden, T.; Lieberman, D.; Cheung, T.

2013-09-01

The evolutionary rate co-variation in meiotic proteins has been reported for yeast and mammals using phylogenetic branch lengths that assess retention, duplication and mutation. The corresponding DNA sequences could be classified on a diagram of fractal dimension versus Shannon entropy. Results from biomedical gene research provide examples of the diagram methodology. The identification of adaptive selection using an entropy marker, and of functional-structural diversity using fractal dimension, would support a regression analysis in which the coefficient of determination would serve as an evolutionary pathway marker for DNA sequences and be an important component in the astrobiology community. Comparisons between biomedical genes such as EEF2 (elongation factor 2; human, mouse, etc.), WDR85 in epigenetics, HAR1 in human specificity, the clinical-trial-targeted cancer gene CD47, SIRT6 in spermatogenesis, and HLA-C in mosquito bite immunology demonstrate the diagram classification methodology. Comparisons of the SEPT4-XIAP pair in stem cell apoptosis, the testes-expressed taste gene pair TAS1R3-GNAT3, and the amyloid beta APLP1-APLP2 pair with the yeast-mammal DNA sequences for the meiotic protein pairs RAD50-MRE11 and NCAPD2-ICK have systematically accounted for the observed fluctuating evolutionary pressure. Regression with high R-sq values, or a triangular-like cluster pattern for concordant pairs in co-variation among the studied species, could serve as evidence for the possible location of common ancestors in the entropy-fractal dimension diagram, consistent with an example of the human-chimp common ancestor study using the FOXP2-regulated genes reported in a human fetal brain study. The Deinococcus radiodurans R1 Rad-A could be viewed as an outlier in the RAD50 diagram and also in the Cook's distance of the free energy versus fractal dimension regression, consistent with a non-Earth source for this radiation-resistant bacterium. 
Convergent and divergent fluctuating evolutionary pressure could be studied with extension to genetic sequences in organisms in possible astrobiology conditions, with the assumption that the continuation of a book of life would require meiotic proteins everywhere in the universe.
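The abstract does not state how Shannon entropy was computed for each DNA sequence. One common choice, assumed here, is the entropy of the mononucleotide composition in bits per symbol, which ranges from 0 (a single base) to 2 (all four bases equally frequent). The fractal-dimension axis of the diagram is not covered by this sketch:

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the A/C/G/T composition of `seq`.

    Characters other than A, C, G, T (e.g. N or gaps) are ignored.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("ACGT"))  # prints 2.0
print(shannon_entropy("AACC"))  # prints 1.0
```

Entropy over dinucleotides or longer words would capture more sequence structure; mononucleotide entropy is used here only because it is the simplest well-defined variant.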
ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins

PubMed Central

Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie

2018-01-01

Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202
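Annotating mutations "through the lens of PTMs" begins by deciding which substitutions fall at or near a modified residue. A minimal sketch, assuming a symmetric flank of 7 residues around each PTM site (we believe the ActiveDriver family uses a window of this size, but treat the value as an assumption); the protein identifiers and positions in the example are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Missense:
    protein: str    # protein identifier (hypothetical in the example below)
    position: int   # 1-based residue index
    ref: str        # reference amino acid
    alt: str        # substituted amino acid


def ptm_associated(mutations, ptm_sites, flank=7):
    """Return mutations lying within `flank` residues of a PTM site in the same protein.

    ptm_sites maps protein identifiers to collections of 1-based modified positions.
    flank=7 is an assumed default, not a value stated in the abstract.
    """
    hits = []
    for m in mutations:
        sites = ptm_sites.get(m.protein, ())
        if any(abs(m.position - s) <= flank for s in sites):
            hits.append(m)
    return hits


# Hypothetical example data.
sites = {"PROT_A": [15, 120]}
muts = [Missense("PROT_A", 10, "S", "A"), Missense("PROT_A", 60, "R", "W"),
        Missense("PROT_B", 15, "T", "I")]
print(ptm_associated(muts, sites))
```

A substitution directly at the modified residue (distance 0) is included, as are nearby substitutions that could disrupt the enzyme-binding motif around it.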

Systematic analysis of protein identity between Zika virus and other arthropod-borne viruses.

PubMed

Chang, Hsiao-Han; Huber, Roland G; Bond, Peter J; Grad, Yonatan H; Camerini, David; Maurer-Stroh, Sebastian; Lipsitch, Marc

2017-07-01

To analyse the proportions of protein identity between Zika virus and dengue, Japanese encephalitis, yellow fever, West Nile and chikungunya viruses as well as polymorphism between different Zika virus strains. We used published protein sequences for the Zika virus and obtained protein sequences for the other viruses from the National Center for Biotechnology Information (NCBI) protein database or the NCBI virus variation resource. We used BLASTP to find regions of identity between viruses. We quantified the identity between the Zika virus and each of the other viruses, as well as within-Zika virus polymorphism for all amino acid k-mers across the proteome, with k ranging from 6 to 100. We assessed accessibility of protein fragments by calculating the solvent accessible surface area for the envelope and nonstructural-1 (NS1) proteins. In total, we identified 294 Zika virus protein fragments with both low proportion of identity with other viruses and low levels of polymorphisms among Zika virus strains. The list includes protein fragments from all Zika virus proteins, except NS3. NS4A has the highest number (190 k-mers) of protein fragments on the list. We provide a candidate list of protein fragments that could be used when developing a sensitive and specific serological test to detect previous Zika virus infections.
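The k-mer identity measure above can be illustrated with exact matching: collect every length-k substring of a query proteome and ask what fraction also occurs in another virus's proteins. This is a simplified stand-in (the study also used BLASTP), and the sequences in the example are hypothetical fragments:

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in `seq` (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query: str, other: str, k: int) -> float:
    """Fraction of the distinct k-mers of `query` that also occur exactly in `other`."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(other, k)) / len(q)


# Hypothetical fragments: 'KV' and 'VL' are shared, 'MK' and 'LA' are not.
print(shared_kmer_fraction("MKVLA", "GKVLT", 2))  # prints 0.5
```

Fragments whose k-mers are absent from other flaviviruses and invariant across Zika strains would be candidates for a specific serological target, which is the filtering logic the abstract describes.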
Reductive evolution and the loss of PDC/PAS domains from the genus Staphylococcus

PubMed Central

2013-01-01

Background: The Per-Arnt-Sim (PAS) domain represents a ubiquitous structural fold that is involved in bacterial sensing and adaptation systems, including several virulence-related functions. Although PAS domains and the subclass of PhoQ-DcuS-CitA (PDC) domains have a common structure, they share limited amino acid sequence similarity. To gain greater insight into the evolution of PDC/PAS domains in the bacterial kingdom, and in staphylococci in particular, the PDC/PAS domains in the genomic sequences of 48 bacteria, representing 5 phyla, were identified using a sensitive search method based on HMM-to-HMM comparisons (HHblits). Results: A total of 1,007 PAS domains and 686 PDC domains distributed over 1,174 proteins were identified. For 28 Gram-positive bacteria, the distribution, organization, and molecular evolution of PDC/PAS domains were analyzed in greater detail, with special emphasis on the genus Staphylococcus. Compared to other bacteria, the staphylococci have relatively few proteins (6–9) containing PDC/PAS domains. As a general rule, the staphylococcal genomes examined in this study contain a core group of seven PDC/PAS domain-containing proteins: WalK, SrrB, PhoR, ArlS, HssS, NreB, and GdpP. The exceptions to this rule are: 1) S. saprophyticus lacks the core NreB protein; 2) S. carnosus has two additional PAS domain-containing proteins; 3) S. epidermidis, S. aureus, and S. pseudintermedius have an additional protein with two PDC domains that is predicted to be a sensor histidine kinase; 4) S. lugdunensis has an additional PDC-containing protein predicted to be a sensor histidine kinase. Conclusions: This comprehensive analysis demonstrates that variation in PDC/PAS domains among bacteria shows limited correlation with genome size or pathogenicity; however, our analysis established that bacteria having a motile phase in their life cycle have significantly more PDC/PAS-containing proteins. 
In addition, our analysis revealed a tremendous amount of variation in the number of PDC/PAS-containing proteins within genera. This variation extended to the Staphylococcus genus, which had between 6 and 9 PDC/PAS proteins and some of these appear to be previously undescribed signaling proteins. This latter point is important because most staphylococcal proteins that contain PDC/PAS domains regulate virulence factor synthesis or antibiotic resistance. PMID:23902280
Reductive evolution and the loss of PDC/PAS domains from the genus Staphylococcus.

PubMed

Shah, Neethu; Gaupp, Rosmarie; Moriyama, Hideaki; Eskridge, Kent M; Moriyama, Etsuko N; Somerville, Greg A

2013-07-31

The Per-Arnt-Sim (PAS) domain represents a ubiquitous structural fold that is involved in bacterial sensing and adaptation systems, including several virulence-related functions. Although PAS domains and the subclass of PhoQ-DcuS-CitA (PDC) domains have a common structure, they share limited amino acid sequence similarity. To gain greater insight into the evolution of PDC/PAS domains in the bacterial kingdom, and in staphylococci in particular, the PDC/PAS domains in the genomic sequences of 48 bacteria, representing 5 phyla, were identified using a sensitive search method based on HMM-to-HMM comparisons (HHblits). A total of 1,007 PAS domains and 686 PDC domains distributed over 1,174 proteins were identified. For 28 Gram-positive bacteria, the distribution, organization, and molecular evolution of PDC/PAS domains were analyzed in greater detail, with special emphasis on the genus Staphylococcus. Compared to other bacteria, the staphylococci have relatively few proteins (6-9) containing PDC/PAS domains. As a general rule, the staphylococcal genomes examined in this study contain a core group of seven PDC/PAS domain-containing proteins: WalK, SrrB, PhoR, ArlS, HssS, NreB, and GdpP. The exceptions to this rule are: 1) S. saprophyticus lacks the core NreB protein; 2) S. carnosus has two additional PAS domain-containing proteins; 3) S. epidermidis, S. aureus, and S. pseudintermedius have an additional protein with two PDC domains that is predicted to be a sensor histidine kinase; 4) S. lugdunensis has an additional PDC-containing protein predicted to be a sensor histidine kinase. This comprehensive analysis demonstrates that variation in PDC/PAS domains among bacteria shows limited correlation with genome size or pathogenicity; however, our analysis established that bacteria having a motile phase in their life cycle have significantly more PDC/PAS-containing proteins. 
In addition, our analysis revealed a tremendous amount of variation in the number of PDC/PAS-containing proteins within genera. This variation extended to the Staphylococcus genus, which had between 6 and 9 PDC/PAS proteins and some of these appear to be previously undescribed signaling proteins. This latter point is important because most staphylococcal proteins that contain PDC/PAS domains regulate virulence factor synthesis or antibiotic resistance.
Characterization of Proteoforms with Unknown Post-translational Modifications Using the MIScore

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kou, Qiang; Zhu, Binhai; Wu, Si

Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because the protein database is incomplete, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization aims to identify and localize the PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.
Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

PubMed

Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

1991-11-01

To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM.
Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

PubMed Central

Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

1991-01-01

To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM. PMID:1656067
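Potential N-linked glycosylation sites are conventionally recognized as the sequon N-X-S/T, where X is any residue except proline. A minimal Python sketch that counts cysteines and locates sequons in a protein string; a regex lookahead lets overlapping sequons (e.g. in "NNST") both be reported. Deciding which sites are *conserved* between CAEV-63 and CAEV-Co would additionally require mapping these positions onto a pairwise alignment, which this sketch does not do:

```python
import re

# N, then any residue except P, then S or T; the lookahead allows overlapping hits.
SEQUON = re.compile(r"(?=N[^P][ST])")


def count_cysteines(protein: str) -> int:
    """Number of cysteine residues in a one-letter protein sequence."""
    return protein.upper().count("C")


def glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: NGS is a sequon, NPT is excluded (X = P), NNT is a sequon.
print(glycosylation_sites("MNGSANPTCNNT"))  # prints [2, 10]
```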
Characterization of the variable-number tandem repeats in vrrA from different Bacillus anthracis isolates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jackson, P.J.; Walthers, E.A.; Richmond, K.L.

1997-04-01

PCR analysis of 198 Bacillus anthracis isolates revealed a variable region of DNA sequence differing in length among the isolates. Five polymorphisms differed by the presence of two to six copies of the 12-bp tandem repeat 5′-CAATATCAACAA-3′. This variable-number tandem repeat (VNTR) region is located within a larger sequence containing one complete open reading frame that encodes a putative 30-kDa protein. Length variation did not change the reading frame of the encoded protein and only changed the copy number of a 4-amino-acid sequence (QYQQ) from 2 to 6. The structure of the VNTR region suggests that these multiple repeats are generated by recombination or polymerase slippage. Protein structures predicted from the translated DNA sequence suggest that any structural changes in the encoded protein are confined to the region encoded by the VNTR sequence. Copy number differences in the VNTR region were used to define five different B. anthracis alleles. Characterization of 198 isolates revealed allele frequencies of 6.1, 17.7, 59.6, 5.6, and 11.1%, in order from shorter to longer alleles. The high degree of polymorphism in the VNTR region provides a criterion for assigning isolates to five allelic categories. There is a correlation between categories and geographic distribution. Such molecular markers can be used to monitor the epidemiology of anthrax outbreaks in domestic and native herbivore populations. 22 refs., 4 figs., 3 tabs.
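Two points in the abstract can be checked directly: the repeat unit is 12 bp (four codons), so adding or removing copies cannot shift the reading frame, and each copy encodes QYQQ. The sketch below counts the longest run of exact tandem copies in a sequence and translates the unit with the standard genetic code; the flanking bases in the example are hypothetical:

```python
import re
from itertools import product

REPEAT = "CAATATCAACAA"  # 12-bp vrrA repeat unit reported in the abstract

# Standard genetic code, built in TCAG order (first codon position varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def max_tandem_copies(dna: str, unit: str = REPEAT) -> int:
    """Largest number of consecutive exact copies of `unit` found in `dna`."""
    best = 0
    for m in re.finditer(f"(?:{re.escape(unit)})+", dna.upper()):
        best = max(best, len(m.group(0)) // len(unit))
    return best


def translate(dna: str) -> str:
    """Translate complete codons from the first base, ignoring a trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


allele = "GG" + REPEAT * 4 + "TT"  # hypothetical allele with four copies
print(max_tandem_copies(allele), translate(REPEAT))  # prints 4 QYQQ
```

Because `len(REPEAT) % 3 == 0`, any copy number between 2 and 6 leaves downstream codons in frame, matching the abstract's observation.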
A complete mitochondrial genome of wheat (Triticum aestivum cv. Chinese Yumai), and fast evolving mitochondrial genes in higher plants.

PubMed

Cui, Peng; Liu, Huitao; Lin, Qiang; Ding, Feng; Zhuo, Guoyin; Hu, Songnian; Liu, Dongcheng; Yang, Wenlong; Zhan, Kehui; Zhang, Aimin; Yu, Jun

2009-12-01

Plant mitochondrial genomes, which encode proteins needed for energy production, play an important role in plant development and reproduction. They follow a specific evolutionary pattern relative to their nuclear counterparts. Here, we determined the 452,526-bp mitochondrial genome of winter wheat (Triticum aestivum cv. Chinese Yumai) by shotgun sequencing of its BAC library. It contains 202 genes, including 35 known protein-coding genes, three rRNA and 17 tRNA genes, as well as 149 open reading frames (ORFs; greater than 300 bp in length). The sequence is almost identical to the previously reported sequence of spring wheat (T. aestivum cv. Chinese Spring). We identified only seven SNPs (three transitions and four transversions) and 10 indels (insertions and deletions) between the two independently acquired sequences, and all variations were found in non-coding regions. This result confirmed the accuracy of the previously reported mitochondrial sequence of Chinese Spring wheat. The nucleotide frequency and codon usage of wheat are typical of the higher-plant lineage, with a high AT content of 58%. Molecular evolutionary analysis demonstrated that plant mitochondrial genomes evolved at different rates, which may correlate with substantial variations in metabolic rate and generation time among plant lineages. In addition, by estimating the ratio of non-synonymous to synonymous substitution rates between orthologous mitochondrion-encoded genes of higher plants, we found an accelerated evolutionary rate that seems to be the result of relaxed selection.
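Two of the summary statistics above, the transition/transversion split of the seven SNPs and the genome-wide AT content, follow from simple definitions. A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T). Every other single-base change is a transversion. The following is a minimal Python sketch with hypothetical helper names and toy inputs.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not ({ref, alt} <= (PURINES | PYRIMIDINES)):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transitions stay within one chemical class (purine or pyrimidine).
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (N and other IUPAC codes are ignored)."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    return (seq.count("A") + seq.count("T")) / counted if counted else 0.0


snps = [("A", "G"), ("C", "T"), ("A", "C")]  # hypothetical ref/alt pairs
print(Counter(substitution_type(r, a) for r, a in snps))
print(at_content("AATTGCNN"))  # 4 of 6 unambiguous bases are A or T
```

Applied to a complete mitochondrial sequence, `at_content` yields the kind of genome-wide figure (58%) reported above.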
Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.

PubMed

Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping

2012-12-01

The ideal plant architecture (IPA) includes several important characteristics, such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick, sturdy stems. We have developed an indica restorer line, 7302R, that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, the genetic basis of this phenotype remained to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. 7302R-specific variations were investigated by synteny analysis of all the SNPs of 7302R against those of the previously sequenced non-IPA-type lines IR24, MH63, and SH527. We found 178 168 7302R-specific SNPs across the whole genome and 30 239 SNPs in the predicted mRNA regions, of which 8 517 were non-synonymous coding SNPs. In addition, 263 large-effect SNPs expected to affect the integrity of the encoded proteins were identified among the 7302R-specific SNPs. SNPs in several important, previously cloned rice genes were also identified by aligning the 7302R sequence with the sequences of other lines. Our results provide several candidates that may account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture, and also offer useful data resources for future genetic and genomic studies in rice.
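At its simplest, identifying line-specific variants is set subtraction. Keep the variants of the target line that do not appear in any comparison line. The following is a minimal Python sketch that assumes variants are matched exactly on chromosome, position, and alleles against a shared reference. The study itself used a synteny analysis, and the `Snp` record and example data are hypothetical.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    chrom: str
    pos: int   # 1-based position on the shared reference (e.g. 9311)
    ref: str
    alt: str


def line_specific(target: set[Snp], others: list[set[Snp]]) -> set[Snp]:
    """Variants present in `target` but absent from every comparison line."""
    seen_elsewhere = set().union(*others)
    return target - seen_elsewhere


# Hypothetical variant calls, not data from the study
line_7302r = {Snp("chr1", 100, "A", "G"), Snp("chr1", 200, "C", "T"),
              Snp("chr2", 50, "G", "A")}
comparison_lines = [{Snp("chr1", 100, "A", "G")}, {Snp("chr2", 50, "G", "A")}]
print(line_specific(line_7302r, comparison_lines))
```

Exact matching misses variants that are represented differently across call sets, for example indels left-aligned differently. Real pipelines normalize variants before comparing them.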
The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences

PubMed Central

Xue, Cheng; Raveendran, Muthuswamy; Harris, R. Alan; Fawcett, Gloria L.; Liu, Xiaoming; White, Simon; Dahdouli, Mahmoud; Rio Deiros, David; Below, Jennifer E.; Salerno, William; Cox, Laura; Fan, Guoping; Ferguson, Betsy; Horvath, Julie; Johnson, Zach; Kanthaswamy, Sree; Kubisch, H. Michael; Liu, Dahai; Platt, Michael; Smith, David G.; Sun, Binghua; Vallender, Eric J.; Wang, Feng; Wiseman, Roger W.; Chen, Rui; Muzny, Donna M.; Gibbs, Richard A.; Yu, Fuli; Rogers, Jeffrey

2016-01-01

Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease. PMID:27934697
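"Nucleotide diversity" (π) is the average number of differences per site between two randomly chosen sequences. A fold difference such as "2.5-fold higher" is the ratio of two such π values. The following is a minimal Python sketch of the basic estimator on aligned, gap-free sequences. The function name and toy haplotypes are hypothetical. Whole-genome studies such as this one must also handle missing data, genotype uncertainty, and callable-site masks, none of which is modeled here.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site (pi) for equal-length aligned sequences.

    Assumes no gaps or missing data, so every site is compared in every pair.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


toy = ["AAAA", "AAAT", "AATT"]    # hypothetical haplotypes
print(nucleotide_diversity(toy))  # 4 differences / (3 pairs x 4 sites) = 1/3
```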
Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis

PubMed Central

Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.

2011-01-01

The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes are involved in male reproduction, and four of them showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection confirmed that this ortholog is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence, despite a lack of nucleotide sequence conservation across more than 60% of the rat coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189
Partial nephrogenic diabetes insipidus caused by a novel AQP2 variation impairing trafficking of the aquaporin-2 water channel.

PubMed

Dollerup, Pia; Thomsen, Troels Møller; Nejsum, Lene N; Færch, Mia; Österbrand, Martin; Gregersen, Niels; Rittig, Søren; Christensen, Jane H; Corydon, Thomas J

2015-12-29

Autosomal dominant inheritance of congenital nephrogenic diabetes insipidus (CNDI) is rare and is usually caused by variations in the AQP2 gene. We investigated the genetic and molecular background underlying symptoms of diabetes insipidus (DI) in a Swedish family with autosomal dominant inheritance of the condition. The proband and her father underwent water deprivation testing and direct DNA sequencing of the coding regions of the AQP2 and AVP genes. Madin-Darby canine kidney (MDCK) cells stably expressing AQP2 variant proteins were generated by lentiviral gene delivery. Localization of AQP2 variant proteins in the cells under stimulated and unstimulated conditions was analyzed by immunostaining and confocal laser scanning microscopy. Intracellular trafficking of AQP2 variant proteins was studied using transient expression of the mutant dynamin2-K44A-GFP protein, and phosphorylation levels of the AQP2 variant proteins were assessed by Western blot analysis. Clinical and genetic data suggest that the proband and her father have partial nephrogenic DI due to a variation (g.4807C>T) in the AQP2 gene. The variation results in the substitution of arginine-254 by tryptophan (p.R254W) in AQP2. Analysis of MDCK cells stably expressing AQP2 variant proteins revealed defective phosphorylation, impaired trafficking and intracellular accumulation of the AQP2-R254W protein. Notably, blocking the endocytic pathway showed that AQP2-R254W is impaired in reaching the cell surface. Partial CNDI in the Swedish family is caused by an AQP2 variation that appears to prevent the encoded AQP2-R254W protein from reaching the subapical vesicle population and to impair its phosphorylation at S256. The AQP2-R254W protein is thus unable to reach the plasma membrane to facilitate AVP-mediated urine concentration.
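Protein-level variant names such as p.R254W follow the HGVS pattern: reference residue, position, variant residue. The following is a minimal Python sketch that parses one-letter substitutions and rewrites them in the three-letter form (p.Arg254Trp). It handles only simple missense or nonsense substitutions in one-letter form. It is not a full HGVS parser, and the function names are hypothetical.

```python
import re

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}
# p.<ref><position><alt>, one-letter codes only; "*" marks a stop (Ter)
PROTEIN_SUB = re.compile(r"^p\.([ACDEFGHIKLMNPQRSTVWY*])(\d+)([ACDEFGHIKLMNPQRSTVWY*])$")


def parse_protein_substitution(hgvs: str) -> tuple[str, int, str]:
    """Split e.g. 'p.R254W' into ('R', 254, 'W')."""
    match = PROTEIN_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported protein change: {hgvs}")
    return match.group(1), int(match.group(2)), match.group(3)


def to_three_letter(hgvs: str) -> str:
    ref, pos, alt = parse_protein_substitution(hgvs)
    return f"p.{ONE_TO_THREE[ref]}{pos}{ONE_TO_THREE[alt]}"


print(to_three_letter("p.R254W"))  # p.Arg254Trp
```

The parsed position also makes simple spatial reasoning explicit. For example, residue 254 lies two residues before the S256 phosphorylation site discussed above.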
Rate heterogeneity in six protein-coding genes from the holoparasite Balanophora (Balanophoraceae) and other taxa of Santalales

PubMed Central

Su, Huei-Jiun; Hu, Jer-Ming

2012-01-01

Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have an enormous rate acceleration in the nuclear 18S rDNA region. It remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether genes with high substitution rates retain their functionality. To address these questions, six different genes were sequenced from two Balanophora species, and their rate variation and expression patterns were examined. Methods Sequences of nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with those of selected hemiparasitic species of Santalales and autotrophic core eudicots. Expression was detected for all six protein-coding genes, and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas rate variation in the nuclear genes LFY, PI, euAP3, TM6 and RPB2 is less dramatic. Significant dS increases were detected in Balanophora PI, TM6 and RPB2, and a significant dN acceleration in euAP3. All of the protein-coding genes are expressed in inflorescences, indicating that they are functional. PI expression is restricted to tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Although rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration in 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
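The distinction between dN and dS starts from classifying each codon difference between two aligned coding sequences. A difference is synonymous when the amino acid is unchanged and nonsynonymous when the amino acid changes. The following is a minimal Python sketch of that first counting step, using the standard genetic code. It is not a dN/dS estimator. Real rates divide these counts by the numbers of synonymous and nonsynonymous sites and correct for multiple hits, for example with Nei-Gojobori plus Jukes-Cantor or with codon models. Codons differing at more than one position are only tallied here, not resolved. The function name is hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_codon_differences(seq1: str, seq2: str) -> dict[str, int]:
    """Count synonymous vs nonsynonymous single-position codon differences.

    Sequences must be in-frame, gap-free and of equal length.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    seq1, seq2 = seq1.upper(), seq2.upper()
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2:
            continue
        if sum(a != b for a, b in zip(c1, c2)) > 1:
            counts["multiple"] += 1  # would need pathway averaging; skipped here
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# CTT->CTC (Leu->Leu), AAA->AAG (Lys->Lys), CCC->CGC (Pro->Arg)
print(classify_codon_differences("CTTAAACCC", "CTCAAGCGC"))
```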
ProtVista: visualization of protein sequence annotations.

PubMed

Watkins, Xavier; Garcia, Leyla J; Pundir, Sangya; Martin, Maria J

2017-07-01

ProtVista is a comprehensive visualization tool for the graphical representation of protein sequence features from the UniProt Knowledgebase and from public experimental proteomics and variation datasets. The volume of these data, and the relationships within them, make interpretation challenging. Integrative visualization approaches such as ProtVista are thus essential for researchers to understand the data and, for instance, to discover patterns affecting function and disease associations. ProtVista is a JavaScript component released as an open-source project under the Apache 2 License. Documentation and source code are available at http://ebi-uniprot.github.io/ProtVista/ . Contact: martin@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Genetic diversity in the C-terminus of merozoite surface protein 1 among Plasmodium knowlesi isolates from Selangor and Sabah Borneo, Malaysia.

PubMed

Yap, Nan Jiun; Goh, Xiang Ting; Koehler, Anson V; William, Timothy; Yeo, Tsin Wen; Vythilingam, Indra; Gasser, Robin B; Lim, Yvonne A L

2017-10-01

Plasmodium knowlesi, a malaria parasite of macaques, has emerged as an important parasite of humans. Despite the significance of P. knowlesi malaria in parts of Southeast Asia, very little is known about genetic variation in this parasite. Our aim here was to explore sequence variation in merozoite surface protein-1 (MSP-1), which is found on the surface of blood stages of Plasmodium spp. and plays a key role in erythrocyte invasion. Several studies of P. falciparum have reported that the 42-kDa C-terminal fragment of MSP-1 (MSP-1₄₂, consisting of MSP-1₁₉ and MSP-1₃₃) is a potential malaria vaccine candidate. However, no study has yet investigated the sequence diversity of the gene encoding P. knowlesi MSP-1₄₂ (comprising Pk-msp-1₁₉ and Pk-msp-1₃₃) among isolates in Malaysia. The present study explored this aspect. Twelve P. knowlesi isolates were collected from patients at hospitals in Selangor and Sabah Borneo, Malaysia, between 2012 and 2014. The Pk-msp-1₄₂ gene was amplified by PCR and directly sequenced. Haplotype diversity (Hd) and nucleotide diversity (π) were studied among the isolates. There was relatively high genetic variation among P. knowlesi isolates; overall Hd and π were 1 ± 0.034 and 0.01132 ± 0.00124, respectively. A total of nine different haplotypes were identified, with amino acid alterations at 13 positions, and the Pk-MSP-1₁₉ sequence was found to be more conserved than Pk-msp-1₃₃. We found evidence for negative selection in Pk-msp-1₄₂ as well as in the 33-kDa and 19-kDa fragments by comparing the rates of non-synonymous and synonymous substitutions. Future investigations should study large numbers of samples from disparate geographical locations to critically assess whether this molecule might be a potential vaccine target for P. knowlesi. Copyright © 2017 Elsevier B.V. All rights reserved.
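Haplotype diversity (Hd) is commonly computed with Nei's unbiased estimator, Hd = n/(n − 1) · (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences. It equals 1 exactly when every sequence is a distinct haplotype. The following is a minimal Python sketch of that estimator. The sampling variance that gives the "±" term is not computed. The labels are hypothetical, and the study's software may differ in detail.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype diversity for a list of haplotype labels or sequences."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


print(haplotype_diversity(["H1", "H2", "H3"]))        # 1.0: every sample is unique
print(haplotype_diversity(["H1", "H1", "H2", "H2"]))  # 4/3 * (1 - 0.5) = 2/3
```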
Diversity of the luciferin binding protein gene in bioluminescent dinoflagellates--insights from a new gene in Noctiluca scintillans and sequences from gonyaulacoid genera.

PubMed

Valiadi, Martha; Iglesias-Rodriguez, Maria Debora

2014-01-01

Dinoflagellate bioluminescence systems operate with or without a luciferin binding protein, representing two distinct modes of light production. However, the distribution, diversity, and evolution of the luciferin binding protein gene within bioluminescent dinoflagellates are not well known. We used PCR to detect and partially sequence this gene from the heterotrophic dinoflagellate Noctiluca scintillans and from a group of ecologically important gonyaulacoid species. In addition to the typical combined bioluminescence gene of N. scintillans, we report a second luciferin binding protein gene in this species that is not attached to luciferase. This supports the hypothesis that a profound reorganization of the bioluminescence system has taken place in this organism. We also show that the luciferin binding protein gene is present in the genera Ceratocorys, Gonyaulax, and Protoceratium, and is prevalent in bioluminescent species of Alexandrium. This gene is therefore an integral component of the standard molecular bioluminescence machinery in dinoflagellates. Nucleotide sequences showed high within-strain variation among gene copies, revealing a highly diverse gene family comprising multiple gene types in some organisms. Phylogenetic analyses showed that, in some species, the evolutionary history of the luciferin binding protein gene differed from the organisms' overall phylogenies, highlighting the complex evolutionary history of dinoflagellate bioluminescence systems. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.
Conserved hypothetical protein Rv1977 in Mycobacterium tuberculosis strains contains sequence polymorphisms and might be involved in ongoing immune evasion.

PubMed

Jiang, Yi; Liu, Haican; Wang, Xuezhi; Li, Guilian; Qiu, Yan; Dou, Xiangfeng; Wan, Kanglin

2015-01-01

Host immune pressure and associated parasite immune evasion are key features of host-pathogen co-evolution. A previous study showed that human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved, and it was therefore deduced that M. tuberculosis lacks antigenic variation and immune evasion. Here, we selected 151 clinical M. tuberculosis isolates from China, amplified the gene encoding Rv1977 and compared the sequences. The results showed that Rv1977, a conserved hypothetical protein, is not conserved among M. tuberculosis strains and contains polymorphisms. Several mutations, including one frameshift mutation, were found in the antigen Rv1977; such mutations are uncommon in M. tuberculosis strains and may alter protein function. The mutations and the deletion in the gene all affect one of the three T cell epitopes, and the altered epitope contains more than one variable position, which may suggest ongoing immune evasion.
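Whether an indel is a frameshift depends only on whether its net length change is a multiple of three. The following is a minimal Python sketch that assumes VCF-style alleles, where the reference and alternate alleles share a leading anchor base. The function name is hypothetical. It ignores indels that span exon boundaries or create a stop codon in frame, both of which need annotation beyond this check.

```python
def indel_effect(ref: str, alt: str) -> str:
    """Classify a VCF-style indel by its effect on the reading frame of a coding sequence."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "not an indel"
    # Python's % returns a non-negative result, so deletions (delta < 0) work too.
    return "in-frame" if delta % 3 == 0 else "frameshift"


print(indel_effect("A", "AT"))    # frameshift: +1 nt
print(indel_effect("ACAA", "A"))  # in-frame: -3 nt removes one codon
```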
Divergence in substrate specificity by the vOTU domain of various strains of highly-pathogenic PRRSV and the implications to pathogenicity

USDA-ARS's Scientific Manuscript database

Porcine reproductive and respiratory syndrome virus (PRRSV) is widespread, shows high variation in sequence and virulence among divergent strains, and causes an economically destructive disease. A viral ovarian tumor domain protease (vOTU) has been previously identified within the nonstructural protein...
Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

PubMed

Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

2015-01-01

Eggshell damage leads to economic losses in the egg production industry and is a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs with shells of significantly different strength and thickness. We used HiSeq 2000 (Illumina) sequencing of the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes (DEGs) were identified by comparing hens with low eggshell strength (LES) and normal eggshell strength (NES). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses showed that the DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways. Some important matrix proteins, such as OC-116, LTF and SPP1, were also differentially expressed between the two groups. Whole-genome re-sequencing detected a total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 indels in protein-coding genes, including 1775 non-synonymous variations and 19 frameshift indels in DEGs. The SNPs and indels found in this study could be investigated further in relation to eggshell traits. This is the first report to integrate transcriptome and genome re-sequencing to target the genetic variations associated with decreased eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus.
Biodiversity of mannose-specific adhesion in Lactobacillus plantarum revisited: strain-specific domain composition of the mannose-adhesin.

PubMed

Gross, G; Snel, J; Boekhorst, J; Smits, M A; Kleerebezem, M

2010-03-01

Recently, we identified the gene encoding the mannose-specific adhesin (msa) of Lactobacillus plantarum. In the current study, the structure and function of this potential probiotic effector gene were further investigated by exploring the genetic diversity of msa in L. plantarum in relation to mannose adhesion capacity. The results demonstrate considerable variation in quantitative in vitro mannose adhesion capacity, paralleled by msa gene sequence variation. The msa genes of different L. plantarum strains encode proteins with variable domain composition. Construction of L. plantarum 299v mutant strains revealed that the msa gene product is the key protein for mannose adhesion, including in a strain with high mannose-adhering capacity. However, no straightforward correlation between adhesion capacity and the domain composition of Msa in L. plantarum could be identified. Nevertheless, differences in Msa sequences, combined with the variable genetic background of specific bacterial strains, appear to determine mannose adhesion capacity and may affect probiotic properties. These findings exemplify the strain specificity of probiotic characteristics and illustrate the need for careful, molecular-level selection of new candidate probiotics.

Complete genomic sequence of a Tobacco rattle virus isolate from Michigan-grown potatoes.

PubMed

Crosslin, James M; Hamm, Philip B; Kirk, William W; Hammond, Rosemarie W

2010-04-01

Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers of cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16-kDa gene of the Michigan isolate, designated MI-1, revealed homology to TRV isolates from Florida and Washington. Here, we report the complete genomic sequence of RNA-1 (6,791 nt) and RNA-2 (3,685 nt) of TRV MI-1. RNA-1 is predicted to contain four open reading frames, and genome structure and phylogenetic analyses of the RNA-1 nucleotide sequence revealed significant homology to the known sequences of other TRV-1 isolates. The relationships based on the full-length nucleotide sequence differed from those based on the 16-kDa gene encoded on genomic RNA-1, reflecting sequence variation within a 20- to 25-residue region of the 16-kDa protein. MI-1 RNA-2 is predicted to contain three ORFs, encoding the coat protein (CP), a 37.6-kDa protein (ORF 2b), and a 33.6-kDa protein (ORF 2c). In addition, it contains a region of similarity to the 3' terminus of RNA-1, including a truncated portion of the 16-kDa cistron. Phylogenetic analysis of RNA-2, based on a comparison of nucleotide sequences with other members of the genus Tobravirus, indicates that TRV MI-1 and other North American isolates cluster as a distinct group. TRV MI-1 is only the second North American isolate for which a complete genome sequence is available, and it is distinct from the North American isolate TRV ORY. The relationship of the TRV MI-1 isolate to other tobravirus isolates is discussed.
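Statements such as "RNA-1 is predicted to contain four open reading frames" come from ORF prediction on the genome sequence. TRV genomic RNAs are positive-sense, so scanning the forward strand is the natural first step. The following is a minimal Python sketch that assumes ATG/AUG starts only, the three standard stop codons, and a caller-chosen minimum length. The function name and the `min_nt` parameter are hypothetical. Viral ORF prediction often also considers nested, overlapping, or readthrough ORFs, which this sketch does not model.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_nt: int = 300) -> list[tuple[int, int, int]]:
    """Forward-strand ATG...stop ORFs of at least `min_nt` nucleotides (stop included).

    Returns (start, end, frame) with 0-based start and exclusive end.
    RNA input is accepted; U is read as T.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_nt:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCAUGAAAUUUUAAGG", min_nt=9))  # [(2, 14, 2)]
```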
Whole exome sequencing identifies a homozygous nonsense variation in ALMS1 gene in a patient with syndromic obesity.

PubMed

Das Bhowmik, Aneek; Gupta, Neerja; Dalal, Ashwin; Kabra, Madhulika

In the present study, we report the genetic analysis of a patient with developmental delay, truncal obesity and vision problems, undertaken to find the causative mutation. Whole exome sequencing of genomic DNA extracted from the patient's whole blood revealed a homozygous nonsense variant (c.2816T>A) in exon 8 of the ALMS1 gene that creates a stop codon and truncates the protein prematurely at codon 939 (p.L939Ter). The mutation was confirmed by Sanger sequencing. Exome sequencing helped establish a diagnosis of Alström syndrome in this patient. This case highlights the utility of exome sequencing in clinical practice. Copyright © 2016 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
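The link between c.2816T>A and p.L939Ter is codon arithmetic. A c. coordinate counts from the A of the ATG start codon, so position p falls in codon ⌊(p − 1)/3⌋ + 1 at within-codon position (p − 1) mod 3 + 1. For p = 2816 this gives codon 939, position 2. The following is a minimal Python sketch that checks which leucine codons a T>A change at position 2 turns into stops, using the standard genetic code. The helper names are hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based c. (coding DNA) coordinate to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def mutate_codon(codon: str, pos_in_codon: int, alt: str) -> str:
    """Replace the base at 1-based `pos_in_codon` with `alt`."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


print(codon_position(2816))  # (939, 2)
for leu in ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"):
    mutant = mutate_codon(leu, 2, "A")
    print(leu, "->", mutant, CODON_TABLE[mutant])  # '*' marks a stop codon
```

Under the standard code, only TTA and TTG become stops (TAA, TAG) with a T>A change at the second position. The four CTN leucine codons become His or Gln codons instead. The nonsense call is therefore consistent only with a TTA or TTG reference codon at position 939.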
High-throughput sequencing of mGluR signaling pathway genes reveals enrichment of rare variants in autism.

PubMed

Kelleher, Raymond J; Geigenmüller, Ute; Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

2012-01-01

Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism.
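A burden comparison of this kind asks whether rare variants are over-represented in cases relative to controls, for example 290 cases and 300 controls. For a single gene, one standard test is Fisher's exact test on a 2×2 table of carriers and non-carriers. The following is a minimal Python sketch of the two-sided test. The counts in the usage line are hypothetical, not the study's data. The pooled-sample design described above may not yield per-individual carrier counts, and the authors' own statistics are not specified here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # P(top-left cell = x) given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance so tables tied with the observed one are included
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical: 20 of 290 cases vs 8 of 300 controls carry a rare variant in one gene
p = fisher_exact_two_sided(20, 290 - 20, 8, 300 - 8)
print(f"two-sided p = {p:.4g}")
```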
High-Throughput Sequencing of mGluR Signaling Pathway Genes Reveals Enrichment of Rare Variants in Autism

PubMed Central

Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

2012-01-01

Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107
Comparative analysis of the XopD T3S effector family in plant pathogenic bacteria

PubMed Central

Kim, Jung-Gun; Taylor, Kyle W.; Mudgett, Mary Beth

2011-01-01

XopD is a type III effector protein that is required for Xanthomonas campestris pathovar vesicatoria (Xcv) growth in tomato. It is a modular protein consisting of an N-terminal DNA-binding domain, two EAR transcriptional repressor motifs, and a C-terminal SUMO protease. In tomato, XopD functions as a transcriptional repressor, resulting in the suppression of defense responses at late stages of infection. A survey of available genome sequences for phytopathogenic bacteria revealed that XopD homologs are limited to species within three genera of Proteobacteria: Xanthomonas, Acidovorax, and Pseudomonas. While the EAR motif(s) and SUMO protease domain are conserved in all the XopD-like proteins, the N-terminal domains vary in length and sequence identity. Comparative analysis of the DNA sequences surrounding xopD and xopD-like genes led to a revised annotation of the xopD gene. Edman degradation sequence analysis and functional complementation studies confirmed that the xopD gene from Xcv encodes a 760-amino-acid protein with a longer N-terminal domain than previously predicted. None of the XopD-like proteins studied complemented Xcv ΔxopD mutant phenotypes in tomato leaves, suggesting that the N-terminus of XopD defines functional specificity. Xcv ΔxopD strains expressing chimeric fusion proteins, containing the N-terminus of XopD fused to the EAR motif(s) and SUMO protease domain of the XopD-like protein from Xanthomonas campestris pathovar campestris strain B100, were fully virulent in tomato, demonstrating that the N-terminus of XopD controls specificity in tomato. PMID:21726373
A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

PubMed

Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

2014-01-01

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar and UniProtKB, and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because primary NGS repositories hold petabytes of data and information, a platform called HIVE (High-performance Integrated Virtual Environment) has been developed for storing, analyzing, computing and curating NGS data and associated metadata. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta covers 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows the identification of novel or common nsSNVs that can be prioritized in validation studies.
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
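The core filtering step described in this record, checking whether an nsSNV falls on a curated functional site, can be sketched in a few lines of Python. This is a minimal illustration, not BioMuta's or HIVE's implementation; the `NsSNV` record, the `snvs_hitting_sites` helper and the accession names are hypothetical, and functional sites are assumed to be given as 1-based residue positions per protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NsSNV:
    """A non-synonymous SNV mapped onto a protein (hypothetical record layout)."""
    protein: str    # protein accession
    position: int   # 1-based residue position
    ref_aa: str     # reference amino acid
    alt_aa: str     # variant amino acid


def snvs_hitting_sites(snvs, functional_sites):
    """Keep the nsSNVs whose residue is annotated as a functional site.

    `functional_sites` maps a protein accession to a set of 1-based positions.
    Proteins without annotations contribute no hits.
    """
    return [s for s in snvs if s.position in functional_sites.get(s.protein, set())]


if __name__ == "__main__":
    sites = {"PROT_A": {12, 13}, "PROT_B": {8}}  # hypothetical annotations
    calls = [
        NsSNV("PROT_A", 12, "K", "E"),
        NsSNV("PROT_A", 40, "A", "V"),
        NsSNV("PROT_B", 7, "S", "L"),
    ]
    print(snvs_hitting_sites(calls, sites))
```

Only the first call overlaps an annotated site here; a real pipeline would also need to map genomic coordinates to protein positions before this step.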
TIM Barrel Protein Structure Classification Using Alignment Approach and Best Hit Strategy

NASA Astrophysics Data System (ADS)

Chu, Jia-Han; Lin, Chun Yuan; Chang, Cheng-Wen; Lee, Chihan; Yang, Yuh-Shyong; Tang, Chuan Yi

2007-11-01

The classification of protein structures is essential for determining their function in bioinformatics. Based on the Structural Classification of Proteins (SCOP) database, it has been estimated that around 10% of all known enzymes have TIM barrel domains. With their high sequence variation and diverse functionalities, TIM barrel proteins have become attractive targets for protein engineering and for evolutionary studies. Hence, in this paper, an alignment approach with the best hit strategy is proposed to classify TIM barrel protein structures at the superfamily and family levels in SCOP. The approach is also used for classification at the class level in the Enzyme nomenclature (ENZYME) database. Two testing data sets, TIM40D and TIM95D, were used to evaluate this approach. The resulting classification has an overall prediction accuracy of 90.3% for the superfamily level in SCOP, 89.5% for the family level in SCOP and 70.1% for the class level in ENZYME. These results demonstrate that the alignment approach with the best hit strategy is a simple and viable method for TIM barrel protein structure classification, even when only amino acid sequence information is available.
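A best-hit classifier assigns a query the label of its highest-scoring reference. The sketch below assumes a plain Needleman-Wunsch global alignment with simple match/mismatch/linear-gap scores; the paper's actual alignment method and scoring scheme are not stated in this abstract, and a practical classifier would more likely use a substitution matrix such as BLOSUM62. The function names and toy sequences are hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def best_hit_label(query: str, references):
    """Return the label of the reference (sequence, label) pair that scores highest."""
    _, best_label = max(references, key=lambda ref: nw_score(query, ref[0]))
    return best_label


if __name__ == "__main__":
    refs = [("WWWWW", "family_2"), ("MKVLA", "family_1")]  # toy references
    print(best_hit_label("MKVLA", refs))
```

The same structure applies at any level (superfamily, family, or enzyme class): only the labels attached to the references change.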
Top-down Mass Spectrometry of Cardiac Myofilament Proteins in Health and Disease

PubMed Central

Peng, Ying; Ayaz-Guner, Serife; Yu, Deyang; Ge, Ying

2014-01-01

Myofilaments are composed of thin and thick filaments which coordinate with each other to regulate muscle contraction and relaxation. Posttranslational modifications (PTMs) together with genetic variations and alternative splicing of the myofilament proteins play essential roles in regulating cardiac contractility in health and disease. Therefore, a comprehensive characterization of the myofilament proteins in physiological and pathological conditions is essential for better understanding the molecular basis of cardiac function and dysfunction. Due to the vast complexity and dynamic nature of proteins, it is challenging to obtain a holistic view of myofilament protein modifications. In recent years, top-down mass spectrometry (MS) has emerged as a powerful approach to study isoform composition and PTMs of proteins owing to its complete sequence coverage and its ability to identify PTMs and sequence variants without a priori knowledge. In this review, we will discuss the application of top-down MS to study cardiac myofilaments and highlight the insights it provides into the understanding of molecular mechanisms in contractile dysfunction of heart failure. In particular, recent results on cardiac troponin and tropomyosin modifications will be discussed in detail. The limitations and perspectives on the use of top-down MS for myofilament protein characterization will also be briefly discussed. PMID:24945106
Nonneutral mitochondrial DNA variation in humans and chimpanzees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nachman, M.W.; Aquadro, C.F.; Brown, W.M.

1996-03-01

We sequenced the NADH dehydrogenase subunit 3 (ND3) gene from a sample of 61 humans, five common chimpanzees, and one gorilla to test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution. Within humans and within chimpanzees, the ratio of replacement to silent nucleotide substitutions was higher than observed in comparisons between species, contrary to neutral expectations. To test the generality of this result, we reanalyzed published human RFLP data from the entire mitochondrial genome. Gains of restriction sites relative to a known human mtDNA sequence were used to infer unambiguous nucleotide substitutions. We also compared the complete mtDNA sequences of three humans. Both the RFLP data and the sequence data reveal a higher ratio of replacement to silent nucleotide substitutions within humans than is seen between species. This pattern is observed at most or all human mitochondrial genes and is inconsistent with a strictly neutral model. These data suggest that many mitochondrial protein polymorphisms are slightly deleterious, consistent with studies of human mitochondrial diseases. 59 refs., 2 figs., 8 tabs.
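The replacement-to-silent comparison rests on classifying each nucleotide difference by whether it changes the encoded amino acid. A minimal sketch, assuming aligned, in-frame coding sequences containing only A, C, G and T, and the vertebrate mitochondrial genetic code (NCBI translation table 2, appropriate for a gene such as ND3). Codons that differ at more than one site are simply skipped here, whereas a full analysis must weigh the alternative mutational pathways between them. The names are hypothetical.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO_CODE = {
    a + b + c: MITO_AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_replacement_silent(seq1: str, seq2: str):
    """Return (replacement, silent) counts for codons differing at exactly one site."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    replacement = silent = 0
    for p in range(0, len(seq1), 3):
        c1, c2 = seq1[p:p + 3].upper(), seq2[p:p + 3].upper()
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue  # identical codons, or multiple differences (not resolved here)
        if VERT_MITO_CODE[c1] == VERT_MITO_CODE[c2]:
            silent += 1
        else:
            replacement += 1
    return replacement, silent


if __name__ == "__main__":
    # ATG->ATA is silent under table 2 (both Met), unlike the standard nuclear code
    print(count_replacement_silent("ATGTTTCTAGCT", "ATATTCCTGACT"))
```

Computing these counts for within-species pairs and for between-species pairs, and comparing the two ratios, is the contrast the record describes.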
Neural correlates of nesting behavior in zebra finches (Taeniopygia guttata).

PubMed

Hall, Zachary J; Bertin, Marion; Bailey, Ida E; Meddle, Simone L; Healy, Susan D

2014-05-01

Nest building in birds involves a behavioral sequence (nest material collection and deposition in the nest) that offers a unique model for addressing how the brain sequences motor actions. In this study, we identified brain regions involved in nesting behavior in male and female zebra finches (Taeniopygia guttata). We used Fos immunohistochemistry to quantify production of the immediate early gene protein product Fos (a molecular indicator of neuronal activity) in the brain and correlated this expression with the variation in nesting behavior. Using this technique, we found that neural circuitry involved in motor sequencing, social behavior, reward and motivation were active during nesting. Within pairs of nesting birds, the number of times a male picked up or deposited nesting material and the amount of time a female spent in the nest explained the variation in Fos expression in the anterior motor pathway, social behavior network, and reward neural circuits. Identification of the brain regions that are involved in nesting enables us to begin studying the roles of motor sequencing, context, and reward in construction behavior at the neural level. Copyright © 2014 Elsevier B.V. All rights reserved.
Neural correlates of nesting behavior in zebra finches (Taeniopygia guttata)

PubMed Central

Hall, Zachary J.; Bertin, Marion; Bailey, Ida E.; Meddle, Simone L.; Healy, Susan D.

2014-01-01

Nest building in birds involves a behavioral sequence (nest material collection and deposition in the nest) that offers a unique model for addressing how the brain sequences motor actions. In this study, we identified brain regions involved in nesting behavior in male and female zebra finches (Taeniopygia guttata). We used Fos immunohistochemistry to quantify production of the immediate early gene protein product Fos (a molecular indicator of neuronal activity) in the brain and correlated this expression with the variation in nesting behavior. Using this technique, we found that neural circuitry involved in motor sequencing, social behavior, reward and motivation were active during nesting. Within pairs of nesting birds, the number of times a male picked up or deposited nesting material and the amount of time a female spent in the nest explained the variation in Fos expression in the anterior motor pathway, social behavior network, and reward neural circuits. Identification of the brain regions that are involved in nesting enables us to begin studying the roles of motor sequencing, context, and reward in construction behavior at the neural level. PMID:24508238
Mutations that Cause Human Disease: A Computational/Experimental Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beernink, P; Barsky, D; Pesavento, B

International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to accurately predict the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation.
Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
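SIFT's central idea, that substitutions to residues rarely seen at a conserved alignment position are likely to be damaging, can be illustrated with a deliberately simplified score. The sketch below is not SIFT: it uses raw column frequencies scaled by the most common residue, without the sequence weighting and pseudocounts SIFT applies. The 0.05 cutoff mirrors the threshold conventionally used for SIFT scores; the function names are hypothetical.

```python
from collections import Counter


def scaled_frequency(column: str, aa: str) -> float:
    """Frequency of `aa` in one alignment column, divided by the frequency of the
    most common residue there, so the consensus residue scores 1.0.

    Gap characters '-' are ignored.
    """
    counts = Counter(r for r in column.upper() if r != "-")
    if not counts:
        raise ValueError("column contains no residues")
    return counts[aa.upper()] / max(counts.values())


def predict(column: str, aa: str, cutoff: float = 0.05) -> str:
    """Label a substitution to `aa` at this position as tolerated or not."""
    return "not tolerated" if scaled_frequency(column, aa) < cutoff else "tolerated"


if __name__ == "__main__":
    conserved = "L" * 20          # a fully conserved leucine column
    variable = "LLLLIIVV"         # a column tolerating hydrophobic residues
    print(predict(conserved, "P"), predict(variable, "I"))
```

The first limitation named in the record is visible here: with only a handful of homologous sequences, column frequencies, and therefore the predictions, become unreliable.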
Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

PubMed Central

Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

2012-01-01

Background Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although the donor species of allotetraploid cotton have been intensively studied, and it is generally accepted that the parents were A- and D-genome-containing species, sequence comparison of Gossypium chloroplast genomes is still of interest for understanding the mechanisms underlying the evolution of Gossypium allotetraploids. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings The size of the 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar, with >98% sequence identity. They encoded the same set of 112 unique genes, which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and the A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in the whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of the A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion Despite the high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes favored 5-bp indels, and 1–3-bp indels are mainly attributable to SSR polymorphisms.
This study supports the conclusion that the common ancestor of diploid A-genome species in Gossypium is the maternal source of extant allotetraploid species and that allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273
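Divergence of the kind reported in this record (a percent difference, with indels treated separately from substitutions) can be sketched for a pairwise alignment as follows. This computes an uncorrected p-distance over gap-free columns plus a count of gap runs as indel events; the study's exact treatment of indels and any substitution-model correction are not stated in the abstract, so both choices are assumptions, and the function names are hypothetical.

```python
def substitution_divergence(a: str, b: str) -> float:
    """Percent of aligned columns without gaps in which the two sequences differ
    (uncorrected p-distance)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # indel columns are handled separately
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * diffs / compared


def count_indel_events(a: str, b: str) -> int:
    """Count each contiguous run of gaps in either sequence as one indel event."""
    events = 0
    for seq in (a, b):
        in_gap = False
        for ch in seq:
            if ch == "-" and not in_gap:
                events += 1
            in_gap = ch == "-"
    return events


if __name__ == "__main__":
    s1, s2 = "ACGT--ACGT", "ACGAAAACGT"
    print(substitution_divergence(s1, s2), count_indel_events(s1, s2))
```

Treating a multi-base gap run as a single event, rather than counting each gapped base, keeps one 5-bp indel from outweighing five independent substitutions.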
Gene sequence variability of the three surface proteins of human respiratory syncytial virus (HRSV) in Texas.

PubMed

Tapia, Lorena I; Shaw, Chad A; Aideyan, Letisha O; Jewell, Alan M; Dawson, Brian C; Haq, Taha R; Piedra, Pedro A

2014-01-01

Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion (F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of HRSV strains collected from 1987 to 2005 from one center. Sixty original clinical isolates and 5 prototype strains were analyzed. Sequences containing SH, F and G genes were generated, and multiple alignments and phylogenetic trees were analyzed. Genetic variability by protein domain was assessed and compared between virus genotypes. Complete sequences of the SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. In group A strains, genotypes GA5 and GA2 were predominant. For HRSV-B strains, the genotype GB4 was predominant from 1992 to 1994 and only genotype BA viruses were detected in 2004-2005. Genetic variability at the nucleotide level differed between the genes, with the G gene being the most variable and the highest variability detected in the 270-nt G fragment that is frequently used to genotype the virus. High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of HRSV-A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in hypervariable domains of the G protein, the signal peptide of the F protein, a previously undefined domain in the F protein, and the antigenic site Ø in the pre-fusion F. Divergent trends were observed between the HRSV-A and -B groups for some functional domains. A diverse population of HRSV-A and -B genotypes circulated in Houston during an 18-year period. We hypothesize that diverse sequence variation of the surface protein genes provides HRSV strains a survival advantage in a partially immune-protected community.
Gene Sequence Variability of the Three Surface Proteins of Human Respiratory Syncytial Virus (HRSV) in Texas

PubMed Central

Tapia, Lorena I.; Shaw, Chad A.; Aideyan, Letisha O.; Jewell, Alan M.; Dawson, Brian C.; Haq, Taha R.; Piedra, Pedro A.

2014-01-01

Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion (F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of HRSV strains collected from 1987 to 2005 from one center. Sixty original clinical isolates and 5 prototype strains were analyzed. Sequences containing SH, F and G genes were generated, and multiple alignments and phylogenetic trees were analyzed. Genetic variability by protein domain was assessed and compared between virus genotypes. Complete sequences of the SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. In group A strains, genotypes GA5 and GA2 were predominant. For HRSV-B strains, the genotype GB4 was predominant from 1992 to 1994 and only genotype BA viruses were detected in 2004–2005. Genetic variability at the nucleotide level differed between the genes, with the G gene being the most variable and the highest variability detected in the 270-nt G fragment that is frequently used to genotype the virus. High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of HRSV-A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in hypervariable domains of the G protein, the signal peptide of the F protein, a previously undefined domain in the F protein, and the antigenic site Ø in the pre-fusion F. Divergent trends were observed between the HRSV-A and -B groups for some functional domains. A diverse population of HRSV-A and -B genotypes circulated in Houston during an 18-year period. We hypothesize that diverse sequence variation of the surface protein genes provides HRSV strains a survival advantage in a partially immune-protected community. PMID:24625544
The adaptive evolution of the mammalian mitochondrial genome

PubMed Central

da Fonseca, Rute R; Johnson, Warren E; O'Brien, Stephen J; Ramos, Maria João; Antunes, Agostinho

2008-01-01

Background The mitochondria produce up to 95% of a eukaryotic cell's energy through oxidative phosphorylation. The proteins involved in this vital process are under high functional constraints. However, metabolic requirements vary across species, potentially modifying selective pressures. We evaluate the adaptive evolution of 12 protein-coding mitochondrial genes in 41 placental mammalian species by assessing amino acid sequence variation and exploring the functional implications of observed variation in secondary and tertiary protein structures. Results Wide variation in the properties of amino acids was observed at functionally important regions of cytochrome b in species with more-specialized metabolic requirements, such as adaptation to a low-energy diet or large body size (elephant, dugong, sloth, and pangolin) and adaptation to unusual oxygen requirements (for example, diving in cetaceans, flying in bats, and living at high altitudes in alpacas). Signatures of adaptive variation in the NADH dehydrogenase complex were restricted to the loop regions of the transmembrane units, which likely function as proton pumps. Evidence of adaptive variation in the cytochrome c oxidase complex was observed mostly at the interface between the mitochondrial and nuclear-encoded subunits, perhaps evidence of co-evolution. The ATP8 subunit, which has an important role in the assembly of F0, exhibited the highest signal of adaptive variation. ATP6, which has an essential role in rotor performance, showed high adaptive variation in predicted loop areas. Conclusion Our study provides insight into the adaptive evolution of the mitochondrial genome in mammals and its implications for the molecular mechanism of oxidative phosphorylation. We present a framework for future experimental characterization of the impact of specific mutations on the function, physiology, and interactions of the mtDNA-encoded proteins involved in oxidative phosphorylation. PMID:18318906
Porcine MAP3K5 analysis: molecular cloning, characterization, tissue expression pattern, and copy number variations associated with residual feed intake.

PubMed

Pu, L; Zhang, L C; Zhang, J S; Song, X; Wang, L G; Liang, J; Zhang, Y B; Liu, X; Yan, H; Zhang, T; Yue, J W; Li, N; Wu, Q Q; Wang, L X

2016-08-12

Mitogen-activated protein kinase kinase kinase 5 (MAP3K5) is essential for apoptosis, proliferation, differentiation, and immune responses, and is a candidate marker for residual feed intake (RFI) in pigs. We cloned the full-length cDNA sequence of porcine MAP3K5 by rapid amplification of cDNA ends. The 5451-bp cDNA contains a 5'-untranslated region (UTR) (718 bp), a coding region (3738 bp), and a 3'-UTR (995 bp), and encodes a peptide of 1245 amino acids, which shares 97, 99, 97, 93, 91, and 84% sequence identity with cattle, sheep, human, mouse, chicken, and zebrafish MAP3K5, respectively. The deduced MAP3K5 protein sequence contains two conserved domains: a DUF4071 domain and a protein kinase domain. Phylogenetic analysis showed that porcine MAP3K5 forms a branch separate from vicugna and camel MAP3K5. Tissue expression analysis using real-time quantitative polymerase chain reaction (qRT-PCR) revealed that MAP3K5 was expressed in the heart, liver, spleen, lung, kidney, muscle, fat, pancreas, ileum, and stomach tissues. Copy number variation was detected for porcine MAP3K5 and validated by qRT-PCR. Furthermore, a significant increase in average copy number was detected in the low RFI group when compared to the high RFI group in a Duroc pig population. These results provide useful information regarding the influence of MAP3K5 on RFI in pigs.
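The lengths reported for the porcine MAP3K5 cDNA are internally consistent, which a quick check makes explicit. The only assumption is that the 3738-bp coding region includes the stop codon, which is what makes 1246 codons yield a 1245-residue peptide.

```python
# Lengths reported for the porcine MAP3K5 cDNA (bp)
utr5, cds, utr3 = 718, 3738, 995

total = utr5 + cds + utr3        # full-length cDNA
assert cds % 3 == 0              # the coding region is a whole number of codons
codons = cds // 3                # assumed to include the stop codon
protein_length = codons - 1      # amino acids, excluding the stop codon

print(total, codons, protein_length)  # 5451 1246 1245
```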
Host Immune Evasion by Lyme and Relapsing Fever Borreliae: Findings to Lead Future Studies for Borrelia miyamotoi

PubMed Central

Stone, Brandee L.; Brissette, Catherine A.

2017-01-01

The emerging pathogen, Borrelia miyamotoi, is a relapsing fever spirochete vectored by the same species of Ixodes ticks that carry the causative agents of Lyme disease in the US, Europe, and Asia. Symptoms caused by infection with B. miyamotoi are similar to a relapsing fever infection. However, B. miyamotoi has adapted to different vectors and reservoirs, which could result in unique physiology, including immune evasion mechanisms. Lyme Borrelia utilize a combination of Ixodes-produced inhibitors and native proteins [i.e., factor H-binding proteins (FHBPs)/complement regulator-acquiring surface proteins, p43, BBK32, BGA66, BGA71, CD59-like protein] to inhibit complement, while some relapsing fever spirochetes use C4b-binding protein and likely Ornithodoros-produced inhibitors. To evade the humoral response, Borrelia utilize antigenic variation of either outer surface proteins (Osps) and the Vmp-like sequences (Vls) system (Lyme borreliae) or variable membrane proteins (Vmps, relapsing fever borreliae). B. miyamotoi possesses putative FHBPs and antigenic variation of Vmps has been demonstrated. This review summarizes and compares the common mechanisms utilized by Lyme and relapsing fever spirochetes, as well as the current state of understanding immune evasion by B. miyamotoi. PMID:28154563
Host Immune Evasion by Lyme and Relapsing Fever Borreliae: Findings to Lead Future Studies for Borrelia miyamotoi.

PubMed

Stone, Brandee L; Brissette, Catherine A

2017-01-01

The emerging pathogen, Borrelia miyamotoi, is a relapsing fever spirochete vectored by the same species of Ixodes ticks that carry the causative agents of Lyme disease in the US, Europe, and Asia. Symptoms caused by infection with B. miyamotoi are similar to a relapsing fever infection. However, B. miyamotoi has adapted to different vectors and reservoirs, which could result in unique physiology, including immune evasion mechanisms. Lyme Borrelia utilize a combination of Ixodes-produced inhibitors and native proteins [i.e., factor H-binding proteins (FHBPs)/complement regulator-acquiring surface proteins, p43, BBK32, BGA66, BGA71, CD59-like protein] to inhibit complement, while some relapsing fever spirochetes use C4b-binding protein and likely Ornithodoros-produced inhibitors. To evade the humoral response, Borrelia utilize antigenic variation of either outer surface proteins (Osps) and the Vmp-like sequences (Vls) system (Lyme borreliae) or variable membrane proteins (Vmps, relapsing fever borreliae). B. miyamotoi possesses putative FHBPs and antigenic variation of Vmps has been demonstrated. This review summarizes and compares the common mechanisms utilized by Lyme and relapsing fever spirochetes, as well as the current state of understanding immune evasion by B. miyamotoi.
The role of protein structural analysis in the next generation sequencing era.

PubMed

Yue, Wyatt W; Froese, D Sean; Brennan, Paul E

2014-01-01

Proteins are macromolecules that serve a cell's myriad processes and functions in all living organisms via dynamic interactions with other proteins, small molecules and cellular components. Genetic variations in the protein-encoding regions of the human genome account for >85% of all known Mendelian diseases, and play an influential role in shaping complex polygenic diseases. Proteins also serve as the predominant target class for the design of small molecule drugs to modulate their activity. Knowledge of the shape and form of proteins, by means of their three-dimensional structures, is therefore instrumental to understanding their roles in disease and their potential for drug development. In this chapter we outline, with the wide readership of non-structural biologists in mind, the various experimental and computational methods available for protein structure determination. We summarize how the wealth of structure information, contributed to a large extent by the technological advances in structure determination to date, serves as a useful tool to decipher the molecular basis of genetic variations for disease characterization and diagnosis, particularly in the emerging era of genomic medicine, and becomes an integral component in the modern day approach towards rational drug development.

Rapid search for tertiary fragments reveals protein sequence–structure relationships

PubMed Central

Zhou, Jianfu; Grigoryan, Gevorg

2015-01-01

Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure. PMID:25420575
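The matching criterion described here, RMSD after optimal superposition falling below a user-set cutoff, can be illustrated with a brute-force sketch using NumPy and the Kabsch algorithm. This is not MASTER's algorithm: it handles only a single contiguous query segment and scans every window of one target chain, whereas MASTER supports multiple disjoint segments and searches the whole Protein Data Bank efficiently. Inputs are assumed to be (N, 3) arrays of backbone coordinates (for example, Cα atoms); the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)            # center both sets on their centroids
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def search_windows(query: np.ndarray, target: np.ndarray, cutoff: float):
    """Return (start, rmsd) for every contiguous window of `target` whose
    superposed RMSD to `query` is at most `cutoff` (exhaustive, no pruning)."""
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        r = kabsch_rmsd(query, target[start:start + n])
        if r <= cutoff:
            hits.append((start, r))
    return hits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(20, 3)) * 5.0   # toy "chain" of 20 points
    theta = np.pi / 6
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    query = target[5:9] @ Rz.T + np.array([1.0, 2.0, 3.0])  # moved copy of residues 5-8
    print(search_windows(query, target, cutoff=0.1))
```

Because RMSD is computed after superposition, a rotated and translated copy of a fragment is still recovered with an RMSD near zero; the exhaustive scan is what MASTER's provably correct pruning avoids.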
Digenic inheritance in autosomal recessive non-syndromic hearing loss cases carrying GJB2 heterozygote mutations: assessment of GJB4, GJA1, and GJC3.

PubMed

Kooshavar, Daniz; Tabatabaiefar, Mohammad Amin; Farrokhi, Effat; Abolhasani, Marziye; Noori-Daloii, Mohammad-Reza; Hashemzadeh-Chaleshtori, Morteza

2013-02-01

Autosomal recessive non-syndromic hearing loss (ARNSHL) can be caused by many genes. However, mutations in the GJB2 gene, which encodes the gap-junction (GJ) protein connexin (Cx) 26, account for a considerable proportion of cases, a proportion that differs among populations. Between 10 and 42 percent of patients with recessive GJB2 mutations carry only one mutant allele. Mutations in GJB4, GJA1, and GJC3, encoding Cx30.3, Cx43, and Cx29, respectively, can lead to HL. Different connexins can combine in heteromeric and heterotypic GJ assemblies. This study aims to determine whether variations in any of the genes GJB4, GJA1 or GJC3 can be the second mutant allele causing the disease through digenic inheritance in the studied GJB2 heterozygous cases. We examined 34 unrelated GJB2 heterozygous ARNSHL subjects from different geographic and ethnic areas in Iran, using polymerase chain reaction (PCR) followed by direct DNA sequencing to identify any sequence variations in these genes. Restriction fragment length polymorphism (RFLP) assays were performed on 400 normal-hearing individuals. Sequence analysis of GJB4 showed five heterozygous variations, c.451C>A, c.219C>T, c.507C>G, c.155_158delTCTG and c.542C>T, of which only the last was not detected in any control sample. There were three heterozygous variations, c.758C>T, c.717G>A and c.3*dupA, in GJA1 in four cases. We found no variations in the GJC3 gene sequence. Our data suggest that the GJB4 c.542C>T variant and, less likely, some other variations of GJB4 and GJA1, but probably not GJC3, can be assigned to ARNSHL in GJB2 heterozygous mutation carriers, providing clues to a digenic pattern. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Identification and Characterization of Novel Variations in Platelet G-Protein Coupled Receptor (GPCR) Genes in Patients Historically Diagnosed with Type 1 von Willebrand Disease.

PubMed

Stockley, Jacqueline; Nisar, Shaista P; Leo, Vincenzo C; Sabi, Essa; Cunningham, Margaret R; Eikenboom, Jeroen C; Lethagen, Stefan; Schneppenheim, Reinhard; Goodeve, Anne C; Watson, Steve P; Mundell, Stuart J; Daly, Martina E

2015-01-01

The clinical expression of type 1 von Willebrand disease may be modified by co-inheritance of other mild bleeding diatheses. We previously showed that mutations in the platelet P2Y12 ADP receptor gene (P2RY12) could contribute to the bleeding phenotype in patients with type 1 von Willebrand disease. Here we investigated whether variations in platelet G protein-coupled receptor genes other than P2RY12 also contributed to the bleeding phenotype. Platelet G protein-coupled receptor genes P2RY1, F2R, F2RL3, TBXA2R and PTGIR were sequenced in 146 index cases with type 1 von Willebrand disease and the potential effects of identified single nucleotide variations were assessed using in silico methods and heterologous expression analysis. Seven heterozygous single nucleotide variations were identified in 8 index cases. Two single nucleotide variations were detected in F2R: a novel c.-67G>C transversion which reduced F2R transcriptional activity, and a rare c.1063C>T transition predicting a p.L355F substitution which did not interfere with PAR1 expression or signalling. Two synonymous single nucleotide variations were identified in F2RL3 (c.402C>G, p.A134=; c.1029G>C, p.V343=), both of which introduced less commonly used codons and were predicted to be deleterious, though neither of them affected PAR4 receptor expression. A third single nucleotide variation in F2RL3 (c.65C>A, p.T22N) was co-inherited with a synonymous single nucleotide variation in TBXA2R (c.6680C>T, p.S218=). Expression and signalling of the p.T22N PAR4 variant was similar to wild-type, while the TBXA2R variation introduced a cryptic splice site that was predicted to cause premature termination of protein translation. The enrichment of single nucleotide variations in G protein-coupled receptor genes among type 1 von Willebrand disease patients supports the view of type 1 von Willebrand disease as a polygenic disorder.
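The variant descriptions in this record follow HGVS coding-DNA notation, in which position n of the coding sequence falls in codon ⌈n/3⌉. A small parser for simple substitutions makes that mapping explicit; for example, c.1063C>T lands on the first base of codon 355, consistent with p.L355F. This sketch handles only plain coding substitutions, not 5'UTR positions such as c.-67G>C, 3'UTR or intronic offsets, or duplications; the regular expression and function name are assumptions, not part of any HGVS library.

```python
import re

# Plain coding-DNA substitution, e.g. "c.1063C>T" (spaces are stripped first)
CDS_SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def parse_cds_substitution(hgvs: str):
    """Return (cds_position, ref, alt, codon_number, position_in_codon)."""
    m = CDS_SUBSTITUTION.fullmatch(hgvs.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs!r}")
    pos = int(m.group(1))
    codon_number = (pos - 1) // 3 + 1      # position n lies in codon ceil(n/3)
    position_in_codon = (pos - 1) % 3 + 1  # 1, 2 or 3
    return pos, m.group(2), m.group(3), codon_number, position_in_codon


if __name__ == "__main__":
    for expr in ("c.1063C>T", "c.65 C>A", "c.402C>G"):
        print(expr, parse_cds_substitution(expr))
```

Third-position hits (such as c.402C>G in codon 134) are the ones most often synonymous, which matches the p.A134= annotation above.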
Faster-X evolution of gene expression is driven by recessive adaptive cis-regulatory variation in Drosophila.

PubMed

Llopart, Ana

2018-05-01

The hemizygosity of the X (Z) chromosome fully exposes the fitness effects of mutations on that chromosome and has evolutionary consequences on the relative rates of evolution of X and autosomes. Specifically, several population genetics models predict increased rates of evolution in X-linked loci relative to autosomal loci. This prediction of faster-X evolution has been evaluated and confirmed for both protein coding sequences and gene expression. In the case of faster-X evolution for gene expression divergence, it is often assumed that variation in 5' noncoding sequences is associated with variation in transcript abundance between species, but a formal, genome-wide test of this hypothesis is still missing. Here, I use whole-genome sequence data in Drosophila yakuba and D. santomea to evaluate this hypothesis and report positive correlations between sequence divergence at 5' noncoding sequences and gene expression divergence. I also examine polymorphism and divergence in 9,279 noncoding sequences located at the 5' end of annotated genes and detect multiple signals of positive selection. Notably, as neutral references for testing adaptive evolution, I used not only the traditional synonymous sites but also bases 8-30 of introns <65 bp, which have been proposed to be a better neutral choice. X-linked genes with a high degree of male-biased expression show the most extreme adaptive pattern at 5' noncoding regions, in agreement with faster-X evolution for gene expression divergence and a higher incidence of positively selected recessive mutations. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
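The abstract does not state which statistic was used, but one common way to test for adaptive evolution against a neutral reference is a McDonald-Kreitman-style estimate of the proportion of adaptive substitutions, alpha. The sketch below assumes that estimator and uses hypothetical counts; it illustrates why the choice of neutral reference (synonymous sites versus short-intron bases) can change the estimate.

```python
def proportion_adaptive(d_sel: int, p_sel: int, d_neu: int, p_neu: int) -> float:
    """McDonald-Kreitman-style estimate of the fraction of adaptive fixations:
    alpha = 1 - (D_neu * P_sel) / (D_sel * P_neu).

    d_sel, p_sel: fixed differences and polymorphisms in the test class
                  (e.g. 5' noncoding sites).
    d_neu, p_neu: the same counts in a neutral reference class
                  (e.g. synonymous sites, or bases 8-30 of short introns).
    """
    if d_sel == 0 or p_neu == 0:
        raise ValueError("alpha is undefined when D_sel or P_neu is zero")
    return 1.0 - (d_neu * p_sel) / (d_sel * p_neu)


# Hypothetical counts for one test class, compared against two neutral references.
print(proportion_adaptive(d_sel=20, p_sel=5, d_neu=40, p_neu=20))  # 0.5
print(proportion_adaptive(d_sel=20, p_sel=5, d_neu=30, p_neu=12))  # 0.375
```

A positive alpha indicates an excess of divergence relative to polymorphism in the test class compared with the neutral reference, which is the signature usually read as positive selection.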
Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

PubMed

Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

2014-06-01

It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease-associated types are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions and may influence an individual's risk of developing alcoholism. Toward this end, two mouse strains that have been selectively bred for differential sensitivity to alcohol were used: the inbred long sleep (ILS) and inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date.
The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
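As a minimal sketch of what "strain-specific" can mean operationally, the code below assumes that both strains were called against the same reference genome, that variants match only on identical chromosome, position and alleles, and that a hypothetical set of adequately covered positions is available for each strain. A call is kept as specific to one strain only where the other strain was covered but has no matching call; this is an illustration, not the study's pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def strain_specific(calls_a, calls_b, covered_a, covered_b):
    """Split two strains' variant calls into strain-specific sets.

    A call is specific to strain A only if strain B was adequately covered
    at that (chrom, pos) but has no identical call (and vice versa).
    """
    set_a, set_b = set(calls_a), set(calls_b)
    only_a = {v for v in set_a - set_b if (v.chrom, v.pos) in covered_b}
    only_b = {v for v in set_b - set_a if (v.chrom, v.pos) in covered_a}
    return only_a, only_b


# Hypothetical calls and coverage, for illustration only.
ils = {Variant("chr1", 100, "C", "T"), Variant("chr1", 200, "G", "A"),
       Variant("chr2", 50, "A", "G")}
iss = {Variant("chr1", 200, "G", "A"), Variant("chr3", 10, "T", "C")}
covered_ils = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)}
covered_iss = {("chr1", 100), ("chr1", 200), ("chr3", 10)}
only_ils, only_iss = strain_specific(ils, iss, covered_ils, covered_iss)
print(sorted(only_ils), sorted(only_iss))
```

Without the coverage check, a site that simply was not captured in one strain's exome would be miscounted as specific to the other strain.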
Efficient analysis of mouse genome sequences reveal many nonsense variants

PubMed Central

Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

2016-01-01

Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypic differences between inbred strains, and increase the number of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus whose sequence differs by ∼1% from the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on nonsense mutations, that is, variants that gain a stop codon. These genes were identified by changing the C57BL/6J coding sequences to the SPRET/Ei sequences in silico, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
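The in silico procedure described above (substitute strain variants into the reference coding sequence, translate both versions, compare) can be sketched as follows. This is a minimal illustration assuming the standard genetic code, single-nucleotide substitutions only, and a hypothetical toy CDS; it is not the code behind the SPRET/Ei web tool.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snvs(cds: str, snvs: list[tuple[int, str, str]]) -> str:
    """Apply (1-based position, ref, alt) substitutions to a reference CDS."""
    bases = list(cds)
    for pos, ref, alt in snvs:
        if bases[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


def premature_stop_codon(ref_cds: str, alt_cds: str) -> int | None:
    """1-based codon of a stop gained before the reference stop, else None."""
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    ref_end = ref_protein.find("*")
    if ref_end == -1:
        ref_end = len(ref_protein)
    alt_stop = alt_protein.find("*")
    if alt_stop != -1 and alt_stop < ref_end:
        return alt_stop + 1
    return None


reference = "ATGTGGAAATAA"  # hypothetical toy CDS: Met-Trp-Lys-stop
variant = apply_snvs(reference, [(6, "G", "A")])  # TGG (Trp) -> TGA (stop)
print(translate(reference), translate(variant), premature_stop_codon(reference, variant))
# MWK* M*K* 2
```

Comparing translated sequences rather than inspecting each codon separately also catches the combined effect of several nearby substitutions within one transcript.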
sc-PDB: a database for identifying variations and multiplicity of 'druggable' binding sites in proteins.

PubMed

Meslamani, Jamel; Rognan, Didier; Kellenberger, Esther

2011-05-01

The sc-PDB database is an annotated archive of druggable binding sites extracted from the Protein Data Bank. It contains all-atom coordinates for 8166 protein-ligand complexes, chosen for their geometrical and physico-chemical properties. The sc-PDB provides a functional annotation for proteins, a chemical description for ligands and the detailed intermolecular interactions for complexes. The sc-PDB now includes a hierarchical classification of all the binding sites within a functional class. The sc-PDB entries were first clustered according to protein name, regardless of species. For each cluster, we identified dissimilar sites (e.g. catalytic and allosteric sites of an enzyme). SCOPE AND APPLICATIONS: The classification of sc-PDB targets by binding site diversity was intended to facilitate chemogenomics approaches to drug design. In ligand-based approaches, it avoids comparing ligands that do not share the same binding site. In structure-based approaches, it permits quantitative evaluation of the diversity of the binding site definition (variations in size, sequence and/or structure). The sc-PDB database is freely available at: http://bioinfo-pharma.u-strasbg.fr/scPDB.
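The two-step classification above (cluster by protein name regardless of species, then separate dissimilar sites within each cluster) can be illustrated with a deliberately simple sketch. Assumptions, all hypothetical: binding sites are represented as sets of residue labels that are comparable across species, similarity is the Jaccard index, and sites join a group at a 0.5 threshold. sc-PDB's actual site comparison is not described here and almost certainly relies on structural criteria instead.

```python
from collections import defaultdict


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two residue sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_sites(entries, threshold=0.5):
    """Group entries by protein name (ignoring species), then split each
    group into distinct binding sites by residue-set similarity.

    entries: iterable of (protein_name, species, residue_labels).
    Returns {normalized_name: [list of residue sets per distinct site]}.
    """
    by_protein = defaultdict(list)
    for name, _species, residues in entries:  # species deliberately ignored
        by_protein[name.strip().lower()].append(frozenset(residues))

    classification = {}
    for name, sites in by_protein.items():
        groups = []
        for site in sites:
            for group in groups:
                # Greedy, order-dependent: compare with each group's first site.
                if jaccard(site, group[0]) >= threshold:
                    group.append(site)
                    break
            else:
                groups.append([site])
        classification[name] = groups
    return classification


# Hypothetical entries: two species share one site; a second site is distinct.
entries = [
    ("Enzyme X", "human", {"H41", "C145", "M49"}),
    ("enzyme x", "mouse", {"H41", "C145", "M49", "M165"}),
    ("Enzyme X", "human", {"W207", "L286", "F291"}),
]
result = classify_sites(entries)
print({name: len(groups) for name, groups in result.items()})  # {'enzyme x': 2}
```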
Exploring the limits of sequence and structure in a variant βγ-crystallin domain of the protein absent in melanoma-1 (AIM1)

PubMed Central

Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan

2008-01-01

βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9 Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473
The nucleotides they are a-changin': function of RNA binding proteins in post-transcriptional messenger RNA editing and modification in Arabidopsis.

PubMed

Kramer, Marianne C; Anderson, Stephen J; Gregory, Brian D

2018-06-05

During and after transcription, the fate of an RNA molecule is almost entirely directed by the cohorts of interacting RNA-binding proteins (RBPs). RBPs regulate all stages of the life cycle of a messenger RNA (mRNA) molecule, including splicing, polyadenylation, transport out of the nucleus, RNA stability, and translation. In addition to these functions, RBPs can function to modify or edit the sequences encoded by the RNA. While the sequence for each transcript is determined in the genome, by the time an RNA reaches its final fate, the sequence may have been edited, where one nucleotide is converted to another, or modified, where a chemical group, or sometimes other moieties, are covalently linked to a nucleotide base. These changes to the RNA sequence have major consequences on the function of the RNA. Additionally, variation in the levels of the RBPs that perform the editing or modification can drastically affect the fitness of an organism. Here, we review RBPs that are known to edit or modify RNA ribonucleotides, focusing on the RNA editing ability of the pentatricopeptide repeat (PPR) proteins and the RBPs that modify adenosine to N6-methyladenosine. Copyright © 2018 Elsevier Ltd. All rights reserved.
Hsp90 and environmental stress transform the adaptive value of natural genetic variation.

PubMed

Jarosz, Daniel F; Lindquist, Susan

2010-12-24

How can species remain unaltered for long periods yet also undergo rapid diversification? By linking genetic variation to phenotypic variation via environmental stress, the Hsp90 protein-folding reservoir might promote both stasis and change. However, the nature and adaptive value of Hsp90-contingent traits remain uncertain. In ecologically and genetically diverse yeasts, we find such traits to be both common and frequently adaptive. Most are based on preexisting variation, with causative polymorphisms occurring in coding and regulatory sequences alike. A common temperature stress alters phenotypes similarly. Both selective inhibition of Hsp90 and temperature stress increase correlations between genotype and phenotype. This system broadly determines the adaptive value of standing genetic variation and, in so doing, has influenced the evolution of current genomes.
Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

PubMed Central

Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

2015-01-01

Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite the importance of Brassica species as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, and B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-)name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis, and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily, while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
Direct Observation of Parallel Folding Pathways Revealed Using a Symmetric Repeat Protein System

PubMed Central

Aksel, Tural; Barrick, Doug

2014-01-01

Although progress has been made in determining the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion does not depend on extensive curve-fitting or on structural perturbation of the protein. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule. PMID:24988356
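Why an increase in folding rate with repeat number points to parallel pathways can be seen from elementary kinetics: for irreversible parallel channels from the unfolded to the folded state, the observed rate is the sum of the channel rates, and each channel's share of the flux is its rate divided by that sum. The sketch below assumes, hypothetically, one nucleation channel per repeat, fast propagation after nucleation, and made-up nucleation rates for terminal and interior repeats. It is far simpler than the parallel-Ising model fitted in the study.

```python
def parallel_channels(channel_rates):
    """Irreversible parallel channels U -> F with first-order rate constants k_i:
    the observed rate is sum(k_i) and channel i carries k_i / sum(k_i) of the flux."""
    k_obs = sum(channel_rates)
    return k_obs, [k / k_obs for k in channel_rates]


def repeat_channels(n_repeats, k_interior, k_terminal):
    """Hypothetical: one nucleation channel per repeat; the two terminal repeats
    nucleate at k_terminal and interior repeats at k_interior (both in s^-1)."""
    if n_repeats < 2:
        raise ValueError("need at least two repeats")
    return [k_terminal] + [k_interior] * (n_repeats - 2) + [k_terminal]


for n in range(3, 7):
    k_obs, fluxes = parallel_channels(repeat_channels(n, k_interior=1.0, k_terminal=0.5))
    print(n, k_obs, [round(f, 2) for f in fluxes])
```

Adding a repeat adds a channel, so the observed rate rises with length even though no individual channel gets faster; a single sequential pathway would instead be expected to slow down as the chain grows.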
Molecular characterization of hypoxia and hypoxia-inducible factor 1 alpha (HIF-1α) from Taiwan voles (Microtus kikuchii).

PubMed

Jiang, Yi-Fan; Chou, Chung-Hsi; Lin, En-Chung; Chiu, Chih-Hsien

2011-02-01

Hypoxia-inducible factor 1 (HIF-1) is a transcription factor that senses hypoxic environmental conditions and adapts cells to them. HIF-1 is composed of an oxygen-regulated α subunit (HIF-1α) and a constitutively expressed β subunit (HIF-1β). Taiwan voles (Microtus kikuchii) are an endemic species in Taiwan, found only in mountainous areas more than 2000 m above sea level. In this study, the full-length HIF-1α cDNA was cloned and sequenced from liver tissues of Taiwan voles. We found that HIF-1α of Taiwan voles had high sequence similarity to HIF-1α of other species. Sequence alignment of HIF-1α functional domains indicated that the basic helix-loop-helix (bHLH), PER-ARNT-SIM (PAS) and C-terminal transactivation (TAD-C) domains were conserved among species, but sequence variations were found in the oxygen-dependent degradation domains (ODDD). To measure Taiwan vole HIF-1α responses to hypoxia, animals were challenged with cobalt chloride, and HIF-1α mRNA and protein expression in brain, lung, heart, liver, kidney, and muscle was assessed by quantitative RT-PCR and Western blot analysis. Upon induction of hypoxic stress with cobalt chloride, an increase in HIF-1α mRNA levels was detected in lung, heart, kidney, and muscle tissue. In contrast, protein expression levels showed greater variation between individual animals. These results suggest that the regulation of HIF-1α may be important to the Taiwan vole under cobalt chloride treatment. However, further details regarding the evolutionary effect of environmental pressure on the HIF-1α primary sequence, and on HIF-1α function and regulation in Taiwan voles, remain to be identified. Copyright © 2010 Elsevier Inc. All rights reserved.
Information Propagation in Developmental Enhancers

NASA Astrophysics Data System (ADS)

Jena, Siddhartha; Levine, Michael

Rather than encoding information about protein sequence, certain stretches of noncoding DNA, called enhancers, interact with protein machinery such as transcription factors to precisely regulate gene expression. Enhancers have been studied extensively in the fruit fly Drosophila melanogaster, where they regulate the expression of developmental genes that establish the blueprint of the adult fly. It has been suggested that enhancer sequences possess a specific but unknown syntax with regard to the placement and strength of transcription factor binding sites. Moreover, studies in divergent fly species have shown that compensatory evolution allows for maintenance of enhancer functionality despite considerable variation in primary DNA sequence. Here, the possible role of enhancers as signal processing modules is studied as a way of explaining these two findings. We first demonstrate how this framework can be used to explain the fine-tuned spatiotemporal dynamics of gene expression. We then explore the evolutionary pressure on enhancer sequences and the resulting emergence of enhancers that are linked by compensatory mutations. This study provides a possible mechanism for the function of multiple enhancers linked to a single gene.
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators.

PubMed

Sousa, Filipa L; Parente, Daniel J; Shis, David L; Hessman, Jacob A; Chazelle, Allen; Bennett, Matthew R; Teichmann, Sarah A; Swint-Kruse, Liskin

2016-02-22

Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change. Copyright © 2015 Elsevier Ltd. All rights reserved.
Next-generation genomic shotgun sequencing indicates greater genetic variability in the mitochondria of Hypophthalmichthys molitrix relative to H. nobilis from the Mississippi River, USA and provides tools for research and detection

USGS Publications Warehouse

Miller, John J; Eackles, Michael S.; Stauffer, Jay R; King, Timothy L.

2015-01-01

We characterized variation within the mitochondrial genomes of the invasive silver carp (Hypophthalmichthys molitrix) and bighead carp (H. nobilis) from the Mississippi River drainage by mapping our Next-Generation sequences to their publicly available genomes. Variant detection resulted in 338 single-nucleotide polymorphisms for H. molitrix and 39 for H. nobilis. The much greater genetic variation in H. molitrix mitochondria relative to H. nobilis may be indicative of a greater North American female effective population size of the former. When variation was quantified by gene, many tRNA loci appear to have little or no variability based on our results whereas protein-coding regions were more frequently polymorphic. These results provide biologists with additional regions of DNA to be used as markers to study the invasion dynamics of these species.
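Quantifying variation by gene, as described above, amounts to assigning each SNP position to the annotated feature that contains it. A minimal sketch, assuming 1-based inclusive, non-overlapping feature coordinates on a single mitochondrial sequence; the feature names, coordinates and SNP positions below are hypothetical, not the carp annotations.

```python
import bisect


def snps_per_feature(features, snp_positions):
    """Count SNPs falling in each annotated feature.

    features: list of (name, start, end) with 1-based inclusive coordinates.
    snp_positions: iterable of 1-based SNP positions on the same sequence.
    Returns {name: (snp_count, snps_per_kb)}.
    """
    positions = sorted(snp_positions)
    summary = {}
    for name, start, end in features:
        # Binary search for the SNPs with start <= position <= end.
        count = bisect.bisect_right(positions, end) - bisect.bisect_left(positions, start)
        length_kb = (end - start + 1) / 1000
        summary[name] = (count, count / length_kb)
    return summary


features = [("trna_a", 1, 70), ("cds_b", 71, 1000)]  # hypothetical annotation
snps = [5, 120, 450, 999, 1500]                       # hypothetical SNP positions
print(snps_per_feature(features, snps))
```

Reporting SNPs per kilobase alongside raw counts keeps short tRNA loci and long protein-coding genes comparable.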
PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans

PubMed Central

Berg, Ingrid L.; Neumann, Rita; Lam, Kwan-Wood G.; Sarbajna, Shriparna; Odenthal-Hesse, Linda; May, Celia A.; Jeffreys, Alec J.

2011-01-01

PRDM9 has recently been identified as a likely trans-regulator of meiotic recombination hot spots in humans and mice [1-3]. The protein contains a zinc finger array that in humans can recognise a short sequence motif associated with hot spots [4], with binding to this motif possibly triggering hot-spot activity via chromatin remodelling [5]. We now show that variation in the zinc finger array in humans has a profound effect on sperm hot-spot activity, even at hot spots lacking the sequence motif. Very subtle changes within the array can create hot-spot non-activating and enhancing alleles, and even trigger the appearance of a new hot spot. PRDM9 thus appears to be the preeminent global regulator of hot spots in humans. Variation at this locus also influences aspects of genome instability, specifically a megabase-scale rearrangement underlying two genomic disorders [6] as well as minisatellite instability [7], implicating PRDM9 as a risk factor for some pathological genome rearrangements. PMID:20818382
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
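To make the multiple-testing arithmetic concrete: a Bonferroni-style cutoff divides the target family-wise error rate by the number of tests, and using an effective number of tests (4,123 here) rather than the raw number of annotation categories (51,801) acknowledges that correlated categories are not independent tests. The abstract does not specify the exact correction procedure, so this is a sketch under that assumption, with hypothetical p-values.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance cutoff controlling family-wise error at alpha."""
    return alpha / n_tests


def significant(p_values, alpha=0.05, n_tests=None):
    """Names whose p-value passes the cutoff; n_tests defaults to len(p_values)."""
    m = len(p_values) if n_tests is None else n_tests
    cutoff = bonferroni_threshold(alpha, m)
    return {name for name, p in p_values.items() if p < cutoff}


print(bonferroni_threshold(0.05, 4123))   # effective tests: about 1.2e-05
print(bonferroni_threshold(0.05, 51801))  # all categories: about 9.7e-07

# Hypothetical p-values for three annotation categories.
p_values = {"category_a": 3e-6, "category_b": 2e-4, "category_c": 0.01}
print(significant(p_values, n_tests=4123))  # {'category_a'}
print(significant(p_values, n_tests=1))     # uncorrected: all three pass
```

The last line illustrates the abstract's warning: without correction, several nominally significant categories appear, most of which would not survive the appropriate threshold.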
Characterization of expressed resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs)

USDA-ARS's Scientific Manuscript database

Cultivated peanut (Arachis hypogaea L.) is one of the most important food legume crops grown worldwide, and is a major source for edible oil and protein. However, due to low genetic variation, peanut is very vulnerable to a variety of pathogens, such as early leaf spot, late leaf spot, rust and Toma...
Parallel or convergent evolution in human population genomic data revealed by genotype networks.

PubMed

R Vahdati, Ali; Wagner, Andreas

2016-08-02

Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide. We analyze human genome variation data from the 1000 Genomes Project and construct haploid genotype (haplotype) networks for 12,235 protein coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, on the sequence level. For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help understand unusual patterns of genomic variation.
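The network construction described above can be sketched directly: distinct haplotypes become nodes, haplotypes differing at exactly one site are joined by an edge, and the number of independent cycles is the circuit rank, |E| - |V| + (number of connected components). This is a minimal illustration on toy two-site haplotypes; the study's cycle counting and its null models for chance homoplasy and recombination are not reproduced here.

```python
from itertools import combinations


def differ_by_one(a: str, b: str) -> bool:
    """True if two equal-length sequences differ at exactly one site."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def genotype_network(haplotypes):
    """Nodes are distinct haplotypes; edges join single-site neighbours."""
    nodes = sorted(set(haplotypes))
    edges = [(a, b) for a, b in combinations(nodes, 2) if differ_by_one(a, b)]
    return nodes, edges


def cycle_rank(nodes, edges) -> int:
    """Number of independent cycles: |E| - |V| + (connected components)."""
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    components = len({find(node) for node in nodes})
    return len(edges) - len(nodes) + components


# All four combinations at two sites close a cycle (as homoplasy or recombination would).
print(cycle_rank(*genotype_network(["AA", "AT", "TT", "TA", "AA"])))  # 1
print(cycle_rank(*genotype_network(["AA", "AT", "TT"])))              # 0
```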

Beyond directed evolution - semi-rational protein engineering and design

PubMed Central

Lutz, Stefan

2010-01-01

Over the last two decades, directed evolution has transformed the field of protein engineering. The advances in understanding protein structure and function, in no small part a result of directed evolution studies, are increasingly empowering scientists and engineers to devise more effective methods for manipulating and tailoring biocatalysts. Abandoning large combinatorial libraries, the focus has shifted to small, functionally rich libraries and rational design. Critical to the success of these emerging engineering strategies are computational tools for the evaluation of protein sequence datasets and the analysis of conformational variations of amino acids in proteins. Highlighting the opportunities and limitations of such approaches, this review focuses on recent engineering and design examples that require screening or selection of small libraries. PMID:20869867
Multivalent Protein Polymer MRI Contrast Agents: Controlling Relaxivity via Modulation of Amino Acid Sequence

PubMed Central

Karfeld-Sulzer, Lindsay S.; Waters, Emily A.; Davis, Nicolynn E.; Meade, Thomas J.; Barron, Annelise E.

2010-01-01

Magnetic Resonance Imaging (MRI) is a noninvasive imaging modality with high spatial and temporal resolution. Contrast agents (CAs) are frequently used to increase the contrast between tissues of interest. To increase the effectiveness of MR agents, small molecule CAs have been attached to macromolecules. We have created a family of biodegradable, macromolecular CAs based on protein polymers, allowing control over the CA properties. The protein polymers are monodisperse, random coil, and contain evenly spaced lysines that serve as reactive sites for Gd(III) chelates. The exact sequence and length of the protein can be specified, enabling controlled variation in lysine spacing and molecular weight. Relaxivity could be modulated by changing protein polymer length and lysine spacing. Relaxivities of up to ∼14 mM⁻¹ s⁻¹ per Gd(III) and ∼461 mM⁻¹ s⁻¹ per conjugate were observed. These CAs are biodegradable by incubation with plasmin, such that they can be easily excreted after use. They do not reduce cell viability, a prerequisite for future in vivo studies. The protein polymer CAs can be customized for different clinical diagnostic applications, including biomaterial tracking, as a balanced agent with high relaxivity and appropriate molar mass. PMID:20420441
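Relaxivity, reported above in mM⁻¹ s⁻¹, is conventionally the slope of the longitudinal relaxation rate R1 = 1/T1 plotted against contrast-agent concentration. A minimal least-squares sketch follows; the data are synthetic values generated from an assumed line, not measurements from this study.

```python
def relaxivity(concentrations_mM, t1_seconds):
    """Fit R1 = 1/T1 (s^-1) against concentration (mM) by ordinary least squares.

    Returns (slope, intercept); the slope is the relaxivity r1 in mM^-1 s^-1.
    """
    if len(concentrations_mM) != len(t1_seconds) or len(t1_seconds) < 2:
        raise ValueError("need at least two paired measurements")
    rates = [1.0 / t1 for t1 in t1_seconds]
    n = len(rates)
    mean_c = sum(concentrations_mM) / n
    mean_r = sum(rates) / n
    s_cc = sum((c - mean_c) ** 2 for c in concentrations_mM)
    s_cr = sum((c - mean_c) * (r - mean_r) for c, r in zip(concentrations_mM, rates))
    slope = s_cr / s_cc
    return slope, mean_r - slope * mean_c


# Synthetic data generated from R1 = 0.4 + 14 * [Gd]; not measurements from the study.
gd_mM = [0.0, 0.05, 0.1, 0.2]
t1_s = [1.0 / (0.4 + 14.0 * c) for c in gd_mM]
print(relaxivity(gd_mM, t1_s))  # slope close to 14, intercept close to 0.4
```

Plotting against Gd(III) concentration gives the per-Gd value; plotting against conjugate concentration instead gives the per-conjugate value, which equals the per-Gd relaxivity multiplied by the number of Gd(III) chelates per conjugate.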
Comparative sequencing in the genus Lycopersicon. Implications for the evolution of fruit size in the domestication of cultivated tomatoes.

PubMed Central

Nesbitt, T Clint; Tanksley, Steven D

2002-01-01

Sequence variation was sampled in cultivated and related wild forms of tomato at fw2.2, a fruit weight QTL key to the evolution of domesticated tomatoes. Variation at fw2.2 was contrasted with variation at four other loci not involved in fruit weight determination. Several conclusions could be reached: (1) Fruit weight variation attributable to fw2.2 is not caused by variation in the FW2.2 protein sequence; more likely, it is due to transcriptional variation associated with one or more of eight nucleotide changes unique to the promoter of large-fruit alleles; (2) fw2.2 and loci not involved in fruit weight have not evolved at distinguishably different rates in cultivated and wild tomatoes, despite the fact that fw2.2 was likely a target of selection during domestication; (3) molecular-clock-based estimates suggest that the large-fruit allele of fw2.2, now fixed in most cultivated tomatoes, arose in tomato germplasm long before domestication; (4) extant accessions of L. esculentum var. cerasiforme, the subspecies thought to be the most likely wild ancestor of domesticated tomatoes, appear to be an admixture of wild and cultivated tomatoes rather than a transitional step from wild to domesticated tomatoes; and (5) despite the fact that cerasiforme accessions are polymorphic for large- and small-fruit alleles at fw2.2, no significant association was detected between fruit size and fw2.2 genotypes in the subspecies (as tested by association genetic studies in the relatively small sample studied), suggesting the role of other fruit weight QTL in fruit weight variation in cerasiforme. PMID:12242247
Isolation and characterization of NBS–LRR resistance gene analogues from mango

PubMed Central

Lei, Xintao; Yao, Quansheng; Xu, Xuerong; Liu, Yang

2014-01-01

The nucleotide-binding site (NBS)–leucine-rich repeat (LRR) gene family is a class of R genes in plants. NBS genes play a very important role in disease defence. To further study the variation and homology of mango NBS–LRR genes, 16 resistance gene analogues (RGAs) (GenBank accession numbers HM446507-22) were isolated from polymerase chain reaction fragments amplified with two degenerate primer sets and then sequenced. The total nucleotide diversity index Pi was 0.362, and 236 variable sites were found among the 16 RGAs. The degree of homology between the RGAs ranged from 44.4% to 98.5%. All sixteen RGAs could be translated into amino acid sequences. The high homology between the P-loop and kinase-2 protein sequences of the NBS domain of the RGAs isolated in this study and those of previously characterized R genes indicated that these cloned sequences belong to the NBS–LRR gene family. Moreover, these 16 RGAs could be classified into the non-TIR–NBS–LRR gene family because tryptophan (W) was the final residue of the kinase-2 domain in all RGAs isolated here. From these results, we concluded that mango NBS–LRR genes are highly variable within the mango genome, which may allow mango to recognize many different pathogenic virulence factors. PMID:26740762
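The diversity statistics quoted above can be illustrated on a toy alignment. Nucleotide diversity (pi) is the average, over all pairs of sequences, of the proportion of sites at which the pair differs; variable (segregating) sites are alignment columns with more than one base; percent identity compares a single pair. This sketch assumes pre-aligned sequences and skips gapped columns pairwise, which is one convention among several (software such as DnaSP may treat gaps differently); the alignment is hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b, gap="-"):
    """(differences, compared sites) for two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sites = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        sites += 1
        diffs += x != y
    return diffs, sites


def nucleotide_diversity(alignment):
    """Mean pairwise proportion of differing sites (pi)."""
    per_pair = []
    for a, b in combinations(alignment, 2):
        diffs, sites = pairwise_differences(a, b)
        per_pair.append(diffs / sites)
    return sum(per_pair) / len(per_pair)


def segregating_sites(alignment, gap="-"):
    """Columns with more than one distinct non-gap base."""
    return sum(len({base for base in column if base != gap}) > 1
               for column in zip(*alignment))


def percent_identity(seq_a, seq_b):
    """Percentage of compared (non-gap) sites that are identical."""
    diffs, sites = pairwise_differences(seq_a, seq_b)
    return 100.0 * (sites - diffs) / sites


toy = ["AAAA", "AAAT", "AATT"]  # hypothetical alignment
print(nucleotide_diversity(toy), segregating_sites(toy), percent_identity(toy[0], toy[1]))
```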
Two missense mutations in melanocortin 1 receptor (MC1R) are strongly associated with dark ventral coat color in reindeer (Rangifer tarandus).

PubMed

Våge, D I; Nieminen, M; Anderson, D G; Røed, K H

2014-10-01

The protein-coding region of melanocortin 1 receptor (MC1R) was sequenced to identify potential variation affecting coat color in reindeer (Rangifer tarandus). A T→C sequence variation at nucleotide position 218 (c.218T>C) was identified; it changes methionine to threonine at amino acid (aa) position 73 (p.Met73Thr). In addition, a T→G sequence variation was found at nucleotide position 839 (c.839T>G), replacing phenylalanine with cysteine at aa position 280 (p.Phe280Cys). The two sequence variants (c.218C and c.839G) were closely associated with a darker belly coat, compared with animals carrying neither variant. The aa change p.Met73Thr affects the same position as p.Met73Lys, which was previously reported to constitutively activate MC1R in black sheep (Ovis aries). p.Phe280Cys is identical to one of two variants previously reported to be associated with dark coat color in Arctic fox (Alopex lagopus). Together, these findings support the view that the two variants found in reindeer are functional. The complete absence of Thr73 and Cys280 among the 51 wild reindeer analyzed provides some evidence that these variants are more common in domestic herds. © 2014 Stichting International Foundation for Animal Genetics.
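The HGVS coordinates in this abstract can be checked by hand. In HGVS notation, coding position c.N lies in codon ⌈N/3⌉, at position ((N − 1) mod 3) + 1 within that codon. The sketch below uses the standard nuclear genetic code. Its only assumption is the reference codons: Met has the single codon ATG, and for Phe either TTT or TTC gives the same result, because the change falls at codon position 2.

```python
# Standard (nuclear) genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate_codon(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position (HGVS c.N) to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def apply_substitution(codon: str, pos_in_codon: int, alt: str) -> str:
    """Return the codon with the base at 1-based pos_in_codon replaced by alt."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


# c.218T>C: codon 73, position 2; ATG (Met) -> ACG (Thr)
codon_no, pos = locate_codon(218)
print(codon_no, CODON_TABLE["ATG"], "->", CODON_TABLE[apply_substitution("ATG", pos, "C")])
```

Running the same steps for c.839T>G lands on codon 280, position 2, and turns TTT or TTC (Phe) into TGT or TGC (Cys). This matches p.Phe280Cys.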
Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

PubMed

Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

2018-02-02

Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, but not all individuals with chronic inflammation develop AA amyloidosis. Similarly, many diseases in cats are associated with elevated SAA production, yet only a small number of affected cats actually develop AA amyloidosis. We hypothesized that a mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and five without amyloidosis was analyzed by polymerase chain reaction (PCR) amplification and direct sequencing. We identified a novel combination of variants, 45R-51A, in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, amino acid variations were unrelated to the distribution of AA amyloid deposits, indicating that differences in SAA sequence do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis used antisera against three different parts of the feline SAA protein: the N-terminal, central, and C-terminal regions. It revealed that, unlike human AA, feline AA contains the C-terminus. These results indicate that cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.
Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins

NASA Astrophysics Data System (ADS)

Firman, Taylor; Ghosh, Kingshuk

2018-03-01

We present an analytical theory to compute conformations of heteropolymers as a function of temperature and charge sequence; the theory is applicable to disordered proteins. It describes the coil-globule transition of a given protein sequence as temperature is varied. It has been benchmarked against all-atom Monte Carlo simulations (using CAMPARI) of intrinsically disordered proteins (IDPs). In addition, the model quantitatively shows how subtle alterations of charge placement in the primary sequence can lead to significant changes in conformation while the charge composition stays the same. These changes can be as drastic as a transition from coil (swollen above a purely random coil) to globule (collapsed below a random coil) and vice versa. The theory provides insights into how to control (enhance or suppress) these changes by tuning the temperature (or solution condition) and charge decoration. As an application, we predict the distribution of conformations (at room temperature) of all naturally occurring IDPs in the DisProt database. We find significant size variation even among IDPs with a similar composition of positive and negative charges. Based on this, we provide a new diagram-of-states delineating the sequence-conformation relation for proteins in the DisProt database. Next, we study the effect of post-translational modification, e.g., phosphorylation, on IDP conformations. Phosphorylation at as few as two sites can significantly alter the size of an IDP with everything else held constant (temperature, salt concentration, etc.). However, not all possible modification sites have the same effect on protein conformations; certain "hot spots" cause maximal change in conformation. The location of these "hot spots" in the parent sequence can readily be identified by using a sequence charge decoration metric originally introduced by Sawle and Ghosh.
The ability of our model to predict conformations (both expanded and collapsed states) of IDPs at a high-throughput level can provide valuable insights into the different mechanisms by which phosphorylation/charge mutation controls IDP function.
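The sequence charge decoration (SCD) metric weights every pair of charges by the square root of their sequence separation. It is therefore sensitive to charge placement, not just charge composition. The sketch below is a minimal version, assuming the Sawle and Ghosh definition SCD = (1/N) Σ_{i<j} q_i q_j |j − i|^{1/2} and a simplified charge assignment (Lys/Arg +1, Asp/Glu −1, everything else 0, termini ignored). It is an illustration, not the authors' code.

```python
import math

# Assumed charge assignment at neutral pH: Lys/Arg +1, Asp/Glu -1, all else 0.
# Histidine and the terminal groups are treated as neutral (a simplification).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def sequence_charge_decoration(seq: str) -> float:
    """SCD = (1/N) * sum over residue pairs i < j of q_i * q_j * sqrt(j - i)."""
    q = [CHARGE.get(aa, 0) for aa in seq.upper()]
    n = len(q)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(1, n):
        if q[j] == 0:
            continue  # neutral residues contribute nothing
        for i in range(j):
            total += q[i] * q[j] * math.sqrt(j - i)
    return total / n


# Same composition (two K, two E), different placement:
print(sequence_charge_decoration("KKEE"), sequence_charge_decoration("KEKE"))
```

The blocky sequence `KKEE` scores lower (more negative) than the alternating `KEKE`. In this framework, more negative SCD is associated with more compact chains. Recomputing SCD with extra negative charge at candidate phosphosites is one way such "hot spots" could be ranked, though that procedure is our assumption.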
A few sequence polymorphisms among isolates of Maize bushy stunt phytoplasma associate with organ proliferation symptoms of infected maize plants

PubMed Central

Orlovskis, Zigmunds; Canale, Maria Cristina; Haryono, Mindia; Lopes, João Roberto Spotti

2017-01-01

Background and Aims Maize bushy stunt phytoplasma (MBSP) is a bacterial pathogen of maize (Zea mays L.) across Latin America. MBSP belongs to the 16SrI-B sub-group within the genus ‘Candidatus Phytoplasma’. MBSP and its insect vector Dalbulus maidis (Hemiptera: Cicadellidae) are restricted to maize; both are thought to have coevolved with maize during its domestication from a teosinte-like ancestor. MBSP-infected maize plants show a diversity of symptoms, and MBSP is likely under strong selection for increased virulence and insect transmission on the maize hybrids widely grown in Brazil. This study investigated whether differences in the genome sequences of MBSP isolates from two maize-growing regions in South-east Brazil explain variation in the symptom severity these isolates cause on various maize genotypes. Methods MBSP isolates were collected from maize production fields in Guaíra and Piracicaba in South-east Brazil for infection assays. One representative isolate was chosen for de novo whole-genome assembly. Sequence reads from the genomes of the other phytoplasma isolates were aligned to this assembly to detect polymorphisms. Statistical methods were applied to investigate the correlation between variation in disease symptoms of infected maize plants and MBSP sequence polymorphisms. Key Results MBSP isolate consistently contributed to organ proliferation symptoms, whereas maize genotype contributed to leaf necrosis, reddening and yellowing of infected maize plants. The symptom differences are associated with polymorphisms in a phase-variable lipoprotein, which is a candidate effector, and in an ATP-dependent lipoprotein ABC export protein. No polymorphisms were observed in other candidate effector genes. In other pathogens, lipoproteins and ABC export proteins activate host defence responses, regulate pathogen attachment to host cells and activate effector secretion systems.
Conclusions Polymorphisms in two putative virulence genes among MBSP isolates from maize-growing regions in South-east Brazil are associated with variations in organ proliferation symptoms of MBSP-infected maize plants. PMID:28069632
Complete mitochondrial genome of Yangtze River wild common carp (Cyprinus carpio haematopterus) and Russian scattered scale mirror carp (Cyprinus carpio carpio).

PubMed

Hu, Guang Fu; Liu, Xiang Jiang; Zou, Gui Wei; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na

2016-01-01

We sequenced the complete mitogenomes of Yangtze River wild common carp (Cyprinus carpio haematopterus) and Russian scattered scale mirror carp (Cyprinus carpio carpio). The two mitogenomes were remarkably similar in genome length, gene order and content, and AT content. Only 55 of 16,581 nucleotide positions differed: 1 in rRNAs, 2 in tRNAs, 9 in the control region and 43 in protein-coding genes. The 43 variable nucleotides in the protein-coding genes resulted in only four amino acid differences, one each in the ND2, ATPase 6, ND5 and ND6 genes.
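The gap between 43 nucleotide differences and only four amino acid differences comes from comparing the coding sequences codon by codon. Most changes are synonymous. The sketch below assumes both genes are aligned in frame without gaps. It uses the vertebrate mitochondrial genetic code (NCBI translation table 2), which applies to carp mtDNA and differs from the nuclear code (for example, ATA is Met and TGA is Trp). The toy sequences are hypothetical.

```python
# Vertebrate mitochondrial code (NCBI translation table 2), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def compare_coding(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (nucleotide differences, amino acid differences) between two
    aligned, in-frame, gap-free protein-coding sequences of equal length."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    nt_diffs = sum(x != y for x, y in zip(seq_a, seq_b))
    aa_diffs = 0
    for k in range(0, len(seq_a), 3):
        if MITO_CODE[seq_a[k:k + 3]] != MITO_CODE[seq_b[k:k + 3]]:
            aa_diffs += 1
    return nt_diffs, aa_diffs


# ATG->ATA (both Met here), CTT->CTC (both Leu), GCA->ACA (Ala->Thr)
print(compare_coding("ATGCTTGCA", "ATACTCACA"))
```

In this toy example, three nucleotide differences yield a single amino acid change. Under the nuclear code, the ATG->ATA change would instead have counted as nonsynonymous (Met->Ile).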
Genetic variability in E6, E7 and L1 genes of Human Papillomavirus 62 and its prevalence in Mexico.

PubMed

Artaza-Irigaray, Cristina; Flores-Miramontes, María Guadalupe; Olszewski, Dominik; Magaña-Torres, María Teresa; López-Cardona, María Guadalupe; Leal-Herrera, Yelda Aurora; Piña-Sánchez, Patricia; Jave-Suárez, Luis Felipe; Aguilar-Lemarroy, Adriana

2017-01-01

Human papillomavirus (HPV) is the main etiological agent of cervical cancer, the third most common cancer among women globally and the second most frequent in Mexico. Persistent infection with high-risk HPV genotypes is associated with premalignant lesions and cervical cancer development. HPVs considered low risk or not yet classified are often found in coinfection with other HPV genotypes. Indeed, HPV62 is one of the most prevalent HPV types detected in some countries. However, little is known about its prevalence in other regions, and no HPV62 variants have yet been described. The aim of this study was to determine the prevalence of HPV62 in cervical samples from Mexican women and to identify mutations in the L1, E6 and E7 genes, which have never been reported in our population. HPV screening was performed with the Cobas HPV Test in women who attended prevention health programs and dysplasia clinics. All HPV-positive samples (n = 491) and 87 additional cervical cancer samples were then genotyped with the Linear Array HPV Genotyping test. Some samples were selected to corroborate genotyping by next-generation sequencing. Nucleotide changes in the L1, E6 and E7 genes were determined by PCR and Sanger sequencing, with analysis in the CLC-MainWorkbench 7.6.1 software. L1 protein structure was predicted with the I-TASSER server. Using Linear Array, HPV62 prevalence was 7.6% in the general population, 8% in Cervical Intraepithelial Neoplasia grade 1 (CIN1) samples and 4.6% in cervical cancer samples. The presence of HPV62 was confirmed with next-generation sequencing. Novel sequence variations were detected in the L1 gene, but they did not alter the tertiary structure of the protein. Moreover, several nucleotide substitutions were found in the E6 and E7 genes compared with the reference HPV62 genomic sequence. Specifically, three non-synonymous sequence variations were detected, two in E6 and one in E7.
HPV62 is a frequent HPV genotype found mainly in the general population and in women with CIN1. In 90.5% of cases, it was found in coinfection with other HPVs. Novel nucleotide changes were detected in its L1, E6 and E7 genes, some of which lead to changes in the protein sequence.
Dissecting genomic hotspots underlying seed protein, oil, and sucrose content in an interspecific mapping population of soybean using high-density linkage mapping.

PubMed

Patil, Gunvant; Vuong, Tri D; Kale, Sandip; Valliyodan, Babu; Deshmukh, Rupesh; Zhu, Chengsong; Wu, Xiaolei; Bai, Yonghe; Yungbluth, Dennis; Lu, Fang; Kumpatla, Siva; Shannon, J Grover; Varshney, Rajeev K; Nguyen, Henry T

2018-04-04

The cultivated [Glycine max (L) Merr.] and wild [Glycine soja Siebold & Zucc.] soybean species show wide variation in seed composition traits. Compared with wild soybean, cultivated soybean has lower protein content and higher oil and sucrose content. In this study, an interspecific population was derived from a cross between G. max (Williams 82) and G. soja (PI 483460B). This recombinant inbred line (RIL) population of 188 lines was sequenced at 0.3× depth. Based on 91 342 single nucleotide polymorphisms (SNPs), recombination events in the RILs were defined, and a high-resolution bin map was developed (4070 bins). In addition to bin mapping, quantitative trait locus (QTL) analysis for protein, oil, and sucrose was performed using 3343 polymorphic SNPs (3K-SNP) genotyped on the Illumina Infinium BeadChip platform. The QTL regions from both platforms were compared, and significant concordance was observed between bin and 3K-SNP markers. Importantly, the bin map derived from next-generation sequencing enhanced mapping resolution (from 1325 to 50 kb). A total of five, nine, and four QTLs were identified for protein, oil, and sucrose content, respectively, and some of the QTLs coincided with soybean domestication-related genomic loci. The major QTLs for protein and oil mapped to Chr. 20 (qPro_20), suggesting a negative correlation between oil and protein. For sucrose content, a novel major QTL was identified on Chr. 8 (qSuc_08); it harbours putative genes involved in sugar transport. In addition, genome-wide association analysis using the 91 342 SNPs confirmed the genomic loci derived from QTL mapping. QTL-based haplotype analysis, using whole-genome resequencing of 106 diverse soybean lines, identified unique allelic variation in wild soybean that could be used to widen the genetic base of cultivated soybean. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
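A bin map groups adjacent markers that show no recombination in any line of the population. Each change of parental genotype between neighbouring markers in some RIL marks a breakpoint. This is a minimal sketch, not the authors' pipeline. It assumes genotype calls along one chromosome have already been imputed and smoothed, which low-coverage (0.3×) data would normally require through a sliding-window caller omitted here. It also assumes that missing and heterozygous calls have been resolved.

```python
def call_bins(positions: list[int], genotypes: list[list[str]]) -> list[tuple[int, int]]:
    """Group consecutive markers into recombination bins.

    positions: sorted marker coordinates on one chromosome.
    genotypes: one list per RIL with one call per marker
               (e.g. 'A' = parent A allele, 'B' = parent B allele).
    A new bin starts wherever any line's genotype changes between
    adjacent markers (a recombination breakpoint).
    Returns (first marker, last marker) coordinates for each bin.
    """
    if not positions:
        return []
    bins = []
    start = positions[0]
    for j in range(1, len(positions)):
        if any(line[j] != line[j - 1] for line in genotypes):
            bins.append((start, positions[j - 1]))
            start = positions[j]
    bins.append((start, positions[-1]))
    return bins


# Two hypothetical RILs with breakpoints after the 2nd and 3rd markers.
print(call_bins([100, 200, 300, 400, 500], [list("AABBB"), list("BBBAA")]))
```

Here, bins are reported at marker coordinates. One could instead place each bin boundary midway between the flanking markers.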
In Vitro Comparison of Adipokine Export Signals.

PubMed

Sharafi, Parisa; Kocaefe, Y Çetin

2016-01-01

Mammalian cells are widely used for recombinant protein production in research and biotechnology. Export signals substantially facilitate production and purification. Thirty-five years after the discovery of the mammalian export machinery, the relative efficiency of different export signals remains unclear. The aim of this study was to compare the efficiency of selected export signals using adipocytes as a cell model. Adipocytes have a large capacity for protein secretion, including several enzymes, adipokines, and other signaling molecules, providing a valid system for quantitative evaluation. Constructs carrying N-terminal export signals fused to Enhanced Green Fluorescent Protein (EGFP) were generated, with EGFP serving as a reporter for quantitative and qualitative evaluation. Fluorescence microscopy was used to trace the intracellular traffic of the reporter. The export efficiency of signals from six selected proteins secreted by adipocytes was evaluated. Quantitative comparison of the intracellular and exported fractions of the recombinant constructs showed similar efficiency among the studied sequences, with minor variations. The export signal of Retinol Binding Protein 4 (RBP4) exhibited the highest efficiency. This study presents the first quantitative data showing variation among export signals in adipocytes. These data will help optimize recombinant protein distribution.
Characterization of a Novel Polerovirus Infecting Maize in China

PubMed Central

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-01-01

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified by deep sequencing of small RNAs. The samples were field-grown maize plants with yellow mosaic symptoms on their leaves, collected in Yunnan Province, China. The complete 5642-nucleotide (nt) genome of MaYMV shares its highest nucleotide sequence identity (73%) with Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, co-infiltration assays in transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene showed that the P0 protein encoded by MaYMV inhibits both local and systemic RNA silencing. This further supported the identification of MaYMV as a new polerovirus. A biologically active cDNA clone of MaYMV was generated by inserting the full-length MaYMV cDNA into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analysis of their molecular variation indicates that the 3′ half of the P3–P5 read-through protein coding region was the most variable. The coat protein (CP)- and movement protein (MP)-coding regions were the most conserved. PMID:27136578
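One simple way to find the most and least variable regions among isolate genomes is a sliding-window count of variable alignment columns. This is a minimal sketch, not the authors' method, which the abstract does not specify. It assumes an ungapped alignment of equal-length genomes, and the window and step sizes are arbitrary choices.

```python
def variable_site_profile(alignment: list[str], window: int, step: int) -> list[tuple[int, float]]:
    """Slide a window along an ungapped alignment; for each window report its
    1-based start coordinate and the fraction of columns that are variable."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must be the same length")
    # A column is variable if more than one distinct base occurs in it.
    variable = [len({s[i] for s in alignment}) > 1 for i in range(length)]
    profile = []
    for start in range(0, length - window + 1, step):
        frac = sum(variable[start:start + window]) / window
        profile.append((start + 1, frac))
    return profile


print(variable_site_profile(["AAAACCCC", "AAAACCGT", "AAAACCGA"], window=4, step=4))
```

Windows with high fractions would correspond to regions such as the P3–P5 read-through domain. Windows near zero would correspond to conserved regions such as the CP- and MP-coding sequences.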
Characterization of a Novel Polerovirus Infecting Maize in China.

PubMed

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-04-28

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified by deep sequencing of small RNAs. The samples were field-grown maize plants with yellow mosaic symptoms on their leaves, collected in Yunnan Province, China. The complete 5642-nucleotide (nt) genome of MaYMV shares its highest nucleotide sequence identity (73%) with Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, co-infiltration assays in transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene showed that the P0 protein encoded by MaYMV inhibits both local and systemic RNA silencing. This further supported the identification of MaYMV as a new polerovirus. A biologically active cDNA clone of MaYMV was generated by inserting the full-length MaYMV cDNA into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analysis of their molecular variation indicates that the 3' half of the P3-P5 read-through protein coding region was the most variable. The coat protein (CP)- and movement protein (MP)-coding regions were the most conserved.
Ureaplasma antigenic variation beyond MBA phase variation: DNA inversions generating chimeric structures and switching in expression of the MBA N-terminal paralogue UU172

PubMed Central

Zimmerman, Carl-Ulrich R; Rosengarten, Renate; Spergser, Joachim

2011-01-01

The major ureaplasma surface membrane protein, the multiple-banded antigen (MBA), undergoes phase variation with its counterpart, the UU376 protein. This phase variation was recently proposed to result from DNA inversion at specific inverted repeats. Two inverted repeats similar to those within the mba locus were found in the genome of Ureaplasma parvum serovar 3: one within the MBA N-terminal paralogue UU172 and another in the adjacent intergenic spacer region. In this report, we demonstrate at both the genomic and protein levels that DNA inversion at these inverted repeats leads to alternating expression of UU172 and the neighbouring conserved hypothetical ORF UU171. We analyzed the sequence of this phase-variable ‘UU172 element’ in both U. parvum and U. urealyticum strains. It is highly conserved in both species and also includes the orthologue of UU144. A third inverted repeat region in UU144 is proposed to serve as an additional potential inversion site from which chimeric genes can evolve. Our results indicate that site-specific recombination events in the genome of U. parvum serovar 3 are dynamic and frequent, leading to a broad spectrum of antigenic variation by which the organism may evade host immune responses. PMID:21255110
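The inversion mechanism described here can be illustrated with a toy model. The model finds an inverted repeat pair, a k-mer and its reverse complement further downstream. It then replaces the span from the first copy through the second with its reverse complement. The repeats survive in place while the intervening sequence is inverted. Applying the same inversion twice restores the original, which is the reversible switch behind phase variation. This is a hypothetical sketch on made-up sequence. Real site-specific recombination is enzyme-mediated and sequence-specific, and the code does not model that.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeat(seq: str, k: int) -> Optional[tuple[int, int]]:
    """Return start indices (i, j) of the first k-mer at i whose reverse
    complement occurs, non-overlapping, downstream at j; else None."""
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        j = seq.find(target, i + k)
        if j != -1:
            return i, j
    return None


def invert_segment(seq: str, i: int, j: int, k: int) -> str:
    """Reverse-complement seq[i : j + k], the span covering both repeat copies."""
    end = j + k
    return seq[:i] + reverse_complement(seq[i:end]) + seq[end:]


seq = "AAAGACTCCCCAGTCAAA"          # hypothetical: GACT ... AGTC flank CCCC
i, j = find_inverted_repeat(seq, 4)   # (3, 11)
flipped = invert_segment(seq, i, j, 4)
print(flipped)                        # CCCC between the repeats becomes GGGG
```

Because the whole span is reverse-complemented, the two repeat copies map onto each other and keep their positions. Only the orientation of the sequence between them switches.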
Candidate chemosensory ionotropic receptors in a Lepidoptera.

PubMed

Olivier, V; Monsempes, C; François, M-C; Poivet, E; Jacquin-Joly, E

2011-04-01

A new family of candidate chemosensory ionotropic receptors (IRs) related to ionotropic glutamate receptors (iGluRs) was recently discovered in Drosophila melanogaster. We ran BLAST analyses of an expressed sequence tag (EST) library prepared from male antennae of the noctuid moth Spodoptera littoralis. These identified 12 unigenes encoding proteins related to D. melanogaster and Bombyx mori IRs. Their full-length sequences were obtained, and their expression patterns suggest that they are exclusively expressed in, or clearly enriched in, chemosensory organs. The deduced protein sequences were more similar to B. mori and D. melanogaster IRs than to iGluRs and showed considerable variation in the predicted ligand-binding domains. None have the three glutamate-interacting residues found in iGluRs, suggesting different binding specificities. Our data suggest that we have identified members of the insect IR chemosensory receptor family in S. littoralis. We report here the first demonstration of IR expression in Lepidoptera. © 2010 The Authors. Insect Molecular Biology © 2010 The Royal Entomological Society.
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships of many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella. However, it remains unclear how well this portion represents the mitochondrial genome as a whole, because the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of two species. T. spiralis poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs. Trichinella murrelli is the most prevalent species in North American wildlife hosts but poses relatively little risk to the safety of pork. Next-generation sequencing methodologies and scaffold and de novo assembly strategies were employed. We sequenced the entire protein-coding region (13,917 bp) of the T. murrelli mitochondrial genome, along with a portion of its highly repetitive non-coding region (1524 bp), at a combined average read depth of 250 reads. Base-calling accuracy, estimated from the coding region sequence, exceeded 99.3%. Genome content and gene order did not differ significantly from those of T. spiralis. Overall inter-species sequence divergence was estimated at 9.5%. Divergence varied significantly among genes when the interspecific variation at each gene was compared with the average across the coding region. Next-generation sequencing is a highly effective means of obtaining previously unknown mitochondrial genome sequence.
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
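A gene-by-gene comparison against the coding-region average can be sketched with uncorrected p-distances. This sketch assumes that genes are already aligned without gaps and keyed by the same names in both species. It does not reproduce the 9.5% figure or the authors' test. The abstract does not say whether that figure is a p-distance or model-corrected, or which statistical test established significance. Gene names and sequences below are hypothetical toys.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_by_gene(genes_a: dict[str, str], genes_b: dict[str, str]) -> tuple[dict[str, float], float]:
    """Per-gene p-distances plus the overall p-distance of the concatenated coding region."""
    names = list(genes_a)
    per_gene = {g: p_distance(genes_a[g], genes_b[g]) for g in names}
    overall = p_distance("".join(genes_a[g] for g in names),
                         "".join(genes_b[g] for g in names))
    return per_gene, overall


per_gene, overall = divergence_by_gene(
    {"cox1": "ACGTACGTAC", "nad1": "AAAAA"},   # hypothetical species 1
    {"cox1": "ACGTACGTAA", "nad1": "AAATT"},   # hypothetical species 2
)
print(per_gene, overall)  # genes above or below the overall value stand out
```

Note that the overall value is length-weighted, so long genes dominate it. That is one reason per-gene divergences can depart substantially from the coding-region average.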
Integrating Transcriptome and Genome Re-Sequencing Data to Identify Key Genes and Mutations Affecting Chicken Eggshell Qualities

PubMed Central

Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

2015-01-01

Eggshell damage leads to economic losses in the egg production industry and is a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs whose shells differed significantly in strength and thickness. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken uterine transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. Comparing hens with low eggshell strength (LES) and normal eggshell strength (NES) identified a total of 889 differentially expressed genes (DEGs). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that the DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways. Some important matrix proteins, such as OC-116, LTF and SPP1, were also differentially expressed between the two groups. Whole-genome re-sequencing detected a total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels in protein coding genes, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. The SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate transcriptome and genome re-sequencing to target genetic variations that decrease eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068
Genomic stability of adipogenic human adenovirus 36.

PubMed

Nam, J-H; Na, H-N; Atkinson, R L; Dhurandhar, N V

2014-02-01

Human adenovirus Ad36 increases adiposity in several animal models, including rodents and non-human primates. Importantly, Ad36 is associated with human obesity, which has prompted research to understand its epidemiology and to develop a vaccine to prevent a subgroup of obesity. For this purpose, understanding the genomic stability of Ad36 during in vivo and in vitro infection is critical. Here, we examined whether in vitro cell passaging over a 14-year period introduced any genetic variation in Ad36. We sequenced the whole genome of Ad36 that had been plaque purified in 1998 from the original strain obtained from the American Type Culture Collection. This stock had been passaged approximately 12 times over the past 14 years (Ad36-2012). We compared this DNA sequence with a previously published sequence of Ad36 likely obtained from the same source (Ad36-1988). Compared with Ad36-1988, only two nucleotides were altered in Ad36-2012. The first was a T insertion at nucleotide 1862, which may cause early termination of the E1B viral protein. The second was a T→C transition at nucleotide 26 136. Virus with the T insertion (designated Ad36-2012-T6) was mixed with wild-type virus lacking the T insertion (designated Ad36-2012-T5) in the viral stock. The transition at nucleotide 26 136 does not change the encoded amino acid (aspartic acid) in the pVIII viral protein. The rate of genetic variation in Ad36 is ∼2.37 × 10⁻⁶ mutations/nucleotide/passage. Of particular importance, there were no mutations in the E4orf1 gene, the critical gene for producing obesity. This very low variation rate should reduce concerns about genetic variability when developing Ad36 vaccines or assays for detecting Ad36 infection in populations.
Bridging two scholarly islands enriches both: COI DNA barcodes for species identification versus human mitochondrial variation for the study of migrations and pathologies.

PubMed

Thaler, David S; Stoeckle, Mark Y

2016-10-01

DNA barcodes for species identification and the analysis of human mitochondrial variation have developed as independent fields, even though both are based on sequences from animal mitochondria. This study identifies questions within each field that can be addressed by reference to the other. DNA barcodes are based on a 648-bp segment of the mitochondrially encoded cytochrome oxidase I (COI). For most species, this segment is the only sequence available, so it is impossible to know whether it fairly represents overall mitochondrial variation. For modern humans, the entire mitochondrial genome is available from thousands of healthy individuals. SNPs in the human mitochondrial genome are evenly distributed across all protein-encoding regions, suggesting that the COI DNA barcode is representative. Barcode variation among related species is largely based on synonymous codons. Data on human mitochondrial variation support the interpretation that most, possibly all, synonymous substitutions in mitochondria are selectively neutral. DNA barcodes confirm reports of low variance in modern humans compared with nonhuman primates. In addition, DNA barcodes allow modern human variance to be compared with that of many other extant animal species. Birds are a well-curated group in which DNA barcodes are coupled with census and geographic data. Placing modern human variation in the context of intraspecies variation among birds shows humans to be a single breeding population of average variance.

Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

PubMed

Wyszyńska-Koko, J; Kurył, J

2004-01-01

The MYF6 gene encodes a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced, and an interspecies homology analysis was performed. The Myf-6 protein is highly conserved among species. The pig sequence shares 99% and 97% identity with the cow and human sequences, respectively, and 93% identity with the mouse and rat sequences. A single nucleotide polymorphism (SNP) was found within the promoter region: a T→C transition detected with the MspI restriction enzyme.
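Genotyping a SNP by PCR-RFLP works because one allele creates or destroys a restriction site. MspI recognizes CCGG and cuts between the first C and the following CGG (C^CGG). That site contains no T on either strand, so for a T→C transition inside the site, the C allele must be the one carrying it. The sketch below uses hypothetical amplicon sequences, because the real flanking sequence is not given in the abstract.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence; cuts C^CGG


def mspi_cut_positions(seq: str) -> list[int]:
    """0-based positions of MspI cuts on the given strand (after the first C)."""
    seq = seq.upper()
    positions = []
    start = seq.find(MSPI_SITE)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(MSPI_SITE, start + 1)
    return positions


def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the fragments an MspI digest would produce (overhangs ignored)."""
    cuts = [0] + mspi_cut_positions(seq) + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]


t_allele = "AAACTGGAAA"  # hypothetical amplicon, T allele: no site
c_allele = "AAACCGGAAA"  # same amplicon, C allele: T->C creates CCGG
print(fragment_lengths(t_allele), fragment_lengths(c_allele))
```

On a gel, T/T homozygotes would show only the uncut band and C/C homozygotes only the cut fragments. Heterozygotes would show both patterns.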
Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

PubMed

Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

2014-04-01

Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.
Minireview: Toward the Establishment of a Link between Melatonin and Glucose Homeostasis: Association of Melatonin MT2 Receptor Variants with Type 2 Diabetes

PubMed Central

Karamitri, Angeliki; Renault, Nicolas; Clement, Nathalie; Guillaume, Jean-Luc

2013-01-01

The existence of interindividual variations in G protein-coupled receptor sequences was recognized early on. Recent advances in large-scale exon sequencing techniques are expected to dramatically increase the number of variants identified in G protein-coupled receptors, giving rise to new challenges regarding their functional characterization. This minireview illustrates these challenges using the MTNR1B gene, which encodes the melatonin MT2 receptor and in which exon sequencing revealed 40 rare nonsynonymous variants in the general population and in type 2 diabetes (T2D) cohorts. Functional characterization of these MT2 mutants identified 14 with loss of Gi protein activation that are associated with increased risk of T2D development. This repertoire of disease-associated mutants is a rich source for structure-activity studies and will help to define the still poorly understood role of melatonin in glucose homeostasis and T2D development in humans. Defining the functional defects in carriers of rare MT2 mutations will help to provide personalized therapies to these patients in the future. PMID:23798576
Structural basis for regulation of rhizobial nodulation and symbiosis gene expression by the regulatory protein NolR.

PubMed

Lee, Soon Goo; Krishnan, Hari B; Jez, Joseph M

2014-04-29

The symbiosis between rhizobial microbes and host plants involves the coordinated expression of multiple genes, which leads to nodule formation and nitrogen fixation. As part of the transcriptional machinery for nodulation and symbiosis across a range of Rhizobium species, NolR serves as a global regulatory protein. Here, we present the X-ray crystal structures of NolR in the unliganded form and complexed with two different 22-base pair (bp) double-stranded operator sequences (oligos AT and AA). Structural and biochemical analysis of NolR reveals protein-DNA interactions with an asymmetric operator site and defines a mechanism for conformational switching of a key residue (Gln56) to accommodate variation in target DNA sequences from diverse rhizobial genes for nodulation and symbiosis. This conformational switching alters the energetic contributions to DNA binding without changes in affinity for the target sequence. Two possible models for the role of NolR in the regulation of different nodulation and symbiosis genes are proposed. To our knowledge, these studies provide the first structural insight into the regulation of genes involved in the agriculturally and ecologically important symbiosis of microbes and plants that leads to nodule formation and nitrogen fixation.
Genetic variation of viral protein 1 genes of field strains of waterfowl parvoviruses and their attenuated derivatives.

PubMed

Tsai, Hsiang-Jung; Tseng, Chun-hsien; Chang, Poa-chun; Mei, Kai; Wang, Shih-Chi

2004-09-01

To understand the genetic variations between the field strains of waterfowl parvoviruses and their attenuated derivatives, we analyzed the complete nucleotide sequences of the viral protein 1 (VP1) genes of nine field strains and two vaccine strains of waterfowl parvoviruses. Sequence comparison of the VP1 proteins showed that these viruses could be divided into goose parvovirus (GPV)-related and Muscovy duck parvovirus (MDPV)-related groups. The amino acid difference between the GPV- and MDPV-related groups ranged from 13.1% to 15.8%, and the most variable region resided in the N terminus of VP2. The vaccine strains of GPV and MDPV differed from their parental field strains by only 1.2% and 0.3% of amino acids, respectively, and most of these differences resided in residues 497-575 of VP1, suggesting that these residues might be important for the attenuation of GPV and MDPV. Comparison of GPV strains isolated in 1982 (strain 82-0308) and 2001 (strain 01-1001) revealed only a 0.3% amino acid difference, and MDPV strains isolated in 1990 (strain 90-0219) and 1997 (strain 97-0104) differed by only 0.4%. These results indicate that the waterfowl parvovirus genome has remained highly stable in the field.
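The percent-difference figures and the clustering of changes in a residue window can be illustrated with a short sketch: list the aligned positions where two protein sequences differ, express them as a percentage, and count how many fall in a window (analogous to residues 497-575 of VP1). The sequences and the 7-10 window below are hypothetical toy values; this uncorrected p-distance is an assumption, not necessarily the measure the authors used.

```python
def substitution_positions(ref: str, other: str) -> list[int]:
    """1-based alignment columns where two equal-length aligned sequences differ."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(ref, other)) if a != b]


def percent_difference(ref: str, other: str) -> float:
    """Uncorrected percent difference (p-distance x 100) over all aligned columns."""
    return 100.0 * len(substitution_positions(ref, other)) / len(ref)


def count_in_window(positions: list[int], start: int, end: int) -> int:
    """How many positions fall inside the inclusive window [start, end]."""
    return sum(1 for p in positions if start <= p <= end)


if __name__ == "__main__":
    field = "ACDEFGHIKL"    # hypothetical parental field strain (toy length)
    vaccine = "ACDEFGHVKM"  # hypothetical attenuated derivative
    diffs = substitution_positions(field, vaccine)
    print(diffs, percent_difference(field, vaccine), count_in_window(diffs, 7, 10))  # [8, 10] 20.0 2
```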
Impact of germline and somatic missense variations on drug binding sites.

PubMed

Yan, C; Pattabiraman, N; Goecks, J; Lam, P; Nayak, A; Pan, Y; Torcivia-Rodriguez, J; Voskanian, A; Wan, Q; Mazumder, R

2017-03-01

Advancements in next-generation sequencing (NGS) technologies are generating a vast amount of data. This exacerbates the current challenge of translating NGS data into actionable clinical interpretations. We have comprehensively combined germline and somatic nonsynonymous single-nucleotide variations (nsSNVs) that affect drug binding sites in order to investigate their prevalence. The integrated data thus generated in conjunction with exome or whole-genome sequencing can be used to identify patients who may not respond to a specific drug because of alterations in drug binding efficacy due to nsSNVs in the target protein's gene. To identify the nsSNVs that may affect drug binding, protein-drug complex structures were retrieved from Protein Data Bank (PDB) followed by identification of amino acids in the protein-drug binding sites using an occluded surface method. Then, the germline and somatic mutations were mapped to these amino acids to identify which of these alter protein-drug binding sites. Using this method we identified 12 993 amino acid-drug binding sites across 253 unique proteins bound to 235 unique drugs. The integration of amino acid-drug binding sites data with both germline and somatic nsSNVs data sets revealed 3133 nsSNVs affecting amino acid-drug binding sites. In addition, a comprehensive drug target discovery was conducted based on protein structure similarity and conservation of amino acid-drug binding sites. Using this method, 81 paralogs were identified that could serve as alternative drug targets. In addition, non-human mammalian proteins bound to drugs were used to identify 142 homologs in humans that can potentially bind to drugs. In the current protein-drug pairs that contain somatic mutations within their binding site, we identified 85 proteins with significant differential gene expression changes associated with specific cancer types. 
Information on protein-drug binding, predicted drug target proteins, and the prevalence of both somatic and germline nsSNVs that disrupt these binding sites can provide valuable knowledge for personalized medicine. A web portal is available where nsSNVs from an individual patient can be checked by scanning against DrugVar to determine whether any of the SNVs affect the binding of any drug in the database.
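The mapping step described above (checking which nsSNVs land on binding-site residues) reduces to a lookup of variant positions against per-complex residue sets. The sketch below assumes the binding-site residues have already been determined (the occluded surface calculation is not reproduced), and every protein, drug, and variant in it is hypothetical; it is not the DrugVar interface.

```python
from collections import defaultdict

# Hypothetical binding-site residues: (protein, drug) -> 1-based residue positions
BINDING_SITES = {
    ("PROT_A", "drug_x"): {45, 46, 90, 112},
    ("PROT_A", "drug_y"): {200, 201},
    ("PROT_B", "drug_x"): {15, 77},
}

# Hypothetical nsSNVs: (protein, residue position, reference aa, alternate aa, origin)
VARIANTS = [
    ("PROT_A", 46, "L", "P", "somatic"),
    ("PROT_A", 150, "G", "S", "germline"),
    ("PROT_B", 77, "R", "W", "germline"),
]


def variants_hitting_sites(binding_sites, variants):
    """Map each (protein, drug) pair to the variants whose residue lies in its binding site."""
    hits = defaultdict(list)
    for protein, pos, ref, alt, origin in variants:
        for (site_protein, drug), residues in binding_sites.items():
            if site_protein == protein and pos in residues:
                hits[(protein, drug)].append(f"{ref}{pos}{alt} ({origin})")
    return dict(hits)


if __name__ == "__main__":
    for pair, hit_list in variants_hitting_sites(BINDING_SITES, VARIANTS).items():
        print(pair, hit_list)
```

A variant such as PROT_A G150S, which falls outside every listed site, is simply not reported.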
Molecular basis for the wide range of affinity found in Csr/Rsm protein-RNA recognition.

PubMed

Duss, Olivier; Michel, Erich; Diarra dit Konté, Nana; Schubert, Mario; Allain, Frédéric H-T

2014-04-01

The carbon storage regulator/regulator of secondary metabolism (Csr/Rsm) type of small non-coding RNAs (sRNAs) is widespread throughout bacteria and acts by sequestering the global translation repressor protein CsrA/RsmE from the ribosome binding site of a subset of mRNAs. Although we have previously described the molecular basis of a high affinity RNA target bound to RsmE, it remains unknown how other lower affinity targets are recognized by the same protein. Here, we have determined the nuclear magnetic resonance solution structures of five separate GGA binding motifs of the sRNA RsmZ of Pseudomonas fluorescens in complex with RsmE. The structures explain how the variation of sequence and structural context of the GGA binding motifs modulate the binding affinity for RsmE by five orders of magnitude (∼10 nM to ∼3 mM, Kd). Furthermore, we see that conformational adaptation of protein side-chains and RNA enable recognition of different RNA sequences by the same protein contributing to binding affinity without conferring specificity. Overall, our findings illustrate how the variability in the Csr/Rsm protein-RNA recognition allows a fine-tuning of the competition between mRNAs and sRNAs for the CsrA/RsmE protein.
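To put the quoted Kd range in energetic terms, the standard relation ΔG° = RT ln(Kd / 1 M) can be applied to the two endpoints. The sketch assumes 25 °C (298.15 K) and a 1 M standard state; it is a back-of-the-envelope conversion, not an analysis from the paper.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy dG = RT ln(Kd / 1 M) in kJ/mol; more negative means tighter."""
    return R_KJ * temp_k * math.log(kd_molar)


def ddg_between(kd_weak: float, kd_tight: float, temp_k: float = 298.15) -> float:
    """Free-energy penalty (kJ/mol) of the weaker binder relative to the tighter one."""
    return binding_free_energy(kd_weak, temp_k) - binding_free_energy(kd_tight, temp_k)


if __name__ == "__main__":
    kd_tight, kd_weak = 10e-9, 3e-3  # approximate endpoints quoted in the abstract
    print(math.log10(kd_weak / kd_tight))  # about 5.5 decades
    print(ddg_between(kd_weak, kd_tight))  # about 31 kJ/mol (roughly 7.5 kcal/mol)
```

Under these assumptions, the roughly five orders of magnitude in affinity correspond to about 31 kJ/mol in binding free energy.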
ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

PubMed Central

Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

2009-01-01

We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
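Correlating exons with domains, as in capability (ii) above, amounts to an interval-overlap test once exon boundaries have been projected onto protein residue coordinates. The sketch below assumes that projection has already been done; the exon and domain coordinates are hypothetical, and the function is illustrative only, not part of ExDom.

```python
def exons_overlapping_domain(exons: list[tuple[int, int]], domain: tuple[int, int]) -> list[int]:
    """1-based indices of exons whose residue interval overlaps the domain interval.

    All intervals are inclusive residue coordinates on the protein.
    """
    d_start, d_end = domain
    return [i for i, (s, e) in enumerate(exons, start=1) if s <= d_end and e >= d_start]


if __name__ == "__main__":
    toy_exons = [(1, 40), (41, 95), (96, 160), (161, 210)]  # hypothetical exon spans in residues
    print(exons_overlapping_domain(toy_exons, (60, 130)))   # domain spans exons 2 and 3
    print(exons_overlapping_domain(toy_exons, (100, 150)))  # domain lies within exon 3
```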
The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome.

PubMed

Kim, K S; Lee, S E; Jeong, H W; Ha, J H

1998-10-01

The complete nucleotide sequence of the mitochondrial genome of the domestic dog, Canis familiaris, was determined. The sequence length was 16,728 bp; however, this length is not absolute because of variation (heteroplasmy) in the number of copies of the repetitive motif 5'-GTACACGT(A/G)C-3' in the control region. The genome organization, gene content, and codon usage conformed to those of other mammalian mitochondrial genomes. A "CTAGA" duplication following the translational stop codon of the COII gene, whose significance is unknown, was not observed in other mammalian mitochondrial genomes. To determine possible differences among carnivore mtDNAs, the two rRNA genes and 13 protein-coding genes from the cat, dog, and seal were compared. The combined molecular differences in the two rRNA genes and in the inferred amino acid sequences of the 13 mitochondrial protein-coding genes suggested that the dog and the seal are more closely related to each other than either is to the cat. Based on these mtDNA differences, the divergence among the cat, the dog, and the seal was dated to approximately 50 ± 4 million years ago. The degree of difference between carnivore mtDNAs varied with the protein-coding gene used, indicating that the evolutionary relationships of distantly related species should be inferred from ample sequence data, such as complete mtDNA molecules. Copyright 1998 Academic Press.
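The degenerate control-region motif can be written as a regular expression, with the (A/G) position becoming the character class `[AG]`. The sketch below counts copies in a single sequence and reports the longest back-to-back run. The test sequence is hypothetical, and a single read cannot capture heteroplasmy, which would appear as different copy numbers across molecules.

```python
import re

# 5'-GTACACGT(A/G)C-3'; the degenerate (A/G) position becomes the class [AG]
REPEAT_MOTIF = re.compile(r"GTACACGT[AG]C")


def motif_copies(seq: str) -> tuple[int, int]:
    """Return (total non-overlapping copies, longest run of back-to-back copies)."""
    total = longest = current = 0
    prev_end = None
    for match in REPEAT_MOTIF.finditer(seq.upper()):
        total += 1
        # extend the run only if this copy starts exactly where the previous one ended
        current = current + 1 if match.start() == prev_end else 1
        longest = max(longest, current)
        prev_end = match.end()
    return total, longest


if __name__ == "__main__":
    # Hypothetical fragment: three tandem copies (one with G at the degenerate site), then a separate copy
    toy = "TTT" + "GTACACGTAC" + "GTACACGTGC" + "GTACACGTAC" + "AAAA" + "GTACACGTAC"
    print(motif_copies(toy))  # (4, 3)
```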
Sequence variation in the M2 ion channel protein influences its intracellular localization during avian influenza replication

USDA-ARS's Scientific Manuscript database

Avian influenza virus (AIV) is a constant threat to poultry worldwide due to its ability to mutate from a low pathogenic (LP) form into a highly pathogenic (HP) one. It is known that the incorporation of a polybasic cleavage site (PBCS) on the hemagglutinin (HA) gene is a major indicator of pathogen...
What can we learn about lipoprotein metabolism and coronary heart disease from studying rare variants?

PubMed

Jeff, Janina M; Peloso, Gina M; Do, Ron

2016-04-01

Rare variant association studies (RVAS) target the class of genetic variation with frequencies less than 1%. Recently, investigators have used exome sequencing in RVAS to identify rare alleles responsible for Mendelian diseases but have experienced greater difficulty discovering such alleles for complex diseases. In this review, we describe what we have learned about lipoprotein metabolism and coronary heart disease through the conduct of RVAS. Rare protein-altering genetic variation can provide important insights that are not as easily attainable from common variant association studies. First, RVAS can facilitate gene discovery by identifying novel rare protein-altering variants in specific genes that are associated with disease. Second, rare variant associations can provide supportive evidence for putative drug targets for novel therapies. Finally, rare variants can uncover new pathways and reveal new biologic mechanisms. The field of human genetics has already made tremendous progress in understanding lipoprotein metabolism and the causes of coronary heart disease in the context of rare variants. As next generation sequencing becomes more cost-effective, RVAS with larger sample sizes will be conducted. This will lead to more novel rare variant discoveries and the translation of genomic data into biological knowledge and clinical insights for cardiovascular disease.
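The "frequency less than 1%" criterion and a simple way of pooling rare variants within a gene can be sketched as follows. Computing minor allele frequency from diploid genotype counts is standard; the carrier tally is a generic collapsing illustration, not a method attributed to the studies reviewed here. All counts and variant IDs are hypothetical.

```python
def alt_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    return (n_het + 2 * n_hom_alt) / total_alleles


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of whichever allele is less common."""
    f = alt_allele_frequency(n_hom_ref, n_het, n_hom_alt)
    return min(f, 1.0 - f)


def is_rare(n_hom_ref: int, n_het: int, n_hom_alt: int, threshold: float = 0.01) -> bool:
    """True if the minor allele frequency is below the threshold (1% by default)."""
    return minor_allele_frequency(n_hom_ref, n_het, n_hom_alt) < threshold


def carrier_count(genotypes: list[dict[str, int]], variant_ids: list[str]) -> int:
    """Individuals carrying at least one alternate allele at any listed variant."""
    return sum(1 for person in genotypes if any(person.get(v, 0) > 0 for v in variant_ids))


if __name__ == "__main__":
    print(is_rare(990, 10, 0), is_rare(900, 95, 5))  # True False
    people = [{"v1": 1}, {"v2": 0}, {"v1": 0, "v2": 2}]  # hypothetical alt-allele counts per person
    print(carrier_count(people, ["v1", "v2"]))  # 2
```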
A genome-wide interactome of DNA-associated proteins in the human liver.

PubMed

Ramaker, Ryne C; Savic, Daniel; Hardigan, Andrew A; Newberry, Kimberly; Cooper, Gregory M; Myers, Richard M; Cooper, Sara J

2017-11-01

Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ in important ways from endogenous cells and tissues. Consequently, binding data from primary human tissue are essential to improving our understanding of in vivo gene regulation. Here, we identify and analyze more than 440,000 binding sites using ChIP-seq data for 20 DAPs in two human liver tissue samples. We integrated binding data with transcriptome and phased WGS data to investigate allelic DAP interactions and the impact of heterozygous sequence variation on the expression of neighboring genes. Our tissue-based data set exhibits binding patterns more consistent with liver biology than cell lines, and we describe uses of these data to better prioritize impactful noncoding variation. Collectively, our rich data set offers novel insights into genome function in human liver tissue and provides a valuable resource for assessing disease-related disruptions. © 2017 Ramaker et al.; Published by Cold Spring Harbor Laboratory Press.
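One common way to look for allelic DAP binding at a heterozygous site is to compare reference and alternate read counts against a 50:50 expectation with an exact binomial test. The abstract does not state which test the authors used, so the sketch below is an assumed, generic approach with hypothetical read counts.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums P(X = i) over every outcome i that is no more probable than the observed k
    (a small relative tolerance absorbs floating-point ties).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    ref_reads, alt_reads = 18, 2  # hypothetical ChIP-seq reads at one heterozygous SNP
    print(binomial_two_sided_p(ref_reads, ref_reads + alt_reads))  # about 4e-4: strong allelic skew
    print(binomial_two_sided_p(5, 10))  # balanced reads give p = 1
```

In practice, the many sites tested would also need multiple-testing correction and a check for reference-mapping bias.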
Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

PubMed Central

Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; van de Vondervoort, Peter J.I.; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; van Dijck, Piet W.M.; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert J.J.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noël N.M.E.; Roubos, Johannes A.; Nielsen, Jens; Baker, Scott E.

2011-01-01

The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole-genome sequencing of the acidogenic A. niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and 0.8 Mb of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis supported up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases, and protein transporters in the protein-producing CBS 513.88 strain. The results and data sets from this integrative systems biology analysis provide a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi. PMID:21543515
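The average and maximum SNPs/kb figures summarize SNP density along the genome. A minimal sketch, assuming SNP positions have already been called on one aligned sequence (0-based) and using non-overlapping 1-kb windows, could look like this. The positions are hypothetical, and the authors' windowing may differ.

```python
def snp_density(snp_positions: list[int], seq_length: int, window: int = 1000) -> tuple[float, int]:
    """Return (mean SNPs per kb over the whole sequence, max SNP count in any window).

    Positions are 0-based; windows are non-overlapping and start at position 0.
    With the default 1000-bp window, the maximum is also in SNPs per kb.
    """
    n_windows = -(-seq_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1
    mean_per_kb = len(snp_positions) / (seq_length / 1000)
    return mean_per_kb, max(counts, default=0)


if __name__ == "__main__":
    toy_snps = [10, 20, 30, 1500, 2999]  # hypothetical SNP positions in a 3-kb region
    print(snp_density(toy_snps, 3000))   # mean about 1.67 SNPs/kb, max 3 in the first window
```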
Comparative Sequence Analysis of the Plasmid-Encoded Regulator of Enteropathogenic Escherichia coli Strains

PubMed Central

Okeke, Iruka N.; Borneman, Jade A.; Shin, Sooan; Mellies, Jay L.; Quinn, Laura E.; Kaper, James B.

2001-01-01

Enteropathogenic Escherichia coli (EPEC) strains that carry the EPEC adherence factor (EAF) plasmid were screened for the presence of different EAF sequences, including those of the plasmid-encoded regulator (per). Considerable variation in gene content of EAF plasmids from different strains was seen. However, bfpA, the gene encoding the structural subunit for the bundle-forming pilus, bundlin, and per genes were found in 96.8% of strains. Sequence analysis of the per operon and its promoter region from 15 representative strains revealed that it is highly conserved. Most of the variation occurs in the 5′ two-thirds of the perA gene. In contrast, the C-terminal portion of the predicted PerA protein that contains the DNA-binding helix-turn-helix motif is 100% conserved in all strains that possess a full-length gene. In a minority of strains including the O119:H2 and canine isolates and in a subset of O128:H2 and O142:H6 strains, frameshift mutations in perA leading to premature truncation and consequent inactivation of the gene were identified. Cloned perA, -B, and -C genes from these strains, unlike those from strains with a functional operon, failed to activate the LEE1 operon and bfpA transcriptional fusions or to complement a per mutant in reference strain E2348/69. Furthermore, O119, O128, and canine strains that carry inactive per operons were deficient in virulence protein expression. The context in which the perABC operon occurs on the EAF plasmid varies. The sequence upstream of the per promoter region in EPEC reference strains E2348/69 and B171-8 was present in strains belonging to most serogroups. In a subset of O119:H2, O128:H2, and O142:H6 strains and in the canine isolate, this sequence was replaced by an IS1294-homologous sequence. PMID:11500429
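The frameshift-then-truncation effect described for perA can be illustrated by deleting a single base and scanning for the first in-frame stop codon. The coding sequence below is a hypothetical toy, not perA, and the sketch uses the standard-code stop codons TAA, TAG, and TGA.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_in_frame_stop(cds: str) -> Optional[int]:
    """0-based codon index of the first stop codon when reading from position 0, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_base(seq: str, pos: int) -> str:
    """Return seq with the base at 0-based `pos` removed (a -1 frameshift)."""
    return seq[:pos] + seq[pos + 1:]


if __name__ == "__main__":
    intact = "ATGCTGAACCTTGGCTAA"   # hypothetical ORF: ATG CTG AAC CTT GGC TAA
    mutant = delete_base(intact, 3)  # reads ATG TGA ..., so translation stops early
    print(first_in_frame_stop(intact), first_in_frame_stop(mutant))  # 5 1
```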
The evolutionary capacitor HSP90 buffers the regulatory effects of mammalian endogenous retroviruses.

PubMed

Hummel, Barbara; Hansen, Erik C; Yoveva, Aneliya; Aprile-Garcia, Fernando; Hussong, Rebecca; Sawarkar, Ritwick

2017-03-01

Understanding how genotypes are linked to phenotypes is important in biomedical and evolutionary studies. The chaperone heat-shock protein 90 (HSP90) buffers genetic variation by stabilizing proteins with variant sequences, thereby uncoupling phenotypes from genotypes. Here we report an unexpected role of HSP90 in buffering cis-regulatory variation affecting gene expression. By using the tripartite-motif-containing 28 (TRIM28; also known as KAP1)-mediated epigenetic pathway, HSP90 represses the regulatory influence of endogenous retroviruses (ERVs) on neighboring genes that are critical for mouse development. Our data based on natural variations in the mouse genome show that genes respond to HSP90 inhibition in a manner dependent on their genomic location with regard to strain-specific ERV-insertion sites. The evolutionary-capacitor function of HSP90 may thus have facilitated the exaptation of ERVs as key modifiers of gene expression and morphological diversification. Our findings add a new regulatory layer through which HSP90 uncouples phenotypic outcomes from individual genotypes.
Non-B-Form DNA Is Enriched at Centromeres

PubMed Central

Henikoff, Steven

2018-01-01

Animal and plant centromeres are embedded in repetitive “satellite” DNA, but are thought to be epigenetically specified. To define genetic characteristics of centromeres, we surveyed satellite DNA from diverse eukaryotes and identified variation in <10-bp dyad symmetries predicted to adopt non-B-form conformations. Organisms lacking centromeric dyad symmetries had binding sites for sequence-specific DNA-binding proteins with DNA-bending activity. For example, human and mouse centromeres are depleted for dyad symmetries, but are enriched for non-B-form DNA and are associated with binding sites for the conserved DNA-binding protein CENP-B, which is required for artificial centromere function but is paradoxically nonessential. We also detected dyad symmetries and predicted non-B-form DNA structures at neocentromeres, which form at ectopic loci. We propose that centromeres form at non-B-form DNA because of dyad symmetries or are strengthened by sequence-specific DNA binding proteins. This may resolve the CENP-B paradox and provide a general basis for centromere specification. PMID:29365169
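A dyad symmetry in its simplest form is a stretch of DNA equal to its own reverse complement. The sketch below finds only perfect, spacer-free inverted repeats of even length from 4 to 8 bp (staying under 10 bp). Real surveys typically also allow spacers between the arms and mismatches, and predicting non-B-form structure requires further modeling that is not attempted here. The test sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string (other symbols are left unchanged)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dyad_palindromes(seq: str, min_len: int = 4, max_len: int = 8) -> list[tuple[int, int]]:
    """(start, length) of every even-length perfect inverted repeat with no spacer.

    Odd lengths are skipped because the central base cannot equal its own complement.
    """
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for i in range(len(seq) - length + 1):
            window = seq[i:i + length]
            if window == revcomp(window):
                hits.append((i, length))
    return hits


if __name__ == "__main__":
    print(dyad_palindromes("TTGAATTCTT"))  # [(3, 4), (2, 6)]: AATT inside GAATTC
```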
A Change in SHATTERPROOF Protein Lies at the Origin of a Fruit Morphological Novelty and a New Strategy for Seed Dispersal in Medicago Genus

PubMed Central

Fourquin, Chloé; del Cerro, Carolina; Victoria, Filipe C.; Vialette-Guiraud, Aurélie; de Oliveira, Antonio C.; Ferrándiz, Cristina

2013-01-01

Angiosperms are the most diverse and numerous group of plants, and it is generally accepted that this evolutionary success is owed in part to the diversity found in fruits, which are key for protecting the developing seeds and ensuring seed dispersal. Although studies on the molecular basis of morphological innovations are few, they all illustrate the central role played by transcription factors acting as developmental regulators. Here, we show that a small change in the protein sequence of a MADS-box transcription factor correlates with the origin of a highly modified fruit morphology and the change in seed dispersal strategies that occurred in Medicago, a genus belonging to the large legume family. This protein sequence modification alters the functional properties of the protein, affecting the affinities for other protein partners involved in high-order complexes. Our work illustrates that variation in coding regions can generate evolutionary novelties not through gene duplication/subfunctionalization but through interactions in complex networks, and it also contributes to the current debate on the relative importance of changes in regulatory or coding regions of master regulators in generating morphological novelties. PMID:23640757
Differential principal component analysis of ChIP-seq.

PubMed

Ji, Hongkai; Li, Xia; Wang, Qian-fei; Ning, Yang

2013-04-23

We propose differential principal component analysis (dPCA) for analyzing multiple ChIP-sequencing datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single framework. It uses a small number of principal components to summarize concisely the major multiprotein synergistic differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a unique tool for efficiently analyzing large amounts of ChIP-sequencing data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential chromatin patterns at transcription factor binding sites and promoters as well as allele-specific protein-DNA interactions.
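The inference step described above, comparing between-condition differences with within-condition replicate variation, can be illustrated for a single protein with a Welch-type t statistic per locus. This is a swapped-in, simplified statistic: it does not perform the principal component step that defines dPCA, and it is not the authors' exact test. The signal values are hypothetical.

```python
import math
from statistics import mean, variance


def differential_score(cond_a: list[float], cond_b: list[float]) -> float:
    """Between-condition difference scaled by within-condition replicate spread.

    Welch-type t statistic: (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b),
    using sample variances (n - 1 denominator); needs at least 2 replicates per condition.
    """
    diff = mean(cond_a) - mean(cond_b)
    se = math.sqrt(variance(cond_a) / len(cond_a) + variance(cond_b) / len(cond_b))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def rank_loci(signal: dict[str, tuple[list[float], list[float]]]) -> list[tuple[str, float]]:
    """Loci sorted by absolute differential score, largest first."""
    scores = {locus: differential_score(a, b) for locus, (a, b) in signal.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    toy_signal = {  # hypothetical ChIP-seq signal: (condition A replicates, condition B replicates)
        "locus1": ([10, 12, 11], [2, 3, 1]),
        "locus2": ([5, 9, 1], [4, 6, 5]),
    }
    for locus, score in rank_loci(toy_signal):
        print(locus, round(score, 2))
```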
Two-fold Bioorthogonal Derivatization by Different Formylglycine-Generating Enzymes.

PubMed

Krüger, Tobias; Weiland, Stefanie; Falck, Georg; Gerlach, Marcus; Boschanski, Mareile; Alam, Sarfaraz; Müller, Kristian M; Dierks, Thomas; Sewald, Norbert

2018-03-26

Formylglycine-generating enzymes are of increasing interest in the field of bioconjugation chemistry. They catalyze the site-specific oxidation of a cysteine residue to the aldehyde-containing amino acid Cα-formylglycine (FGly). This non-canonical residue can be generated within any desired target protein and can subsequently be used for bioorthogonal conjugation reactions. The prototypic formylglycine-generating enzyme (FGE) and the iron-sulfur protein AtsB display slight variations in their recognition sequences. We designed specific tags in peptides and proteins that were selectively converted by the different enzymes. Combination of the different tag motifs within a single peptide or recombinant protein enabled the independent and consecutive introduction of two formylglycine residues and the generation of heterobifunctionalized protein conjugates. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
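FGE is generally described as recognizing the core sulfatase motif C-X-P-X-R, in which the cysteine is converted to FGly. The 6-residue LCTPSR tag is a widely used example. The sketch scans a protein sequence for that core pattern. It does not model the AtsB-specific variations or the tag designs reported in this study, and the example sequences are hypothetical.

```python
import re

# Core FGE recognition pattern C-X-P-X-R; the lookahead also reports overlapping hits
FGE_CORE = re.compile(r"(?=(C.P.R))")


def find_fge_core_motifs(protein: str) -> list[tuple[int, str]]:
    """(0-based index of the target Cys, matched 5-residue motif) for each core motif."""
    return [(m.start(), m.group(1)) for m in FGE_CORE.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_fge_core_motifs("MAGLCTPSRGSK"))  # [(4, 'CTPSR')]
    print(find_fge_core_motifs("MAGLSTPSRGSK"))  # [] (Ser in place of Cys)
```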
Slowing Translation between Protein Domains by Increasing Affinity between mRNAs and the Ribosomal Anti-Shine-Dalgarno Sequence Improves Solubility.

PubMed

Vasquez, Kevin A; Hatridge, Taylor A; Curtis, Nicholas C; Contreras, Lydia M

2016-02-19

Recent studies have demonstrated that effective protein production requires coordination of multiple cotranslational cellular processes, which are heavily affected by translation timing. Until recently, protein engineering had focused on codon optimization to maximize protein production rates, mostly considering the effect of tRNA abundance. However, for complex multidomain proteins, it has been hypothesized that strategic translational pauses between domains and between distinct individual structural motifs can prevent interactions between nascent chain fragments that generate kinetically trapped misfolded peptides and thereby enhance protein yields. In this study, we introduce synthetic transient pauses between structural domains in a heterologous model protein based on designed patterns of affinity between the mRNA and the anti-Shine-Dalgarno (aSD) sequence on the ribosome. We demonstrate that optimizing translation attenuation at domain boundaries can predictably affect solubility patterns in bacteria. Exploration of the affinity space showed that modifying fewer than 1% of the nucleotides (on a small 12-amino-acid linker) can vary soluble protein yields up to ∼7-fold without altering the primary sequence of the protein. In the context of longer linkers, where a larger number of distinct structural motifs can fold outside the ribosome, optimal synonymous codon variations resulted in an additional 2.1-fold increase in solubility, relative to that of nonoptimized linkers of the same length. While rational construction of 54 linkers of various affinities showed a significant correlation between protein solubility and predicted affinity, only weaker correlations were observed between tRNA abundance and protein solubility. We also demonstrate that naturally occurring high-affinity clusters are present between structural domains of β-galactosidase, one of Escherichia coli's largest native proteins.
Interdomain ribosomal affinity is an important factor that has not previously been explored in the context of protein engineering.
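Because the linker variants change nucleotides without altering the primary sequence, each redesign must remain a synonymous recoding. The sketch below checks this with the standard genetic code and reports the fraction of nucleotides changed. It does not reproduce the study's aSD affinity predictions, and both linker sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each of the three positions
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons of an A/C/G/T sequence ('*' marks a stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def is_synonymous_recode(a: str, b: str) -> bool:
    """True if two equal-length coding sequences encode the same peptide."""
    return len(a) == len(b) and translate(a) == translate(b)


def fraction_changed(a: str, b: str) -> float:
    """Fraction of aligned nucleotide positions that differ between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "GGCTCTGGC"  # hypothetical linker: Gly-Ser-Gly
    recoded = "GGAAGCGGT"   # hypothetical synonymous redesign
    print(translate(original), translate(recoded), is_synonymous_recode(original, recoded))  # GSG GSG True
    print(fraction_changed(original, recoded))
```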

A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

PubMed Central

Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

2008-01-01

A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages, suggesting that the effective population size of Neandertals was small. PMID:18692465
Proteome Studies of Filamentous Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Scott E.; Panisko, Ellen A.

2011-04-20

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.
Proteome studies of filamentous fungi.

PubMed

Baker, Scott E; Panisko, Ellen A

2011-01-01

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Combined Proteomic and Transcriptomic Interrogation of the Venom Gland of Conus geographus Uncovers Novel Components and Functional Compartmentalization

PubMed Central

Safavi-Hemami, Helena; Hu, Hao; Gorasia, Dhana G.; Bandyopadhyay, Pradip K.; Veith, Paul D.; Young, Neil D.; Reynolds, Eric C.; Yandell, Mark; Olivera, Baldomero M.; Purcell, Anthony W.

2014-01-01

Cone snails are highly successful marine predators that use complex venoms to capture prey. At any given time, hundreds of toxins (conotoxins) are synthesized in the secretory epithelial cells of the venom gland, a long and convoluted organ that can measure 4 times the length of the snail's body. In recent years, a number of studies have begun to unveil the transcriptomic, proteomic and peptidomic complexity of the venom and venom glands of several cone snail species. By using a combination of DIGE, bottom-up proteomics and next-generation transcriptome sequencing, the present study identifies proteins involved in envenomation and conotoxin maturation, significantly extending the repertoire of known (poly)peptides expressed in the venom gland of these remarkable animals. We interrogate the molecular and proteomic composition of different sections of the venom glands of 3 specimens of the fish hunter Conus geographus and demonstrate regional variations in gene expression and protein abundance. DIGE analysis identified 1204 gel spots, of which 157 showed significant regional differences in abundance as determined by biological variation analysis. Proteomic interrogation identified 342 unique proteins, including those that exhibited the greatest fold change. The majority of these proteins also exhibited significant changes in their mRNA expression levels, validating the reliability of the experimental approach. Transcriptome sequencing further revealed previously unknown genetic diversity in several venom gland components. Interestingly, abundant proteins that potentially form part of the injected venom mixture, such as echotoxins, phospholipase A2 and con-ikot-ikots, classified into distinct expression clusters with expression peaking in different parts of the gland. Our findings significantly enhance the known repertoire of venom gland polypeptides and provide molecular and biochemical evidence for the compartmentalization of this organ into distinct functional entities. PMID:24478445
Human milk peptides differentiate between the preterm and term infant and across varying lactational stages.

PubMed

Dingess, Kelly A; de Waard, Marita; Boeren, Sjef; Vervoort, Jacques; Lambers, Tim T; van Goudoever, Johannes B; Hettinga, Kasper

2017-10-18

Variations in endogenous peptide profiles, functionality, and the enzymes responsible for the formation of these peptides in human milk are understudied. Additionally, there is a lack of knowledge regarding peptides in donor human milk, which is used to feed preterm infants when mother's own milk is not (sufficiently) available. To assess this, 29 human milk samples from the Dutch Human Milk Bank were analyzed as three groups: preterm late lactation stage (LS) (n = 12), term early LS (n = 8) and term late LS (n = 9). Gestational age (GA) groups were defined as preterm (24-36 weeks) and term (≥37 weeks). LS was determined as days postpartum: early (16-36 days) or late (55-88 days). Peptides, analyzed by LC-MS/MS, and parent proteins (proteins from matched peptide sequences) were identified and quantified, after which peptide functionality and the enzymes responsible for protein cleavage were determined. A total of 16 different parent proteins were identified from human milk, with no differences by GA or LS. We identified 1104 endogenous peptides, the majority of which were from the parent proteins β-casein, polymeric immunoglobulin receptor, αs1-casein, osteopontin, and κ-casein. The absolute number of peptides differed by GA and LS, with 30 and 41 differing sequences, respectively (p < 0.05). Odds likelihood tests determined that 32 peptides had a predicted bioactive functionality, with no significant differences between groups. Enzyme prediction analysis showed that plasmin/trypsin enzymes most likely cleaved the identified human milk peptides. These results explain some of the variation in endogenous peptides in human milk, leading to future targets that may be studied for functionality.
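The abstract does not say how the enzyme prediction was carried out. As a hedged illustration only, the sketch below checks whether each end of a peptide is consistent with a simple trypsin-like rule (cleavage C-terminal to K or R, but not before P), a rule often used as a first approximation for trypsin and, loosely, for plasmin. The function names and the toy parent sequence are hypothetical; dedicated prediction tools score many enzymes and subsite preferences.

```python
TRYPSIN_LIKE_P1 = set("KR")  # assumed P1 residues for a trypsin-like rule


def is_trypsin_like_bond(parent: str, i: int) -> bool:
    """True if the bond between parent[i-1] and parent[i] fits the simple
    rule: P1 residue is K or R and the P1' residue is not proline."""
    if i <= 0 or i >= len(parent):
        return False  # protein termini are not enzymatic cleavage sites
    return parent[i - 1] in TRYPSIN_LIKE_P1 and parent[i] != "P"


def classify_peptide(parent: str, peptide: str) -> tuple[bool, bool]:
    """Return (N-terminal bond fits rule, C-terminal bond fits rule) for the
    first occurrence of peptide in the hypothetical parent protein."""
    start = parent.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in parent protein")
    end = start + len(peptide)  # index of the first residue after the peptide
    return is_trypsin_like_bond(parent, start), is_trypsin_like_bond(parent, end)
```

For the toy parent "AAKGGGRPSSKLLL", the peptide "GGGRPSSK" fits the rule at both ends, whereas "GGGR" does not at its C-terminus because the next residue is proline.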
The genetic basis of adaptive pigmentation variation in Drosophila melanogaster.

PubMed

Pool, John E; Aquadro, Charles F

2007-07-01

In a broad survey of Drosophila melanogaster population samples, levels of abdominal pigmentation were found to be highly variable and geographically differentiated. A strong positive correlation was found between dark pigmentation and high altitude, suggesting adaptation to specific environments. DNA sequence polymorphism at the candidate gene ebony revealed a clear association with the pigmentation of homozygous third chromosome lines. The darkest lines sequenced had nearly identical haplotypes spanning 14.5 kb upstream of the protein-coding exons of ebony. Thus, natural selection may have elevated the frequency of an allele that confers dark abdominal pigmentation by influencing the regulation of ebony.
Comparative analysis of vaginal microbiota sampling using 16S rRNA gene analysis.

PubMed

Virtanen, Seppo; Kalliala, Ilkka; Nieminen, Pekka; Salonen, Anne

2017-01-01

Molecular methods such as next-generation sequencing are actively being employed to characterize the vaginal microbiota in health and disease. Previous studies have focused on characterizing the biological variation in the microbiota, and less is known about how factors related to sampling contribute to the results. Our aim was to investigate the impact of sampling device and anatomical sampling site on the quantitative and qualitative outcomes relevant for vaginal microbiota research. We sampled 10 Finnish women representing diverse clinical characteristics with flocked swabs, the Evalyn® self-sampling device, sterile plastic spatulas and a cervical brush that were used to collect samples from fornix, vaginal wall and cervix. Samples were compared on DNA and protein yield, bacterial load, and microbiota diversity and species composition based on Illumina MiSeq sequencing of the 16S rRNA gene. We quantified the relative contributions of sampling variables versus intrinsic variables in the overall microbiota variation, and evaluated the microbiota profiles using several commonly employed metrics such as alpha and beta diversity as well as abundance of major bacterial genera and species. The total DNA yield was strongly dependent on the sampling device and to a lesser extent on the anatomical site of sampling. The sampling strategy did not affect the protein yield or the bacterial load. All tested sampling methods produced highly comparable microbiota profiles based on MiSeq sequencing. The sampling method explained only 2% (p = 0.89) of the overall microbiota variation, markedly surpassed by intrinsic factors such as clinical status (microscopy for bacterial vaginosis 53%, p = 0.0001), bleeding (19%, p = 0.0001), and the variation between subjects (11%, p = 0.0001). The results indicate that different sampling strategies yield comparable vaginal microbiota composition and diversity.
Hence, past and future vaginal microbiota studies employing different sampling strategies should be comparable in the absence of other technical confounders. Samples collected with the Evalyn® self-sampling device performed as well as those taken by a clinician, so the device offers a good-quality microbiota sample without the need for a gynecological examination. The amount of collected sample as well as the DNA and protein yield varied across the sampling techniques, which may have practical implications for study design.
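The abstract mentions alpha and beta diversity without naming the indices used. As an assumption for illustration, the sketch below computes two widely used choices from per-sample read counts per taxon: the Shannon index (alpha diversity within a sample) and Bray-Curtis dissimilarity (beta diversity between two samples). The counts are hypothetical, and real pipelines usually rarefy or normalize counts first.

```python
from math import log


def shannon(counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Bray-Curtis dissimilarity for two samples listing taxa in the same
    order: sum|x_i - y_i| / sum(x_i + y_i); 0 = identical, 1 = no shared taxa."""
    if len(x) != len(y):
        raise ValueError("samples must list the same taxa")
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))
```

Two equally abundant taxa give H = ln 2 (about 0.693), and the samples [5, 5, 0] and [5, 0, 5] have a Bray-Curtis dissimilarity of 0.5.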
Comparative analysis of vaginal microbiota sampling using 16S rRNA gene analysis

PubMed Central

Kalliala, Ilkka; Nieminen, Pekka; Salonen, Anne

2017-01-01

Background: Molecular methods such as next-generation sequencing are actively being employed to characterize the vaginal microbiota in health and disease. Previous studies have focused on characterizing the biological variation in the microbiota, and less is known about how factors related to sampling contribute to the results. Our aim was to investigate the impact of sampling device and anatomical sampling site on the quantitative and qualitative outcomes relevant for vaginal microbiota research. We sampled 10 Finnish women representing diverse clinical characteristics with flocked swabs, the Evalyn® self-sampling device, sterile plastic spatulas and a cervical brush that were used to collect samples from fornix, vaginal wall and cervix. Samples were compared on DNA and protein yield, bacterial load, and microbiota diversity and species composition based on Illumina MiSeq sequencing of the 16S rRNA gene. We quantified the relative contributions of sampling variables versus intrinsic variables in the overall microbiota variation, and evaluated the microbiota profiles using several commonly employed metrics such as alpha and beta diversity as well as abundance of major bacterial genera and species. Results: The total DNA yield was strongly dependent on the sampling device and to a lesser extent on the anatomical site of sampling. The sampling strategy did not affect the protein yield or the bacterial load. All tested sampling methods produced highly comparable microbiota profiles based on MiSeq sequencing. The sampling method explained only 2% (p = 0.89) of the overall microbiota variation, markedly surpassed by intrinsic factors such as clinical status (microscopy for bacterial vaginosis 53%, p = 0.0001), bleeding (19%, p = 0.0001), and the variation between subjects (11%, p = 0.0001). Conclusions: The results indicate that different sampling strategies yield comparable vaginal microbiota composition and diversity.
Hence, past and future vaginal microbiota studies employing different sampling strategies should be comparable in the absence of other technical confounders. Samples collected with the Evalyn® self-sampling device performed as well as those taken by a clinician, so the device offers a good-quality microbiota sample without the need for a gynecological examination. The amount of collected sample as well as the DNA and protein yield varied across the sampling techniques, which may have practical implications for study design. PMID:28723942
Balancing Selection on a Regulatory Region Exhibiting Ancient Variation That Predates Human–Neandertal Divergence

PubMed Central

Iskow, Rebecca C.; Austermann, Christian; Scharer, Christopher D.; Raj, Towfique; Boss, Jeremy M.; Sunyaev, Shamil; Price, Alkes; Stranger, Barbara; Simon, Viviana; Lee, Charles

2013-01-01

Ancient population structure shaping contemporary genetic variation has recently been appreciated and has important implications for our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p < 10⁻¹⁵). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons, revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations. PMID:23593015
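Tajima's D contrasts two estimates of genetic diversity: the mean number of pairwise differences (pi) and Watterson's estimate based on the number of segregating sites (S). A positive D, as reported for the NE1 locus, indicates an excess of intermediate-frequency variants; this pattern is consistent with balancing selection, although demographic effects such as population subdivision can also produce it. The sketch below implements Tajima's (1989) statistic for a small, gap-free toy alignment. It does not reproduce the genome-wide comparison the authors used to obtain their p-value.

```python
from itertools import combinations
from math import nan, sqrt


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences (toy sketch: every
    character is treated as a base; gaps and missing data are not handled)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D is undefined for fewer than 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # Segregating sites: alignment columns containing more than one base.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if S == 0:
        return nan  # no variation, statistic undefined
    # Mean number of pairwise differences over all unordered pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

With the toy alignment ["AAAA", "AAAA", "TTTT", "TTTT"] (all variants at 50% frequency) the statistic is positive, while an alignment made only of singletons gives a negative value.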
Allelic variations of α-gliadin genes from species of Aegilops section Sitopsis and insights into evolution of α-gliadin multigene family among Triticum and Aegilops.

PubMed

Huang, Zhuo; Long, Hai; Wei, Yu-Ming; Yan, Ze-Hong; Zheng, You-Liang

2016-04-01

The α-gliadins account for 15-30% of the total storage protein in wheat endosperm and play important roles in dough extensibility and nutritional quality. On the other hand, they are a main source of the toxic peptides that trigger celiac disease. In this study, 37 α-gliadins were isolated from three species of Aegilops section Sitopsis. Sequence similarity and phylogenetic analyses revealed novel allelic variation at the Gli-2 loci of Sitopsis species and a regular organization of motifs in their repetitive domain. Based on comprehensive analyses of a large number of known sequences from bread wheat and its diploid genome progenitors, the distributions of four T cell epitopes and the length variations of two polyglutamine domains were analyzed. Additionally, according to the organization of repeat motifs, we classified the α-gliadins of Triticum and Aegilops into eight types. Their most recent common ancestor and putative divergence patterns were further considered. This study provides new insights into the allelic variations of α-gliadins in Aegilops section Sitopsis, as well as the evolution of the α-gliadin multigene family among Triticum and Aegilops species.
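Length variation in a polyglutamine domain can be summarized by locating runs of consecutive glutamines (Q) in each translated sequence and recording their lengths. The sketch below does this with a regular expression; the minimum run length of 5 and the toy sequence are hypothetical choices, and the sketch only finds uninterrupted runs, so domains broken by an occasional non-Q residue would be reported as separate runs.

```python
import re


def polyq_runs(protein: str, min_len: int = 5) -> list[tuple[int, int]]:
    """Return (0-based start, length) of each uninterrupted run of at least
    min_len glutamine residues in a one-letter protein sequence."""
    pattern = f"Q{{{min_len},}}"  # e.g. "Q{5,}" when min_len == 5
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, protein)]
```

For the toy sequence "A" + "Q" * 6 + "P" + "QQ" + "L" + "Q" * 9, the function reports runs of 6 and 9 residues starting at positions 1 and 11; lowering min_len to 2 also reports the short QQ run.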
A novel dominant GJB2 (DFNA3) mutation in a Chinese family

NASA Astrophysics Data System (ADS)

Wang, Hongyang; Wu, Kaiwen; Yu, Lan; Xie, Linyi; Xiong, Wenping; Wang, Dayong; Guan, Jing; Wang, Qiuju

2017-01-01

To decipher the phenotype and genotype of a Chinese family with autosomal dominant non-syndromic hearing loss (ADNSHL) carrying a novel dominant missense mutation in the GJB2 gene (DFNA3), mutation screening of GJB2 was performed on the propositus from a five-generation ADNSHL family through polymerase chain reaction amplification and Sanger sequencing. The candidate variation and its co-segregation with the phenotype were verified in all ascertained family members. Targeted gene capture and next-generation sequencing (NGS) were performed to explore additional genetic variations. We identified the novel GJB2 mutation c.524C>A (p.P175H), which segregated with high frequency and was involved in progressive sensorineural hearing loss. One subject with an additional c.235delC mutation showed a more severe phenotype than the other members, who carried only the single dominant GJB2 variant. Four patients diagnosed with noise-induced hearing loss did not carry this mutation. No other pathogenic variations or modifier genes were identified by NGS. In conclusion, a novel missense mutation in GJB2 (DFNA3), affecting the second extracellular domain of the protein, was identified in a family with ADNSHL.
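The coding-DNA notation c.524C>A and the protein notation p.P175H are linked by simple codon arithmetic: nucleotide 524 of the coding sequence lies in codon (524 - 1) // 3 + 1 = 175, at the second position of that codon. The sketch below checks this and applies the substitution to a codon using the standard genetic code. The reference codon "CCC" is a hypothetical placeholder, since the abstract does not give the actual GJB2 codon; any proline codon ending in C or T would give histidine after a C>A change at position 2.

```python
import re

# Standard genetic code, codons ordered by first/second/third base in "TCAG".
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def locate_cds_substitution(hgvs_c: str) -> tuple[int, int, str, str]:
    """Parse a simple substitution such as 'c.524C>A' and return
    (codon number, position within codon 1-3, reference base, alternate base)."""
    m = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if m is None:
        raise ValueError("only simple substitutions like c.524C>A are handled")
    pos = int(m.group(1))
    return (pos - 1) // 3 + 1, (pos - 1) % 3 + 1, m.group(2), m.group(3)


def mutate_codon(codon: str, codon_pos: int, ref: str, alt: str) -> tuple[str, str, str]:
    """Apply the substitution to a (hypothetical) reference codon and return
    (mutant codon, reference amino acid, mutant amino acid)."""
    if codon[codon_pos - 1] != ref:
        raise ValueError("reference base does not match the codon")
    mutant = codon[:codon_pos - 1] + alt + codon[codon_pos:]
    return mutant, CODON_TABLE[codon], CODON_TABLE[mutant]
```

Here locate_cds_substitution("c.524C>A") returns (175, 2, "C", "A"), and mutate_codon("CCC", 2, "C", "A") returns ("CAC", "P", "H"), matching p.P175H.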
Identification of New Single Nucleotide Polymorphism-Based Markers for Inter- and Intraspecies Discrimination of Obligate Bacterial Parasites (Pasteuria spp.) of Invertebrates

PubMed Central

Mauchline, Tim H.; Knox, Rachel; Mohan, Sharad; Powers, Stephen J.; Kerry, Brian R.; Davies, Keith G.; Hirsch, Penny R.

2011-01-01

Protein-encoding and 16S rRNA genes of Pasteuria penetrans populations from a wide range of geographic locations were examined. Most interpopulation single nucleotide polymorphisms (SNPs) were detected in the 16S rRNA gene. However, in order to fully resolve all populations, these were supplemented with SNPs from protein-encoding genes in a multilocus SNP typing approach. Examination of individual 16S rRNA gene sequences revealed the occurrence of “cryptic” SNPs, which were not present in the consensus sequences of any P. penetrans population. Additionally, hierarchical cluster analysis separated P. penetrans 16S rRNA gene clones into four groups, one of which contained sequences from the most highly passaged population, demonstrating that it is possible to manipulate the population structure of this fastidious bacterium. The other groups comprised representatives of the other populations in various proportions. Comparison of sequences among three Pasteuria species, namely, P. penetrans, P. hartismeri, and P. ramosa, showed that the protein-encoding genes provided greater discrimination than the 16S rRNA gene. From these findings, we have developed a toolbox for the discrimination of Pasteuria at both the inter- and intraspecies levels. We also provide a model to monitor genetic variation in other obligate hyperparasites and difficult-to-culture microorganisms. PMID:21803895
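The idea of a "cryptic" SNP can be made concrete: build a majority-rule consensus for each population from its aligned clone sequences, then report any (position, base) seen in an individual clone that appears in no population consensus. The sketch below is a toy illustration of that definition, not the authors' pipeline; the population names and sequences are hypothetical.

```python
from collections import Counter


def consensus(seqs: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences; ties go to
    the base encountered first in the column (Counter.most_common order)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def cryptic_snps(populations: dict[str, list[str]]) -> set[tuple[int, str]]:
    """(position, base) pairs present in some clone but absent from the
    consensus sequence of every population."""
    visible = {(i, base)
               for clones in populations.values()
               for i, base in enumerate(consensus(clones))}
    observed = {(i, base)
                for clones in populations.values()
                for clone in clones
                for i, base in enumerate(clone)}
    return observed - visible
```

With pop1 = ["ACGT", "ACGT", "ACGA"] and pop2 = ["TCGT", "TCGT", "TCCT"], the A/T difference at position 0 is an ordinary interpopulation SNP visible in the consensus sequences, while (2, "C") and (3, "A") are cryptic.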
Identification of new single nucleotide polymorphism-based markers for inter- and intraspecies discrimination of obligate bacterial parasites (Pasteuria spp.) of invertebrates.

PubMed

Mauchline, Tim H; Knox, Rachel; Mohan, Sharad; Powers, Stephen J; Kerry, Brian R; Davies, Keith G; Hirsch, Penny R

2011-09-01

Protein-encoding and 16S rRNA genes of Pasteuria penetrans populations from a wide range of geographic locations were examined. Most interpopulation single nucleotide polymorphisms (SNPs) were detected in the 16S rRNA gene. However, in order to fully resolve all populations, these were supplemented with SNPs from protein-encoding genes in a multilocus SNP typing approach. Examination of individual 16S rRNA gene sequences revealed the occurrence of "cryptic" SNPs, which were not present in the consensus sequences of any P. penetrans population. Additionally, hierarchical cluster analysis separated P. penetrans 16S rRNA gene clones into four groups, one of which contained sequences from the most highly passaged population, demonstrating that it is possible to manipulate the population structure of this fastidious bacterium. The other groups comprised representatives of the other populations in various proportions. Comparison of sequences among three Pasteuria species, namely, P. penetrans, P. hartismeri, and P. ramosa, showed that the protein-encoding genes provided greater discrimination than the 16S rRNA gene. From these findings, we have developed a toolbox for the discrimination of Pasteuria at both the inter- and intraspecies levels. We also provide a model to monitor genetic variation in other obligate hyperparasites and difficult-to-culture microorganisms.
Plant Genome Resources at the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Smith-White, Brian; Chetvernin, Vyacheslav; Resenchuk, Sergei; Dombrowski, Susan M.; Pechous, Steven W.; Tatusova, Tatiana; Ostell, James

2005-01-01

The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI. PMID:16010002
DOE Office of Scientific and Technical Information (OSTI.GOV)

Ristow, Sandra S.; Arnzen, Jeanene M.; Leong, JoAnn Ching

Seventeen strains of infectious hematopoietic necrosis virus (IHNV) from different geographical regions and from different fish stocks were typed by polyacrylamide gel electrophoresis, by indirect fluorescence with 27 monoclonal antibodies against both the G and N proteins of the virus, and by serum neutralization with six monoclonal anti-glycoprotein antibodies. In addition, many other IHNV isolates have been examined. Studying the isolates with the antibodies has shown that a greater amount of variation exists between isolates than was first predicted by the application of the polyacrylamide technique. Isolates within electrophoretic types I-V may be further classified according to their reactions with the monoclonal antibodies in indirect fluorescence. Serum neutralization with selected anti-glycoprotein antibodies in conjunction with fluorescence analysis confirms one of the original findings of Hsu et al. (1986) that two different species in a single facility can be infected with the same isolate. Variation among isolates as measured by reactivity with the monoclonal library appears to be greater within the G protein than within the N protein sequence. 9 refs., 7 figs., 6 tabs.
Ebolavirus is evolving but not changing: No evidence for functional change in EBOV from 1976 to the 2014 outbreak.

PubMed

Olabode, Abayomi S; Jiang, Xiaowei; Robertson, David L; Lovell, Simon C

2015-08-01

The 2014 epidemic of Ebola virus disease (EVD) has had a devastating impact in West Africa. Sequencing of ebolavirus (EBOV) from infected individuals has revealed extensive genetic variation, leading to speculation that the virus may be adapting to humans, accounting for the scale of the 2014 outbreak. We computationally analyze the variation associated with all EVD outbreaks, and find none of the amino acid replacements lead to identifiable functional changes. These changes have minimal effect on protein structure, being neither stabilizing nor destabilizing, are not found in regions of the proteins associated with known functions and tend to cluster in poorly constrained regions of proteins, specifically intrinsically disordered regions. We find no evidence that the difference between the current and previous outbreaks is due to evolutionary changes associated with transmission to humans. Instead, epidemiological factors are likely to be responsible for the unprecedented spread of EVD. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Global genetic diversity of the Plasmodium vivax transmission-blocking vaccine candidate Pvs48/45.

PubMed

Vallejo, Andres F; Martinez, Nora L; Tobon, Alejandra; Alger, Jackeline; Lacerda, Marcus V; Kajava, Andrey V; Arévalo-Herrera, Myriam; Herrera, Sócrates

2016-04-12

Plasmodium vivax 48/45 protein is expressed on the surface of gametocytes/gametes and plays a key role in gamete fusion during fertilization. This protein was recently expressed in an Escherichia coli host as a recombinant product that was highly immunogenic in mice and monkeys and induced antibodies with high transmission-blocking activity, suggesting its potential as a P. vivax transmission-blocking vaccine candidate. To determine the sequence polymorphism of natural parasite isolates and its potential influence on protein structure, all pvs48/45 sequences reported in databases from around the world, as well as those from low-transmission settings of Latin America, were compared. Plasmodium vivax parasite isolates from malaria-endemic regions of Colombia, Brazil and Honduras (n = 60) were used to sequence the Pvs48/45 gene and were compared with those previously reported to GenBank and PlasmoDB (n = 222). Pvs48/45 gene haplotypes were analysed to determine the functional significance of genetic variation for protein structure and vaccine potential. Nine non-synonymous substitutions (E35K, Y196H, H211N, K250N, D335Y, E353Q, A376T, K390T, K418R) and three synonymous substitutions (I73, T149, C156) that define seven different haplotypes were found among the 282 isolates from nine countries when compared with the Sal I reference sequence. Nucleotide diversity (π) was 0.00173 for worldwide samples (range 0.00033-0.00216), with relatively high diversity in Myanmar and Colombia and low diversity in Mexico, Peru and South Korea. The two most frequent substitutions (E353Q: 41.9%, K250N: 39.5%) were predicted to be located in antigenic regions without affecting putative B cell epitopes or the tertiary protein structure. There is limited sequence polymorphism in pvs48/45, with noted geographical clustering among Asian and American isolates.
The low genetic diversity of the protein does not influence the predicted antigenicity or protein structure and, therefore, supports its further development as transmission-blocking vaccine candidate.
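Nucleotide diversity (π), as quoted above, is the average number of nucleotide differences per site between two randomly chosen sequences. The sketch below computes it as the mean over all pairs of aligned sequences, divided by alignment length. It assumes gap-free sequences of equal length without ambiguity codes; the toy sequences are hypothetical, and population-genetics packages handle missing data and site filtering in ways this sketch does not.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site for equal-length aligned sequences
    (no gaps or ambiguity codes assumed)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)
```

For the toy set ["AAAA", "AAAT", "AATT"], the three pairs differ at 1, 2 and 1 sites, so π = 4 / (3 × 4) ≈ 0.333.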
Mutation screening of AURKB and SYCP3 in patients with reproductive problems.

PubMed

López-Carrasco, A; Oltra, S; Monfort, S; Mayo, S; Roselló, M; Martínez, F; Orellana, C

2013-02-01

Mutations in the spindle checkpoint genes can cause improper chromosome segregation and aneuploidies, which in turn may lead to reproductive problems. Two of the proteins involved in this checkpoint are Aurora kinase B (AURKB), which prevents anaphase whenever microtubule-kinetochore attachments are improper during metaphase, and synaptonemal complex protein 3 (SYCP3), which is essential for the formation of the synaptonemal complex and for the recombination of homologous chromosomes. This study attempted to clarify the possible involvement of both proteins in the reproductive problems of patients with chromosomal instability. To do this, we screened these patients for genetic variants in AURKB and SYCP3 using Sanger sequencing. Only one apparently non-pathogenic deletion was found in SYCP3. In contrast, we found six sequence variations in AURKB. The consequences of these changes on the protein were studied in silico using different bioinformatic tools. In addition, the frequency of three of the variations was studied using a high-resolution melting approach. The absence of these three variants in control samples and their position in the AURKB gene suggest their possible involvement in the patients' chromosomal instability. Interestingly, two of the identified changes in AURKB were found in each member of a couple with a history of spontaneous pregnancy loss, a fetal anencephaly and a deaf daughter. One of these changes is described here for the first time. Although further studies are necessary, our results are encouraging enough to propose the analysis of AURKB in couples with reproductive problems.
A comparative gene analysis with rice identified orthologous group II HKT genes and their association with Na(+) concentration in bread wheat.

PubMed

Ariyarathna, H A Chandima K; Oldach, Klaus H; Francki, Michael G

2016-01-19

Although HKT transporter genes are among the key determinants of crop salt tolerance mechanisms, the diversity and functional role of group II HKT genes are not clearly understood in bread wheat. Advanced knowledge of rice HKT genes and the rice whole-genome sequence was therefore used in comparative gene analysis to identify orthologous wheat group II HKT genes and their role in trait variation under different saline environments. The four group II HKTs in rice identified two orthologous gene families in bread wheat: the known TaHKT2;1 gene family and a new, distinctly different gene family designated TaHKT2;2. A single copy of TaHKT2;2 was found on each homeologous chromosome arm 7AL, 7BL and 7DL, and each gene was expressed in leaf blade, sheath and root tissues under non-stressed conditions and under 200 mM salt stress. The proteins encoded by genes of the TaHKT2;2 family shared more than 93% amino acid sequence identity with one another but ≤52% amino acid identity with the proteins encoded by the TaHKT2;1 family. Specifically, variations in known critical domains predicted functional differences between the two protein families. Similar to orthologous rice genes on chromosome 6L, TaHKT2;1 and TaHKT2;2 genes were located approximately 3 kb apart on wheat chromosomes 7AL, 7BL and 7DL, forming a static syntenic block in the two species. The chromosomal region on 7AL containing TaHKT2;1 7AL-1 co-located with QTL for shoot Na(+) concentration and yield in some saline environments. The differences in copy number, gene sequences and encoded proteins between TaHKT2;2 homeologous genes and other group II HKT gene families within and across species likely reflect functional diversity for ion selectivity and transport in plants. Evidence indicated that neither TaHKT2;2 nor TaHKT2;1 was associated with primary root Na(+) uptake, but TaHKT2;1 may be associated with trait variation for Na(+) exclusion and yield in some but not all saline environments.
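Percent identity figures such as ">93%" and "≤52%" depend on how the pairwise alignment is scored and which denominator is used; the abstract does not specify either. The sketch below assumes one common convention: count identical, non-gap columns in an existing pairwise alignment and divide by the number of columns that are not gaps in both sequences. Other conventions (for example, dividing by the length of the shorter sequence) give different numbers. The aligned toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned sequences: identical non-gap
    columns / columns that are not gaps in both (one convention among several)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == gap and b == gap)]
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)
```

For "MKT-LLV" aligned with "MKTALIV", 5 of 7 columns are identical, giving about 71.4%.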
Using Evolution to Guide Protein Engineering: The Devil IS in the Details.

PubMed

Swint-Kruse, Liskin

2016-07-12

For decades, protein engineers have endeavored to reengineer existing proteins for novel applications. Overall, protein folds and gross functions can be readily transferred from one protein to another by transplanting large blocks of sequence (i.e., domain recombination). However, predictably fine-tuning function (e.g., by adjusting ligand affinity, specificity, catalysis, and/or allosteric regulation) remains a challenge. One approach has been to use the sequences of protein families to identify amino acid positions that change during the evolution of functional variation. The rationale is that these nonconserved positions could be mutated to predictably fine-tune function. Evolutionary approaches to protein design have had some success, but the engineered proteins seldom replicate the functional performances of natural proteins. This Biophysical Perspective reviews several complexities that have been revealed by evolutionary and experimental studies of protein function. These include 1) challenges in defining computational and biological thresholds that define important amino acids; 2) the co-occurrence of many different patterns of amino acid changes in evolutionary data; 3) difficulties in mapping the patterns of amino acid changes to discrete functional parameters; 4) the nonconventional mutational outcomes that occur for a particular group of functionally important, nonconserved positions; 5) epistasis (nonadditivity) among multiple mutations; and 6) the fact that a large fraction of a protein's amino acids contribute to its overall function. To overcome these challenges, new goals are identified for future studies. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
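Epistasis (nonadditivity), point 5 above, is often quantified as the difference between the observed effect of a double mutant and the sum of the two single-mutant effects, measured on a scale where independent effects are expected to add (for example, free-energy changes). For multiplicative traits such as relative fitness or activity, logarithms are taken first. The sketch below assumes these conventions; the values in the usage note are hypothetical.

```python
from math import log


def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Observed double-mutant effect minus the sum of single-mutant effects,
    on a scale where effects are assumed to add (e.g. free energy)."""
    return (ab - wt) - ((a - wt) + (b - wt))


def log_epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """The same test for a multiplicative trait (e.g. relative fitness):
    take logs first so independent effects become additive."""
    return epistasis(log(wt), log(a), log(b), log(ab))
```

With hypothetical free energies wt = -10.0, A = -9.0, B = -8.5 and AB = -6.0, the additive expectation for AB is -7.5, so the epistasis term is +1.5; a relative-fitness set of 1.0, 0.8, 0.5 and 0.4 is exactly multiplicative, and log_epistasis returns approximately 0.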

Increased apomixis expression concurrent with genetic and epigenetic variation in a newly synthesized Eragrostis curvula polyploid

NASA Astrophysics Data System (ADS)

Zappacosta, Diego C.; Ochogavía, Ana C.; Rodrigo, Juan M.; Romero, José R.; Meier, Mauro S.; Garbus, Ingrid; Pessino, Silvina C.; Echenique, Viviana C.

2014-04-01

Eragrostis curvula includes biotypes reproducing through obligate and facultative apomixis or, rarely, full sexuality. We previously generated a "tetraploid-dihaploid-tetraploid" series of plants consisting of a tetraploid apomictic plant (T), a sexual dihaploid plant (D) and a tetraploid artificial colchiploid (C). Initially, plant C was nearly 100% sexual. However, its capacity to form non-reduced embryo sacs dramatically increased over a four-year period (2003-2007), reaching levels of 85-90%. Here, we confirmed high rates of apomixis in plant C and used AFLPs and MSAPs to characterize the genetic and epigenetic variation observed in this plant in 2007 compared with 2003. Of the polymorphic sequences, some had no coding potential, whereas others were homologous to retrotransposons and/or protein-coding-like sequences. Our results suggest that in this particular plant system increased apomixis expression is concurrent with genetic and epigenetic modifications, possibly involving transposable elements.
Insights into mechanisms of bacterial antigenic variation derived from the complete genome sequence of Anaplasma marginale.

PubMed

Palmer, Guy H; Futse, James E; Knowles, Donald P; Brayton, Kelly A

2006-10-01

Persistence of Anaplasma spp. in the animal reservoir host is required for efficient tick-borne transmission of these pathogens to animals and humans. Using A. marginale infection of its natural reservoir host as a model, persistent infection has been shown to reflect sequential cycles in which antigenic variants emerge, replicate, and are controlled by the immune system. Variation in the immunodominant outer-membrane protein MSP2 is generated by a process of gene conversion, in which unique hypervariable region sequences (HVRs) located in pseudogenes are recombined into a single operon-linked msp2 expression site. Although organisms expressing whole HVRs derived from pseudogenes emerge early in infection, long-term persistent infection is dependent on the generation of complex mosaics in which segments from different HVRs recombine into the expression site. The resulting combinatorial diversity generates the number of variants both predicted and shown to emerge during persistence.
The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis.

PubMed

Duan, Naibin; Sun, Honghe; Wang, Nan; Fei, Zhangjun; Chen, Xuesen

2016-07-01

The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis, a widely used apple rootstock, was determined using Illumina high-throughput sequencing. The genome is 422,555 bp in length and has a GC content of 45.21%. A pair of 32,504 bp inverted repeats divides it into a large single-copy region of 213,055 bp and a small single-copy region of 144,492 bp. The genome contains 38 protein-coding genes, four pseudogenes, 25 tRNA genes, and three rRNA genes. It is 25,608 bp longer than the mitogenome of M. domestica, and several structural variations between the two mitogenomes were detected.
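The region lengths reported above can be checked against the total length: a genome with two copies of an inverted repeat (IR) flanking a large and a small single-copy region (LSC, SSC) should satisfy total = LSC + SSC + 2 × IR. A minimal sketch of that check follows. The helper name is hypothetical, and the M. domestica length it prints is implied by the reported 25,608 bp difference rather than stated in the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome whose single-copy regions are separated
    by two copies of an inverted repeat."""
    return lsc + ssc + 2 * ir


# Region lengths reported for the M. hupehensis var. pinyiensis mitogenome
total = quadripartite_length(lsc=213_055, ssc=144_492, ir=32_504)
print(total)           # 422555, matching the reported genome length
print(total - 25_608)  # 396947, the implied M. domestica mitogenome length
```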
Phylogeny of isolates of Prunus necrotic ringspot virus from the Ilarvirus Ringtest and identification of group-specific features.

PubMed

Hammond, R W

2003-06-01

Isolates of Prunus necrotic ringspot virus (PNRSV) were examined to establish the level of naturally occurring sequence variation in the coat protein (CP) gene and to identify group-specific genome features that may prove valuable for the generation of diagnostic reagents. Phylogenetic analysis of a 452 bp sequence of 68 virus isolates, 20 obtained from the European Union Ilarvirus Ringtest held in October 1998, confirmed the clustering of the isolates into three distinct groups. Although no correlation was found between the sequence and host or geographic origin, there was a general trend for severe isolates to cluster into one group. Group-specific features have been identified for discrimination between virus strains.
Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

NASA Astrophysics Data System (ADS)

Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

2016-03-01

Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.
Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean

PubMed Central

2010-01-01

Background: The nutritional and economic value of many crops is effectively a function of seed protein and oil content. Insight into the genetic and molecular control mechanisms involved in the deposition of these constituents in the developing seed is needed to guide crop improvement. A quantitative trait locus (QTL) on Linkage Group I (LG I) of soybean (Glycine max (L.) Merrill) has a striking effect on seed protein content. Results: A soybean near-isogenic line (NIL) pair contrasting in seed protein and differing in an introgressed genomic segment containing the LG I protein QTL was used as a resource to demarcate the QTL region and to study variation in transcript abundance in developing seed. The LG I QTL region was delineated to less than 8.4 Mbp of genomic sequence on chromosome 20. Using Affymetrix® Soy GeneChip and high-throughput Illumina® whole transcriptome sequencing platforms, 13 genes displaying significant seed transcript accumulation differences between NILs were identified that mapped to the 8.4 Mbp LG I protein QTL region. Conclusions: This study identifies gene candidates at the LG I protein QTL for potential involvement in the regulation of protein content in the soybean seed. The results demonstrate the power of complementary approaches to characterize contrasting NILs and provide genome-wide transcriptome insight towards understanding seed biology and the soybean genome. PMID:20199683
Wheat beta-expansin (EXPB11) genes: Identification of the expressed gene on chromosome 3BS carrying a pollen allergen domain

PubMed Central

2010-01-01

Background: Expansins form a large multi-gene family, found in wheat and other cereal genomes, that is involved in the expansion of cell walls as a tissue grows. The expansin family can be divided into two main groups, alpha-expansins (EXPA) and beta-expansins (EXPB), with the EXPB group being of particular interest as group 1 pollen allergens. Results: In this study, three beta-expansin genes were identified and characterized from a newly sequenced region of the Triticum aestivum cv. Chinese Spring chromosome 3B physical map at the Sr2 locus (FPC contig ctg11). Analysis of a 357 kb sub-sequence of FPC contig ctg11 identified one of these beta-expansin genes as TaEXPB11, originally identified as a cDNA from the wheat cv. Wyuna. Based on analysis of the intron sequences of the three wheat cv. Chinese Spring genes, we propose that two of these beta-expansin genes are duplications of the TaEXPB11 gene. Comparative sequence analysis with two other wheat cultivars (cv. Westonia and cv. Hope) and a Triticum aestivum var. spelta line validated the identification of the Chinese Spring variant of TaEXPB11. Expression in maternal and grain tissues was confirmed by examining EST databases and carrying out RT-PCR experiments. Detailed examination of the position of TaEXPB11 relative to the locus encoding Sr2 disease resistance ruled out the possibility that this gene contributes directly to the resistance phenotype. Conclusions: Through 3-D structural protein comparisons with Zea mays EXPB1, we propose that variations within the coding sequence of TaEXPB11 in wheats may produce functional changes in features such as domain 1, related to possible involvement in cell wall structure, and domain 2, which defines the pollen allergen domain and binds IgE protein.
The variation established in this gene suggests it is a clearly identifiable member of a gene family and reflects the dynamic features of the wheat genome as it adapted to a range of different environments and uses. Accession Numbers: ctg11 = FN564426. Survey sequences of TaEXPB11ws and TsEXPB11 are provided on request. PMID:20507562
Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

PubMed Central

2011-01-01

Background: Most information on genomic variations and their associations with phenotypes is found exclusively in scientific publications rather than in structured databases. These texts commonly describe variations in natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction (e.g., the search for biological implications). To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results: This article describes a workflow for the normalization of variation mentions, i.e., their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions: Comparable approaches usually focus on variations described at the protein sequence level and neglect the problems posed by other SNP mentions. The results presented here indicate that normalizing SNPs described at the DNA level is more difficult than normalizing SNPs described at the protein level. The challenges associated with normalization are exemplified by ambiguities and errors that occur in this corpus. PMID:21992066
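The precision and recall figures above can be combined into a single F1 score (their harmonic mean). The sketch below uses the standard definitions. The abstract does not give the underlying true/false positive counts, so the example uses only the reported rates, and the resulting F1 of about 0.80 is derived here rather than reported by the authors.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of assigned identifiers that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of gold-standard mentions that received a correct identifier."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Reported precision and recall of the normalization procedure
print(round(f1_score(0.981, 0.675), 3))  # 0.8
```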
Comparison of theoretical proteomes: identification of COGs with conserved and variable pI within the multimodal pI distribution.

PubMed

Nandi, Soumyadeep; Mehra, Nipun; Lynn, Andrew M; Bhattacharya, Alok

2005-09-09

Theoretical proteome analysis, in which the theoretical isoelectric points (pI) of all proteins encoded by a genome are plotted against their molecular masses, shows a multimodal distribution of pI. This multimodal distribution is an effect of the allowed combinations of charged amino acids, not of evolutionary causes. The variation in this distribution can be correlated with an organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes. The distribution of ortholog pI values was trimodal for all prokaryotic genomes analyzed, similar to whole-proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional groupings of orthologs, five groups vary significantly from the population of orthologs, which is attributed either to conservation at the sequence level or to a bias toward positively or negatively charged residues that contribute to function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and the orthologs that best represent the variation in the levels of the acidic and basic regions are listed. Analyzing the pI distribution of orthologs allows theoretical proteomes to be compared at the resolution of individual proteins. Orthologs that vary significantly between the major acidic and basic regions may be used as representatives of the variation of the entire proteome.
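A theoretical pI is typically found by searching for the pH at which a sequence's net charge, estimated with the Henderson-Hasselbalch equation, crosses zero. The sketch below finds that pH by bisection. The pKa values are one illustrative set and are an assumption; published pI calculators use different pKa sets and therefore give somewhat different values.

```python
# Illustrative pKa values (assumption; other published sets differ).
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH; the net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still net positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Applying `isoelectric_point` to every predicted protein of a genome and plotting the values against molecular mass yields the kind of theoretical proteome plot described above.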
Structure of the circumsporozoite protein gene in 18 strains of Plasmodium falciparum.

PubMed

Weber, J L; Hockmeyer, W T

1985-06-01

Using the cloned circumsporozoite (CS) protein gene of a Brazilian strain of Plasmodium falciparum as probe, we have analyzed the structure of the CS protein gene from 17 other Asian, African, Central and South American parasite strains by nucleic acid hybridization. Each strain appears to have one CS protein gene which hybridizes readily to the Brazilian strain probe. The 5' and 3' thirds of the genes are invariant in size in all 18 strains whereas the central third containing the 12 base pair tandem repeats varies in size over a range of about 100 base pairs. Several differences were found in the locations of Sau3A sites in the genes. The Sau3A sites are significant because each of the minority Asn-Val-Asp-Pro repeats in the cloned gene has a Sau3A site. DNA melting of hybrids revealed a high degree of homology between the sequences of the cloned gene and genes from an Asian strain and an African strain. A 14 base oligodeoxynucleotide with a sequence from the central repeat region hybridized to all strains tested. We conclude that the CS protein gene is highly conserved among strains of P. falciparum and that malaria vaccine development with the CS protein is unlikely to be complicated by strain variation.
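Sau3AI recognizes the 4 bp sequence GATC, so a codon choice such as GTN GAT CCN for Val-Asp-Pro places a site inside an Asn-Val-Asp-Pro repeat. A minimal sketch that lists GATC positions in a sequence follows. The two 12 bp repeat units are hypothetical codon choices for the minority Asn-Val-Asp-Pro repeat and the majority Asn-Ala-Asn-Pro repeat; they are not the sequences reported in the study.

```python
def find_sites(seq: str, site: str = "GATC") -> list[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


nvdp_unit = "AATGTAGATCCA"  # AAT GTA GAT CCA -> Asn-Val-Asp-Pro (hypothetical codons)
nanp_unit = "AATGCAAATCCA"  # AAT GCA AAT CCA -> Asn-Ala-Asn-Pro (hypothetical codons)
print(find_sites(nvdp_unit))  # [6]
print(find_sites(nanp_unit))  # []
```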
Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project.

PubMed

Kim, Daniel Seung; Crosslin, David R; Auer, Paul L; Suzuki, Stephanie M; Marsillach, Judit; Burt, Amber A; Gordon, Adam S; Meschia, James F; Nalls, Mike A; Worrall, Bradford B; Longstreth, W T; Gottesman, Rebecca F; Furlong, Clement E; Peters, Ulrike; Rich, Stephen S; Nickerson, Deborah A; Jarvik, Gail P

2014-06-01

HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart-, lung-, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10⁻³). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10⁻³). Ethnic differences in the association between PON1 variants and stroke could be due to the effects of PON1 Val109Ile (overall P = 7.88 × 10⁻³; AA P = 6.52 × 10⁻⁴), which is found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than that of the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in participants of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted.
DNA Shape Dominates Sequence Affinity in Nucleosome Formation

NASA Astrophysics Data System (ADS)

Freeman, Gordon S.; Lequieu, Joshua P.; Hinckley, Daniel M.; Whitmer, Jonathan K.; de Pablo, Juan J.

2014-10-01

Nucleosomes provide the basic unit of compaction in eukaryotic genomes, and the mechanisms that dictate their position at specific locations along a DNA sequence are of central importance to genetics. In this Letter, we employ molecular models of DNA and proteins to elucidate various aspects of nucleosome positioning. In particular, we show how DNA's histone affinity is encoded in its sequence-dependent shape, including subtle deviations from the ideal straight B-DNA form and local variations of minor groove width. By relying on high-precision simulations of the free energy of nucleosome complexes, we also demonstrate that, depending on DNA's intrinsic curvature, histone binding can be dominated by bending interactions or electrostatic interactions. More generally, the results presented here explain how sequence, manifested as the shape of the DNA molecule, dominates molecular recognition in the problem of nucleosome positioning.
Genotypic characterization of CRF01_AE env genes derived from human immunodeficiency virus type 1-infected patients residing in central Thailand.

PubMed

Utachee, Piraporn; Jinnopat, Piyamat; Isarangkura-Na-Ayuthaya, Panasda; de Silva, Udayanga Chandimal; Nakamura, Shota; Siripanyaphinyo, Uamporn; Wichukchinda, Nuanjun; Tokunaga, Kenzo; Yasunaga, Teruo; Sawanpanyalert, Pathom; Ikuta, Kazuyoshi; Auwanit, Wattana; Kameoka, Masanori

2009-02-01

CRF01_AE is a major subtype of human immunodeficiency virus type 1 (HIV-1) circulating in Southeast Asia, including Thailand. HIV-1 env genes were amplified by polymerase chain reaction from blood samples of HIV-1-infected patients residing in Thailand in 2006, and cloned into the pNL4-3-derived reporter viral construct. Generated envelope protein (Env)-recombinant virus was examined for its infectivity, and then 35 infectious CRF01_AE Env-recombinant viruses were selected. Sequencing analysis revealed that the interclone variation of the deduced amino acid sequences was higher in CRF01_AE env genes isolated in 2006 than in those isolated in the early 1990s, suggesting that env gene variation has been increasing gradually among CRF01_AE viruses prevalent in Thailand. We also examined the characteristics of the deduced amino acid sequences of 35 CRF01_AE env genes. Our results may provide useful information to help in better understanding the genotype of env genes of CRF01_AE viruses currently circulating in Thailand.
Using Common Spatial Distributions of Atoms to Relate Functionally Divergent Influenza Virus N10 and N11 Protein Structures to Functionally Characterized Neuraminidase Structures, Toxin Cell Entry Domains, and Non-Influenza Virus Cell Entry Domains

PubMed Central

Weininger, Arthur; Weininger, Susan

2015-01-01

The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, Clostridium botulinum neurotoxin, and Clostridium tetani tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed.
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells. PMID:25706124
AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

PubMed Central

Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

2015-01-01

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462
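Classifying genes by their presence or absence across strains, as described above, can be sketched with simple counting over per-strain gene sets. The three categories used here (core, shared, strain-specific) and the strain and gene names are illustrative assumptions; they are not necessarily the categories or the implementation used by AGAPE.

```python
from collections import Counter


def classify_genes(strain_genes: dict[str, set[str]]) -> dict[str, str]:
    """Label each gene by how many strains carry it."""
    n_strains = len(strain_genes)
    # Count the number of strains in which each gene is present.
    presence = Counter(gene for genes in strain_genes.values() for gene in genes)
    labels = {}
    for gene, n_present in presence.items():
        if n_present == n_strains:
            labels[gene] = "core"             # present in every strain
        elif n_present == 1:
            labels[gene] = "strain-specific"  # present in exactly one strain
        else:
            labels[gene] = "shared"           # present in some, but not all, strains
    return labels


strains = {  # hypothetical strains and genes
    "strainA": {"geneX", "geneY", "geneZ"},
    "strainB": {"geneX", "geneY"},
    "strainC": {"geneX", "geneW"},
}
print(classify_genes(strains))
```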
Enhancer activity of Helitron in sericin-1 gene promoter from Bombyx mori.

PubMed

Huang, Ke; Li, Chun-Feng; Wu, Jie; Wei, Jun-Hong; Zou, Yong; Han, Min-Jin; Zhou, Ze-Yang

2016-06-01

Sericin is a water-soluble protein expressed specifically in the middle silk gland of Bombyx mori. When the sericin-1 gene promoter was cloned and a transgenic vector was constructed to express a foreign protein, a specific Helitron, Bmhel-8, was identified in the sericin-1 promoter sequence in some genotypes of Bombyx mori and Bombyx mandarina. Given that the Bmhel-8 Helitron transposon was present only in some genotypes, it could be the source of allelic variation in the sericin-1 promoter. The sericin-1 promoter sequence is approximately 1063 or 643 bp long; the longer allele owes its extra length to the presence of Bmhel-8. Silkworm genotypes can be homozygous for either the shorter or the longer promoter sequence, or heterozygous, containing both alleles. Bmhel-8 in the sericin-1 promoter exhibits enhancer activity, as demonstrated by a dual-luciferase reporter system in BmE cell lines. Furthermore, Bmhel-8 displays enhancer activity in a sericin-1 promoter-driven gene expression system but does not regulate the tissue-specific expression of sericin-1. © 2016 Institute of Zoology, Chinese Academy of Sciences.
Gene: a gene-centered information resource at NCBI.

PubMed

Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

2015-01-01

The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
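Programmatic access through the E-utilities mentioned above amounts to building HTTP requests. A minimal sketch that constructs an ESearch query URL for the Gene database using only the Python standard library follows. The search term is an illustrative example, and the network call is left commented out so the snippet runs offline. Check the E-utilities documentation for usage policies (request rates, API keys) before querying at scale.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("gene", "BRCA1[sym] AND human[orgn]")
print(url)

# To run the query (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     ids = json.load(response)["esearchresult"]["idlist"]
```

The returned Gene IDs are the "unique, tracked integers" described above and can be passed to other E-utilities, such as ESummary, to retrieve record content.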
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

PubMed

Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, chloroplast (cp)-derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the transition/transversion (Ti/Tv) ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations of the mitochondrial biology of seed plants.
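The Ti/Tv ratio compares transitions (purine to purine or pyrimidine to pyrimidine substitutions) with transversions (purine to pyrimidine, or the reverse). A minimal sketch that counts both classes between two aligned, equal-length sequences follows. Real analyses usually compute the ratio from variant calls against a reference, and the example sequences here are made up.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def ti_tv_ratio(ref: str, alt: str) -> float:
    """Transition/transversion ratio between two aligned sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for r, a in zip(ref.upper(), alt.upper()):
        if r == a or r not in "ACGT" or a not in "ACGT":
            continue  # skip identical sites, gaps and ambiguous bases
        pair = {r, a}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions / transversions if transversions else float("inf")


print(ti_tv_ratio("AAAAC", "GACTT"))  # 1.0
```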
Structural adaptation of the subunit interface of oligomeric thermophilic and hyperthermophilic enzymes.

PubMed

Maugini, Elisa; Tronelli, Daniele; Bossa, Francesco; Pascarella, Stefano

2009-04-01

Enzymes from thermophilic and, particularly, hyperthermophilic organisms are surprisingly stable. Understanding the molecular origin of protein thermostability and thermoactivity has attracted the interest of many scientists, both for the insight it offers into the principles of protein structure and for possible biotechnological applications through protein engineering. Comparative studies at the sequence and structure levels aimed to detect significant differences in structural parameters related to protein stability between thermophilic and hyperthermophilic structures and their mesophilic homologs. These comparative studies identified a few recurrent themes that evolution has used in different combinations in different protein families. These studies were mostly carried out at the monomer level. However, maintenance of a proper quaternary structure is an essential prerequisite for a functional macromolecule. At the environmental temperatures typically experienced by hyperthermophiles and thermophiles, the subunit interactions mediated by the interface must be sufficiently stable. Our analysis was therefore aimed at identifying the molecular strategies adopted by evolution to enhance the interface thermostability of oligomeric enzymes. The variation of several structural properties related to protein stability was tested at the subunit interfaces of thermophilic and hyperthermophilic oligomers. The differences in interface structural features observed between the hyperthermophilic and thermophilic enzymes were compared with the differences in the same properties calculated from pairwise comparisons of oligomeric mesophilic proteins contained in a reference dataset. The significance of the observed differences in structural properties was measured by a t-test. Ion pairs and hydrogen bonds do not vary significantly, while hydrophobic contact area increases, especially in hyperthermophilic interfaces.
Interface compactness also appears to increase in the hyperthermophilic proteins. Variations in amino acid composition at the interfaces reflect the variation of the interface properties.
Structural flexibility and protein adaptation to temperature: Molecular dynamics analysis of malate dehydrogenases of marine molluscs.

PubMed

Dong, Yun-Wei; Liao, Ming-Ling; Meng, Xian-Liang; Somero, George N

2018-02-06

Orthologous proteins of species adapted to different temperatures exhibit differences in stability and function that are interpreted to reflect adaptive variation in structural "flexibility." However, quantifying flexibility and comparing it across proteins has remained a challenge. To address this issue, we examined temperature effects on cytosolic malate dehydrogenase (cMDH) orthologs from differently thermally adapted congeners of five genera of marine molluscs whose field body temperatures span a range of ∼60 °C. We describe consistent patterns of convergent evolution in adaptation of function [temperature effects on the Km of the cofactor (NADH)] and of structural stability (rate of heat denaturation of activity). To determine how these differences depend on the flexibility of the overall structure and of regions known to be important in binding and catalysis, we performed molecular dynamics simulation (MDS) analyses. MDS analyses revealed a significant negative correlation between adaptation temperature and the heat-induced increase in backbone atom movements [root mean square deviation (RMSD) of main-chain atoms]. Root mean square fluctuations (RMSFs) of individual amino acid residues varied across the sequence in a qualitatively similar pattern among orthologs. The regions of sequence involved in ligand binding and catalysis, termed mobile regions 1 and 2 (MR1 and MR2), respectively, showed the largest RMSF values. Heat-induced changes in RMSF values across the sequence and, importantly, in MR1 and MR2 were greatest in cold-adapted species. MDS methods are shown to provide powerful tools for examining the adaptation of enzymes by providing a quantitative index of protein flexibility and identifying sequence regions where adaptive change in flexibility occurs.
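The two flexibility measures above can be written out directly: RMSD compares one frame's coordinates with a reference frame, whereas RMSF measures each atom's fluctuation about its own time-averaged position. The sketch assumes the frames have already been superposed onto a common reference (the fitting step is omitted) and uses plain coordinate tuples; production analyses would use a dedicated MD analysis package. The toy trajectory is made up.

```python
import math

Coord = tuple[float, float, float]
Frame = list[Coord]


def rmsd(frame: Frame, reference: Frame) -> float:
    """Root mean square deviation between two pre-aligned frames."""
    squared = sum(
        (a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def rmsf(trajectory: list[Frame]) -> list[float]:
    """Per-atom root mean square fluctuation about the mean position."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Two-atom toy trajectory: atom 0 stays put, atom 1 moves 2 units along x.
traj = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
print(rmsf(traj))              # [0.0, 1.0]
print(rmsd(traj[1], traj[0]))  # 1.4142135623730951
```

Averaging RMSF over a residue's atoms, or taking only its C-alpha atom, gives the per-residue profile used to compare regions such as MR1 and MR2.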

Community detection in sequence similarity networks based on attribute clustering

DOE PAGES

Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

2017-07-24

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot be differentiated by these measures. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selecting an optimal subset of links, and refining the community structure based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
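The refinement step relies on the partition density of the network, but the abstract does not give its formula. Assuming ACDC uses the link-community definition of Ahn, Bagrow and Lehmann, D = (2/M) Σ_c m_c (m_c - (n_c - 1)) / ((n_c - 2)(n_c - 1)), where community c has m_c links and n_c nodes and M is the total number of links, a minimal sketch of that score is shown below. It is an illustration of the assumed definition, not the ACDC code.

```python
def partition_density(communities):
    """Partition density of a partition of links into communities.

    communities: list of link communities, each a list of (u, v) edges.
    Assumption: Ahn-Bagrow-Lehmann link-community definition; ACDC's exact
    variant may differ.
    """
    total_links = sum(len(links) for links in communities)
    if total_links == 0:
        return 0.0
    density = 0.0
    for links in communities:
        m = len(links)
        n = len({node for edge in links for node in edge})
        # A community spanning only two nodes contributes zero by convention.
        if n > 2:
            density += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * density / total_links
```

A fully connected community scores 1 and a tree-like one scores 0, so maximizing D favours dense link groups.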
Origins of Genes: "Big Bang" or Continuous Creation?

NASA Astrophysics Data System (ADS)

Keese, Paul K.; Gibbs, Adrian

1992-10-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
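Overprinting can be illustrated by reading the same nucleotides in different forward frames: one stretch of DNA yields unrelated polypeptides. This minimal sketch uses the standard genetic code (NCBI table 1). The codon-table string and the toy sequence are illustrative and are not taken from the genes analysed in the paper.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward reading frame (0, 1 or 2); '*' marks stop codons."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons containing ambiguous bases
        for i in range(frame, len(dna) - 2, 3)
    )
```

For example, `translate("ATGGCCAAA", 0)` gives `"MAK"`, while frames 1 and 2 of the same sequence give `"WP"` and `"GQ"`.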
Draft genome sequence for virulent and avirulent strains of Xanthomonas arboricola isolated from Prunus spp. in Spain.

PubMed

Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M; Cubero, Jaime

2016-01-01

Xanthomonas arboricola is a species of the genus Xanthomonas, which mainly comprises plant pathogens. Among the members of this taxon, X. arboricola pv. pruni, the causal agent of bacterial spot disease of stone fruits and almond, is distributed worldwide, although it is considered a quarantine pathogen in the European Union. Herein, we report the draft genome sequence, classification, annotation and sequence analyses of a virulent strain, IVIA 2626.1, and an avirulent strain, CITA 44, of X. arboricola associated with Prunus spp. The draft genome sequence of IVIA 2626.1 consists of 5,027,671 bp, 4,720 protein-coding genes and 50 RNA-encoding genes. The draft genome sequence of strain CITA 44 consists of 4,760,482 bp, 4,250 protein-coding genes and 56 RNA-encoding genes. Initial comparative analyses reveal differences in the presence of structural and regulatory components of the type IV pilus, the type III secretion system and the type III effectors, as well as variation in the number of type IV secretion systems. The genome sequence data for these strains will facilitate the development of molecular diagnostic protocols that differentiate virulent and avirulent strains. In addition, comparative genome analysis will provide insights into the plant-pathogen interaction during the bacterial spot disease process.
Community detection in sequence similarity networks based on attribute clustering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot be differentiated by these measures. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selecting an optimal subset of links, and refining the community structure based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
High sequence variations in the region containing genes encoding a cellular morphogenesis protein and the repressor of sexual development help to reveal origins of Aspergillus oryzae

USDA-ARS's Scientific Manuscript database

Aspergillus oryzae and Aspergillus flavus are closely related fungal species. The A. flavus population that produces numerous small sclerotia (S strain) and aflatoxin has a unique 1.5 kb deletion in the norB-cypA region of the aflatoxin gene cluster (the S genotype). Phylogenetic studies have indica...
Prognostic and predictive value of TP53 mutations in node-positive breast cancer patients treated with anthracycline- or anthracycline/taxane-based adjuvant therapy: results from the BIG 02-98 phase III trial

PubMed Central

2012-01-01

Introduction: Pre-clinical data suggest p53-dependent anthracycline-induced apoptosis and p53-independent taxane activity. However, dedicated clinical research has not defined a predictive role for TP53 gene mutations. The aim of the current study was to retrospectively explore the prognostic and predictive value of TP53 somatic mutations in the BIG 02-98 randomized phase III trial, in which women with node-positive breast cancer were treated with adjuvant doxorubicin-based chemotherapy with or without docetaxel. Methods: The prognostic and predictive values of TP53 were analyzed in tumor samples by sequencing exons 5 to 8 of the gene. Patients were classified according to p53 protein status predicted from the TP53 gene sequence as wild-type (no TP53 variation, or TP53 variations predicted not to modify the p53 protein sequence) or mutant (p53 nonsynonymous mutations). Mutations were subcategorized as missense or truncating. Survival analyses were performed using the Kaplan-Meier method and the log-rank test. Cox regression analysis was used to identify independent predictors of outcome. Results: TP53 gene status was determined for 18% (520 of 2887) of the women enrolled in BIG 02-98. TP53 gene variations were found in 17% (90 of 520). Nonsynonymous p53 mutations, found in 16.3% (85 of 520), were associated with older age, ductal morphology, higher grade and hormone-receptor negativity. Missense mutations were found in 12.3% (64 of 520) and truncating mutations in 3.6% (19 of 520). Only truncating mutations showed significant independent prognostic value, with an increased recurrence risk compared with patients with non-modified p53 protein (hazard ratio = 3.21, 95% confidence interval = 1.740 to 5.935, P = 0.0002). p53 status had no significant predictive value for response to docetaxel. Conclusions: p53 truncating mutations were uncommon but associated with poor prognosis. No significant predictive role for p53 status was detected.
Trial registration: ClinicalTrials.gov NCT00174655 PMID:22551440
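The survival analyses above rest on the Kaplan-Meier estimator, S(t) = Π (1 - d_i / n_i) taken over event times t_i ≤ t, where d_i events occur among n_i patients still at risk. A minimal sketch follows; the toy inputs are illustrative rather than trial data, and the log-rank test and Cox model are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the event (e.g. recurrence)
    was observed, 0 if the patient was censored at that time.
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all records sharing this time; patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve
```

With four patients followed for 1, 2, 3 and 4 time units, the second censored, the curve steps to 0.75, 0.375 and finally 0.0.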
Mutations in the C-terminus of CDKL5: proceed with caution.

PubMed

Diebold, Bertrand; Delépine, Chloé; Gataullina, Svetlana; Delahaye, Andrée; Nectoux, Juliette; Bienvenu, Thierry

2014-02-01

Mutations in the cyclin-dependent kinase-like 5 (CDKL5) gene have been described in girls with Rett-like features and early-onset epileptic encephalopathy, including infantile spasms. Milder phenotypes have been associated with sequence variations in the 3'-end of the CDKL5 gene. The identification of novel CDKL5 transcripts encoding isoforms with an altered C-terminal region casts strong doubt on the pathogenicity of sequence variations located in the 3'-end of the gene. We investigated a group of 30 female patients with clinically heterogeneous phenotypes, ranging from nonspecific intellectual disability to severe neonatal encephalopathy, and identified two heterozygous CDKL5 missense mutations: the previously reported p.Val999Met and the novel p.Pro944Thr. However, these mutations were also detected in the patients' healthy fathers. Considering our results and all data from the literature, we suggest that genetic variations beyond codon 938 in the human CDKL5115 protein may have minor or no significance. Screening of exons 19-21 of the CDKL5 gene is therefore probably not useful in the practical molecular diagnosis of atypical Rett syndrome.
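Applying the codon-938 suggestion to a variant list amounts to parsing protein-level missense calls and comparing the codon number with the cutoff. The helpers below are hypothetical: the parser handles only simple three-letter substitutions such as p.Val999Met (not frameshift, nonsense or delins calls), and it assumes positions are numbered on the same CDKL5 isoform the authors used.

```python
import re

# Matches simple three-letter protein missense notation such as "p.Val999Met".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(hgvs):
    """Return (ref_aa, codon, alt_aa) for a simple HGVS missense call."""
    match = MISSENSE.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple missense call: {hgvs}")
    ref, codon, alt = match.groups()
    return ref, int(codon), alt


def beyond_codon(hgvs, cutoff=938):
    """True if the variant lies after the cutoff codon (938 in this abstract)."""
    return parse_missense(hgvs)[1] > cutoff
```

Both variants reported here, p.Val999Met and p.Pro944Thr, fall beyond the cutoff.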
Analysis of Nucleotide Variations in Genes of Iron Management in Patients of Parkinson's Disease and Other Movement Disorders

PubMed Central

Castiglioni, Emanuela; Finazzi, Dario; Goldwurm, Stefano; Pezzoli, Gianni; Forni, Gianluca; Girelli, Domenico; Maccarinelli, Federica; Poli, Maura; Ferrari, Maurizio; Cremonesi, Laura; Arosio, Paolo

2011-01-01

The capacity to act as an electron donor and acceptor makes iron an essential cofactor of many vital processes. Iron balance in the body must be tightly regulated, since excess iron can be harmful by favouring oxidative damage, while deficiency can impair fundamental processes such as erythropoiesis. In the brain, an accumulation of iron or an increase in its availability has been associated with the development and/or progression of different degenerative processes, including Parkinson's disease, while iron paucity seems to be associated with cognitive deficits, motor dysfunction, and restless legs syndrome. In a search for DNA sequence variations affecting individual predisposition to movement disorders, we used DHPLC to scan the exons and intron-exon boundary regions of the ceruloplasmin, iron regulatory protein 2, hemopexin, hepcidin and hemojuvelin genes in cohorts of subjects affected by Parkinson's disease and idiopathic neurodegeneration with brain iron accumulation (NBIA). Both novel and known sequence variations were identified in most of the genes, but none of them seemed to be significantly associated with the movement disorders of interest. PMID:20981230
The Diversity Present in 5140 Human Mitochondrial Genomes

PubMed Central

Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.

2009-01-01

We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustively classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
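The observation that V, I, A, M and T interconvert by single transitions can be checked by enumerating every A/G or C/T change at each codon position. This sketch assumes the vertebrate mitochondrial genetic code (NCBI table 2), built here from the standard code plus its four reassignments (AGA and AGG as stops, ATA as Met, TGA as Trp). It is an illustration, not the mtDNA-GeneSyn tool.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1), codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_index(codon):
    """Position of a codon in the TCAG-ordered amino-acid string."""
    return 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])


# Vertebrate mitochondrial code (NCBI table 2): the standard code with four reassignments.
MITO_CODE = {"".join(c): STANDARD[codon_index(c)] for c in product(BASES, repeat=3)}
MITO_CODE.update({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})

# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_pairs(code):
    """Set of amino-acid pairs linked by one nonsynonymous transition (stops excluded)."""
    pairs = set()
    for codon, aa in code.items():
        for pos in range(3):
            mutant = codon[:pos] + TRANSITION[codon[pos]] + codon[pos + 1:]
            other = code[mutant]
            if other != aa and "*" not in (aa, other):
                pairs.add(frozenset((aa, other)))
    return pairs
```

Calling `transition_pairs(MITO_CODE)` returns pairs such as V-A, A-T, V-I, I-T, V-M and M-T, which together connect all five amino acids of the group.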
Molecular diversity of α-gliadin expressed genes in genetically contrasted spelt (Triticum aestivum ssp. spelta) accessions and comparison with bread wheat (T. aestivum ssp. aestivum) and related diploid Triticum and Aegilops species.

PubMed

Dubois, Benjamin; Bertin, Pierre; Mingeot, Dominique

2016-01-01

The gluten proteins of cereals such as bread wheat (Triticum aestivum ssp. aestivum) and spelt (T. aestivum ssp. spelta) are responsible for celiac disease (CD). The α-gliadins constitute the most immunogenic class of gluten proteins, as they include four main T-cell stimulatory epitopes that affect CD patients. Spelt has been less studied than bread wheat and could constitute a source of valuable diversity. The objective of this work was to study the genetic diversity of spelt α-gliadin transcripts and to compare it with that of bread wheat. Genotyping data from 85 spelt accessions obtained with 19 simple sequence repeat (SSR) markers were used to select 11 contrasted accessions, from which 446 full open reading frame α-gliadin genes were cloned and sequenced, revealing high allelic diversity. The accessions differed markedly in the proportion of α-gliadin sequences from each of the three genomes (A, B and D) and in their content of the four T-cell stimulatory epitopes. An accession from Tajikistan stood out, having a particularly high proportion of α-gliadins from the B genome and a low immunogenic content. Although no clear separation between spelt and bread wheat sequences was found, spelt α-gliadins displayed specific features, such as the frequencies of some amino acid substitutions. Given this observation and the variation in toxicity revealed among the spelt accessions in this study, the high genetic diversity held in spelt germplasm collections could be a valuable resource for developing safer varieties for CD patients.
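Profiling α-gliadin sequences by their epitope content amounts to counting motif occurrences in each deduced protein. This sketch counts exact, possibly overlapping matches. The epitope dictionary passed in is a placeholder to be filled with the canonical epitope sequences from the literature; variants carrying amino acid substitutions would need tolerant matching, which is not shown.

```python
def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlapping matches."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def epitope_profile(sequence, epitopes):
    """Map each epitope name to its number of exact matches in a protein sequence."""
    return {name: count_overlapping(sequence, motif) for name, motif in epitopes.items()}
```

Summing these profiles per accession, or per source genome (A, B or D), gives a rough immunogenic-content comparison of the kind described above.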
Viral evolution in HLA-B27-restricted CTL epitopes in human immunodeficiency virus type 1-infected individuals.

PubMed

Setiawan, Laurentia C; Gijsbers, Esther F; van Nuenen, Adrianus C; Kootstra, Neeltje A

2015-08-01

The HLA-B27 allele is over-represented among human immunodeficiency virus type 1-infected long-term non-progressors. In these patients, strong CTL responses targeting HLA-B27-restricted viral epitopes have been associated with long-term asymptomatic survival. Indeed, loss of control of viraemia in HLA-B27-positive patients has been associated with CTL escape at position 264 in the immunodominant KK10 epitope. This CTL escape mutation in the viral Gag protein has been associated with severe viral attenuation and may require compensatory mutations before it can emerge. Here, we studied sequence evolution within HLA-B27-restricted CTL epitopes in the viral Gag protein during the course of infection in seven HLA-B27-positive patients. Longitudinal gag sequences obtained at different time points around the time of AIDS diagnosis were analysed for mutations in HLA-B27-restricted epitopes and for potential compensatory mutations. Sequence variations were observed in the HLA-B27-restricted CTL epitopes IK9 and DR11, and in the immunodominant KK10 epitope. However, the presence of sequence variations in the HLA-B27-restricted CTL epitopes could not be associated with an increase in viraemia in most of the patients studied. Furthermore, we observed low genetic diversity in the gag region of the viral variants throughout the course of infection, which is indicative of low viral replication and corresponds to the low viral load observed in the HLA-B27-positive patients. These data indicate that control of viral replication can be maintained in HLA-B27-positive patients despite the emergence of viral mutations in HLA-B27-restricted epitopes.
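Escape mutations such as the one at position 264 are conventionally written as reference residue, protein position, observed residue. The small helper below labels substitutions between aligned epitope sequences. The example assumes the commonly reported KK10 sequence KRWIILGLNK starting at Gag position 263 (HXB2 numbering), which should be checked against the reference actually used; insertions and deletions are not handled.

```python
def epitope_mutations(reference, observed, first_position):
    """List substitutions such as 'R264K' between two aligned epitope sequences.

    first_position is the protein coordinate of the epitope's first residue.
    """
    if len(reference) != len(observed):
        raise ValueError("epitopes must be aligned to equal length")
    return [
        f"{ref}{first_position + offset}{obs}"
        for offset, (ref, obs) in enumerate(zip(reference, observed))
        if ref != obs
    ]
```

Running it on each longitudinal sample against the baseline sequence shows when a given epitope variant first appears.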
A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

PubMed Central

2010-01-01

Background: Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results: Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions: We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
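Dyad symmetry in DNA means an inverted repeat: one arm is the reverse complement of the other, separated by a spacer. A brute-force sketch for locating such pairs is shown below; the stem length and maximum spacer are assumed parameters, and this is not the authors' detection method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq, stem, max_loop=50):
    """Find (i, j) where seq[j:j+stem] is the reverse complement of seq[i:i+stem].

    The second arm must start after the first arm ends, with at most
    max_loop bases between the two arms.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        last_start = min(len(seq) - stem, i + stem + max_loop)
        for j in range(i + stem, last_start + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits
```

For instance, in `"GACTTTTAGTC"` the arm `GACT` at position 0 pairs with `AGTC` at position 7. The nested loop is quadratic in the spacer range, which is acceptable for gene-length sequences.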
A phylogenetic analysis using full-length viral genomes of South American dengue serotype 3 in consecutive Venezuelan outbreaks reveals novel NS5 mutation

PubMed Central

Schmidt, DJ; Pickett, BE; Camacho, D; Comach, G; Xhaja, K; Lennon, NJ; Rizzolo, K; de Bosch, N; Becerra, A; Nogueira, ML; Mondini, A; da Silva, EV; Vasconcelos, PF; Muñoz-Jordán, JL; Santiago, GA; Ocazionez, R; Gehrke, L; Lefkowitz, EJ; Birren, BW; Henn, MR; Bosch, I

2013-01-01

Dengue virus currently causes 50-100 million infections annually. Comprehensive knowledge of the evolution of dengue in response to selection pressure is currently unavailable but would greatly enhance vaccine design efforts. In the current study, we sequenced 187 new dengue virus serotype 3 (DENV-3) genotype III whole genomes isolated from Asia and the Americas. We analyzed them together with previously sequenced isolates to gain a more detailed understanding of the evolutionary adaptations of this prevalent American serotype. To analyze the phylogenetic dynamics of DENV-3 during outbreak periods, we incorporated datasets of 48 and 11 sequences spanning two major outbreaks in Venezuela during 2001 and 2007-2008, respectively. Our phylogenetic analysis of the newly sequenced viruses shows that subsets of genomes cluster primarily by geographic location and secondarily by time of virus isolation. DENV-3 genotype III sequences from Asia are significantly divergent from those from the Americas owing to their geographical separation and subsequent speciation. We measured amino acid variation in the E protein by calculating the Shannon entropy at each position between Asian and American genomes. We found a cluster of 7 highly variable amino acid substitutions within E protein domain III, which has previously been implicated in serotype-specific neutralization escape mutants. No novel mutations were found in the E protein of sequences isolated during either Venezuelan outbreak. Shannon entropy analysis of the mature NS5 polymerase protein revealed that a G374E mutation, in a region that contributes to interferon resistance in other flaviviruses by interfering with JAK-STAT signaling, was present in both the Asian and American sequences from the 2007-2008 Venezuelan outbreak but absent from the sequences of the 2001 Venezuelan outbreak. 
In addition to E, several NS5 amino acid changes were unique to the 2007-2008 epidemic in Venezuela and may give additional insight into the adaptive response of DENV-3 at the population level. PMID:21964598
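Per-position Shannon entropy, H = -Σ p_a log2 p_a summed over the residues a observed in an alignment column, is 0 for an invariant column and grows as residues diversify. A minimal sketch, assuming pre-aligned, equal-length sequences and simply ignoring gap characters (the study's handling of gaps is not stated):

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]
```

Positions whose entropy stands out against the rest of the protein, such as the domain III cluster described above, are the candidates for closer inspection.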
Positive selection in the SLC11A1 gene in the family Equidae.

PubMed

Bayerova, Zuzana; Janova, Eva; Matiasovic, Jan; Orlando, Ludovic; Horin, Petr

2016-05-01

Immunity-related genes are a suitable model for studying the effects of selection at the genomic level. Some of them are highly conserved owing to functional constraints and purifying selection, while others are variable and change quickly to cope with pathogen variation. The SLC11A1 gene encodes a transporter protein that mediates the antimicrobial activity of macrophages. Little is known about the patterns of selection that shaped this gene during evolution. Although it is a typical evolutionarily conserved gene, functionally important polymorphisms associated with various diseases have been identified in humans and other species. We analyzed the genomic organization, genetic variation, and evolution of the SLC11A1 gene in the family Equidae to identify patterns of selection within this important gene. Nucleotide SLC11A1 sequences were highly conserved across ten equid species, with more than 97% sequence identity across the family. Single nucleotide polymorphisms (SNPs) were found in the coding and noncoding regions of the gene. Seven codon sites were identified as being under strong purifying selection. Codons located in three regions, including the glycosylated extracellular loop, were shown to be under diversifying selection. A 3-bp indel resulting in deletion of amino acid 321 in the predicted protein was observed in all horses, whereas this codon was retained in all other equid species. This codon, part of an N-glycosylation site, was found to be under positive selection. Interspecific variation in the presence of predicted N-glycosylation sites was observed.
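Predicted N-glycosylation sites are usually located by scanning for the N-X-S/T sequon, where X is any residue except proline. This minimal sketch performs only that motif scan; dedicated predictors weigh additional context, and not every sequon is actually glycosylated. Comparing the returned positions across species' predicted proteins shows which sites are gained or lost.

```python
import re

# N-X-S/T sequon where X is any residue except proline; the lookahead allows overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein, one_based=True):
    """Return positions of asparagines that start an N-X-S/T sequon."""
    offset = 1 if one_based else 0
    return [m.start() + offset for m in SEQUON.finditer(protein.upper())]
```

For example, `"ANASNPT"` has one sequon (N at position 2); the second asparagine is excluded because it is followed by proline.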
Taxonomic distribution, repeats, and functions of the S1 domain-containing proteins as members of the OB-fold family.

PubMed

Deryusheva, Evgeniia I; Machulin, Andrey V; Selivanova, Olga M; Galzitskaya, Oxana V

2017-04-01

Proteins of the nucleic acid-binding proteins superfamily perform functions such as processing, transport, storage, stretching, translation, and degradation of RNA. It is one of the 16 superfamilies containing the OB-fold in protein structures. Here, we analyzed the superfamily of nucleic acid-binding proteins (more than 200,000 sequences) and found that it consists predominantly of proteins containing the cold shock DNA-binding domain (ca. 131,000 protein sequences). Proteins containing the S1 domain make up 57% of the cold shock DNA-binding domain family. Furthermore, we found that the S1 domain occurs mainly in bacterial proteins (ca. 83%), compared with eukaryotic and archaeal proteins, among those available in the UniProt database. The number of S1 domain repeats in S1 domain-containing proteins depends on taxonomic affiliation. All archaeal proteins contain one copy of the S1 domain, while the number of repeats in eukaryotic proteins varies between 1 and 15 and correlates with protein size. In bacterial proteins, the number of repeats is no more than 6, regardless of protein size. The large variation in the repeat number of the S1 domain, one of the structural variants of the OB-fold, is a distinctive feature of S1 domain-containing proteins. Proteins from the other families and superfamilies either have one OB-fold or show only slight changes in repeat number. On the whole, it can be supposed that the repeat number is vital for the multifunctional activity of S1 domain-containing proteins. Proteins 2017; 85:602-613. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Genetic diversity of the merozoite surface protein-3 gene in Plasmodium falciparum populations in Thailand.

PubMed

Pattaradilokrat, Sittiporn; Sawaswong, Vorthon; Simpalipan, Phumin; Kaewthamasorn, Morakot; Siripoon, Napaporn; Harnyuttanakorn, Pongchai

2016-10-21

An effective malaria vaccine is an urgently needed tool to fight human malaria, the deadliest parasitic disease of humans. One promising candidate is merozoite surface protein-3 (MSP-3) of Plasmodium falciparum. This antigenic protein, encoded by the merozoite surface protein-3 (msp-3) gene, is polymorphic and classified by size into two allelic types, K1 and 3D7. A recent study revealed that both the K1 and 3D7 alleles co-circulate within P. falciparum populations in Thailand, but the extent of sequence diversity within each allelic type remains largely unknown. The msp-3 gene was sequenced from 59 P. falciparum samples collected from five endemic areas (Mae Hong Son, Kanchanaburi, Ranong, Trat and Ubon Ratchathani) in Thailand and analysed for nucleotide sequence diversity, haplotype diversity and deduced amino acid sequence diversity. The gene was also subjected to population genetic analysis (FST) and neutrality tests (Tajima's D, Fu and Li's D* and Fu and Li's F* tests) to detect any signature of selection. The sequence analyses revealed eight unique DNA haplotypes and seven amino acid sequence variants, with haplotype and nucleotide diversities of 0.828 and 0.049, respectively. Neutrality tests indicated that the polymorphism detected in the alanine heptad repeat region of MSP-3 was maintained by positive diversifying selection, suggesting its role as a potential target of protective immune responses and supporting its role as a vaccine candidate. Comparison of MSP-3 variants among parasite populations in Thailand, India and Nigeria also suggested a close genetic relationship between P. falciparum populations in Asia. This study revealed the extent of msp-3 gene diversity in P. falciparum in Thailand, providing a basis for the better design of future blood-stage malaria vaccines against P. falciparum.
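The two summary statistics reported here are usually computed as Nei's haplotype diversity, Hd = n/(n - 1) (1 - Σ p_i²), with p_i the frequency of haplotype i among n sequences, and nucleotide diversity π, the mean proportion of differing sites across all sequence pairs. A minimal sketch, assuming aligned, equal-length, gap-free sequences (tools such as DnaSP also handle gaps and missing data):

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    freqs = [count / n for count in Counter(sequences).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Mean pairwise proportion of differing sites (pi) for aligned, equal-length sequences."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)
```

For four toy sequences `AAAA`, `AAAA`, `AAAT` and `AATT`, Hd is 5/6 and π is 7/24 (7 differences over 6 pairs of 4 sites).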
Within-Host Variations of Human Papillomavirus Reveal APOBEC Signature Mutagenesis in the Viral Genome.

PubMed

Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

2018-06-15

Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations in the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes at >0.5% frequency relative to the reference sequence were determined for individual samples. In total, 1,052 variable nucleotide positions were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with the number varying by sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE: HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. 
Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.
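The TpCpN analysis asks what fraction of C-to-T changes have a T immediately 5' of the mutated C. This minimal sketch only considers C-to-T calls on the given strand; G-to-A calls, which are the same change on the opposite strand, would need the reverse-complement context and are not counted. The variant tuple format is an assumption for illustration. Because HPV genomes are circular, Python's negative indexing conveniently wraps position 0 to the last base.

```python
def apobec_context_fraction(reference, variants):
    """Fraction of C>T substitutions that sit in a TpCpN context.

    reference: reference genome string; variants: (0-based position, ref, alt) tuples.
    Only C>T on the given strand is considered.
    """
    c_to_t = [pos for pos, ref, alt in variants if ref == "C" and alt == "T"]
    if not c_to_t:
        return 0.0
    # reference[pos - 1] wraps to the last base when pos == 0, matching a circular genome.
    in_tcn = sum(1 for pos in c_to_t if reference[pos - 1] == "T")
    return in_tcn / len(c_to_t)
```

Computing this fraction separately for CIN1, CIN2/3 and ICC samples mirrors the grade-by-grade comparison in the abstract.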
Comprehensive Analysis of Non-Synonymous Natural Variants of G Protein-Coupled Receptors.

PubMed

Kim, Hee Ryung; Duc, Nguyen Minh; Chung, Ka Young

2018-03-01

G protein-coupled receptors (GPCRs) are the largest superfamily of transmembrane receptors and have vital signaling functions in various organs. Because of their critical roles in physiology and pathology, GPCRs are the most commonly used therapeutic targets. GPCR genes have been suggested to harbor extensive genetic variation, such as polymorphisms and DNA insertions or deletions. Among these variations, non-synonymous natural variations change the amino acid sequence and could thus alter GPCR expression, localization, signaling, and ligand binding, which may contribute to disease development and to altered responses to GPCR-targeting drugs. Despite the clinical importance of GPCRs, studies on the genotype-phenotype relationship of GPCR natural variants have been limited to a few receptors, such as β-adrenergic and opioid receptors. A comprehensive understanding of non-synonymous natural variations within GPCRs would help to predict unknown genotype-phenotype relationships and yet-to-be-discovered natural variants. Here, we analyzed the non-synonymous natural variants of all non-olfactory GPCRs available in a public database, UniProt. The results suggest that non-synonymous natural variations occur extensively within the GPCR superfamily, especially in the N-terminus and transmembrane domains. Within the transmembrane domains, natural variations were observed more frequently at conserved residues, which can disrupt receptor function. Our analysis also suggests that only a few non-synonymous natural variations have been studied in efforts to link them with functional consequences.
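Mapping variant positions onto receptor domains, as in the analysis above, can be sketched like this. The domain boundaries below are hypothetical placeholders; real boundaries would come from UniProt topology annotations for each receptor. Normalizing by domain length is one way, assumed here, to compare domains of different sizes.

```python
from collections import Counter

# Hypothetical topology for one illustrative receptor: (domain, start, end), 1-based, inclusive.
TOPOLOGY = [
    ("N-terminus", 1, 30),
    ("TM1", 31, 55),
    ("ICL1", 56, 63),
    ("TM2", 64, 88),
]

def domain_of(position: int, topology=TOPOLOGY) -> str:
    """Name of the domain containing a residue position, or 'unassigned'."""
    for name, start, end in topology:
        if start <= position <= end:
            return name
    return "unassigned"

def tally(variant_positions, topology=TOPOLOGY) -> Counter:
    """Count variant positions per domain."""
    return Counter(domain_of(p, topology) for p in variant_positions)

def density(counts: Counter, topology=TOPOLOGY) -> dict:
    """Variants per residue in each annotated domain."""
    return {name: counts.get(name, 0) / (end - start + 1) for name, start, end in topology}

counts = tally([5, 12, 40, 60, 70, 200])
per_residue = density(counts)
```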
Tracing outbreaks of Streptococcus equi infection (strangles) in horses using sequence variation in the seM gene and pulsed-field gel electrophoresis.

PubMed

Lindahl, Susanne; Söderlund, Robert; Frosth, Sara; Pringle, John; Båverud, Viveca; Aspán, Anna

2011-11-21

Strangles is a serious respiratory disease in horses caused by Streptococcus equi subspecies equi (S. equi). The disease is transmitted by direct contact with an infected horse or with contaminated equipment. Genetically, S. equi strains are highly homogeneous, and differentiating strains has proven difficult. However, the S. equi M-protein SeM contains a variable N-terminal region and has been proposed as a target gene to distinguish between S. equi strains and to determine the source of an outbreak. In this study, S. equi strains (n=60) from 32 strangles outbreaks in Sweden during 1998-2003 and 2008-2009 were genetically characterized by sequencing the SeM protein gene (seM) and by pulsed-field gel electrophoresis (PFGE). The Swedish strains belonged to 10 different seM types, five of which had not previously been described. Most were identical or highly similar to allele types from strangles outbreaks in the UK. Outbreaks in 2008/2009 sharing the same seM type were associated by geographic location and/or by the type of use of the horses (racing stables). Typing by seM sequencing generally agreed with PFGE profiles. This agreement supports the use of seM sequencing as an epidemiological tool, and our data suggest that sequencing the SeM protein gene discriminates S. equi strains more sensitively than PFGE. Copyright © 2011 Elsevier B.V. All rights reserved.
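Assigning isolates to seM types amounts to looking up each sequence in a table of known alleles and giving unseen sequences a provisional new type. The sketch below assumes exact matching and uses a hypothetical allele table of toy strings; real typing schemes work on the variable N-terminal region and may tolerate partial sequences.

```python
def type_isolates(isolates: dict, known: dict) -> dict:
    """Map isolate IDs to seM types by exact sequence match.

    `known` maps sequence -> type name; sequences not in it receive
    provisional names (novel-1, novel-2, ...). The input table is not modified.
    """
    table = dict(known)  # copy so the reference table stays untouched
    assignments = {}
    novel = 0
    for isolate_id, seq in isolates.items():
        seq = seq.upper()
        if seq not in table:
            novel += 1
            table[seq] = f"novel-{novel}"
        assignments[isolate_id] = table[seq]
    return assignments

# Hypothetical allele table and isolates (toy sequences, not real seM alleles).
KNOWN_ALLELES = {"ATGGCTAAA": "seM-A", "ATGGCTGAA": "seM-B"}
isolates = {
    "iso1": "ATGGCTAAA",
    "iso2": "atggctgaa",
    "iso3": "ATGGTTAAA",
    "iso4": "ATGGTTAAA",
}
types = type_isolates(isolates, KNOWN_ALLELES)
```

Isolates that share a type (here iso3 and iso4) become candidates for a common outbreak source, to be weighed against epidemiological links and a second method such as PFGE.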
Sequence Analysis of Different Domains of Plasmodium vivax Apical Membrane Antigen (PvAMA-1 gene) Locus in Iran.

PubMed

Motevalli Haghi, A; Nateghpour, M; Edrissian, Ghh; Sepehrizadeh, Z; Mohebali, M; Khoramizade, Mr; Shahrbabak, S Sabouri; Moghimi, H

2012-01-01

Plasmodium vivax is responsible for approximately 80 million malaria cases worldwide. Apical membrane antigen 1 (AMA-1) is a type I integral membrane protein present in all Plasmodium species. AMA-1 is involved in critical steps of the invasion of human hepatocytes by sporozoites and of red blood cells by merozoites, and it is one of the most immunodominant antigens for eliciting a protective immune response in humans. It is therefore considered a promising antigen for inclusion in a vaccine against P. vivax. To better characterize this antigen, we compared the genetic variation in P. vivax AMA-1 from an Iranian isolate with that of isolates reported so far from other malaria-endemic countries. P. vivax genomic DNA was extracted from the whole blood of an Iranian patient with patent P. vivax infection. The nucleotide sequence encoding 446 amino acid (AA) residues (42-488 of PvAMA-1) was amplified by PCR and cloned into the pUC19 vector for sequencing. Sequence analysis showed a high degree of identity (99%) with the PvAMA-1 genes of the P. vivax S3 and SKO814 isolates from India and Korea, respectively (Asian isolates), and 96% similarity with the P. vivax Sal-1 AMA-1 gene from El Salvador. We cloned and characterized three domains of the PvAMA-1 gene from an Iranian patient. The predicted protein sequence of this gene showed some discrepancies with the corresponding proteins reported from other malaria-endemic countries.
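Percent identity figures like those above are computed over an alignment, and the result depends on how gaps are treated. The sketch below assumes two sequences already aligned to equal length and ignores columns where either sequence has a gap; that is one common convention, not necessarily the one used in the study. "Similarity" scores often also credit conservative substitutions, which this function does not.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)

# Toy aligned fragments (hypothetical, not PvAMA-1 sequence).
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```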

Characterizing the genetic diversity of the monkey malaria parasite Plasmodium cynomolgi

PubMed Central

Sutton, Patrick L.; Luo, Zunping; Divis, Paul C. S.; Friedrich, Volney K.; Conway, David J.; Singh, Balbir; Barnwell, John W.; Carlton, Jane M.; Sullivan, Steven A.

2016-01-01

Plasmodium cynomolgi is a malaria parasite that typically infects Asian macaque monkeys and, on rare occasions, humans. P. cynomolgi serves as a model system for the human malaria parasite Plasmodium vivax, with which it shares important biological characteristics such as the formation of a dormant liver stage and a preference for invading reticulocytes. Although the genomes of three P. cynomolgi strains have been sequenced, the genetic diversity of P. cynomolgi has not been widely investigated. To address this, we developed the first panel of P. cynomolgi microsatellite markers and used it to genotype eleven P. cynomolgi laboratory strains and 18 field isolates from Sarawak, Malaysian Borneo. We found diverse genotypes among most of the laboratory strains, although two nominally different strains were genetically identical. We also investigated sequence polymorphism in two erythrocyte invasion gene families, the reticulocyte binding protein (rbp) and Duffy binding protein genes, in these strains, and observed copy number variation in rbp genes. PMID:26980604
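Spotting nominally different strains that share a genotype, as reported above, can be done by grouping samples with identical allele calls at every microsatellite locus. This sketch assumes one allele size per locus (haploid blood-stage calls, no mixed infections) and skips samples with missing calls; the sample names and sizes are hypothetical.

```python
from collections import defaultdict

def identical_genotype_groups(genotypes: dict) -> list:
    """Groups of samples whose allele sizes match at every locus.

    `genotypes` maps sample name -> tuple of allele sizes, one per locus;
    samples with any missing call (None) are skipped.
    """
    groups = defaultdict(list)
    for sample, alleles in genotypes.items():
        if None in alleles:
            continue
        groups[tuple(alleles)].append(sample)
    return [sorted(members) for members in groups.values() if len(members) > 1]

# Hypothetical three-locus genotypes.
genotypes = {
    "StrainA": (150, 210, 98),
    "StrainB": (150, 210, 98),
    "StrainC": (152, 210, 98),
    "Field1": (150, None, 98),
}
duplicates = identical_genotype_groups(genotypes)
```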
RNA Crystallization

NASA Technical Reports Server (NTRS)

Golden, Barbara L.; Kundrot, Craig E.

2003-01-01

RNA molecules may be crystallized using variations of the methods developed for protein crystallography. As the technology to synthesize and purify RNA molecules in the quantity and quality required for crystallography has become available, the field of RNA structure has expanded rapidly. The first consideration when crystallizing an RNA is its sequence, which may be varied rationally to enhance crystallizability or to prevent the formation of alternative structures. Once a sequence has been designed, the RNA may be synthesized chemically by solid-phase synthesis or produced enzymatically using RNA polymerase and an appropriate DNA template. Milligram quantities of RNA can be purified by HPLC or gel electrophoresis. As with proteins, RNA is usually crystallized by vapor diffusion. Several considerations are either unique to RNA crystallization or more important for RNA than for proteins. Techniques for the design, synthesis, purification, and crystallization of RNAs are reviewed here.
A polygenic burden of rare disruptive mutations in schizophrenia

PubMed Central

Purcell, Shaun M.; Moran, Jennifer L.; Fromer, Menachem; Ruderfer, Douglas; Solovieff, Nadia; Roussos, Panos; O’Dushlaine, Colm; Chambert, Kimberly; Bergen, Sarah E.; Kähler, Anna; Duncan, Laramie; Stahl, Eli; Genovese, Giulio; Fernández, Esperanza; Collins, Mark O; Komiyama, Noboru H.; Choudhary, Jyoti S.; Magnusson, Patrik K. E.; Banks, Eric; Shakir, Khalid; Garimella, Kiran; Fennell, Tim; de Pristo, Mark; Grant, Seth G.N.; Haggarty, Stephen; Gabriel, Stacey; Scolnick, Edward M.; Lander, Eric S.; Hultman, Christina; Sullivan, Patrick F.; McCarroll, Steven A.; Sklar, Pamela

2014-01-01

By analyzing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we have demonstrated a polygenic burden primarily arising from rare (<1/10,000), disruptive mutations distributed across many genes. Especially enriched gene sets included the voltage-gated calcium ion channel and the signaling complex formed by the activity-regulated cytoskeleton-associated (ARC) scaffold protein of the postsynaptic density (PSD), sets previously implicated by genome-wide association studies (GWAS) and copy-number variation (CNV) studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) were enriched for case mutations. No individual gene-based test achieved significance after correction for multiple testing, and we did not detect any alleles of moderately low frequency (~0.5-1%) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene mapping paradigms in neuropsychiatric disease. PMID:24463508
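A gene-set burden comparison starts by filtering variants to the rare, disruptive class and tallying them in cases and controls. The sketch below assumes "disruptive" means nonsense, frameshift, or essential splice-site changes and applies the <1/10,000 frequency cutoff to a per-variant population frequency; the field names, gene set, and counts are hypothetical, and a real analysis would then test the difference statistically.

```python
from dataclasses import dataclass

# Assumed definition of "disruptive"; the study's annotation categories may differ.
DISRUPTIVE = {"nonsense", "frameshift", "essential_splice"}

@dataclass
class Variant:
    gene: str
    consequence: str
    population_freq: float
    case_count: int     # times observed among cases
    control_count: int  # times observed among controls

def burden(variants, gene_set, max_freq=1e-4):
    """Total rare disruptive mutation counts in cases and controls for one gene set."""
    cases = controls = 0
    for v in variants:
        if v.gene in gene_set and v.consequence in DISRUPTIVE and v.population_freq < max_freq:
            cases += v.case_count
            controls += v.control_count
    return cases, controls

def per_person_rates(cases, controls, n_cases=2536, n_controls=2543):
    """Mutations per individual, using the cohort sizes reported in the abstract."""
    return cases / n_cases, controls / n_controls
```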
Sequence-dependent DNA flexibility mediates DNase I cleavage.

PubMed

Heddi, Brahim; Abi-Ghanem, Josephine; Lavigne, Marc; Hartmann, Brigitte

2010-01-08

Understanding the preference of nonspecific proteins for certain DNA structural features requires an accurate description of the properties of free DNA, especially its possible predisposition to adopt a conformation that favors complex formation. Exploiting previous exhaustive NMR studies of free DNA oligomers, we investigated the molecular basis of DNase I sensitivity under conditions where DNase I binding limits the probability of cleavage. We showed that cleavage intensity correlated with the flexibility of the adjacent 3' phosphate linkage, monitored by (31)P chemical shifts. Examination of NMR-refined DNA structures showed that sequence-dependent flexible phosphates were associated with large minor groove variations that may promote DNase I binding, as suggested by the structures of relevant DNA-protein complexes. In sum, this work demonstrates that specificity in the DNA-DNase I interaction is mediated by DNA flexibility, which influences the induced-fit transitions required to form productive complexes.
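The reported correlation between cleavage intensity and phosphate flexibility can be quantified with a Pearson coefficient over paired per-site values. This is a minimal sketch: each cleavage site is assumed to be paired with the (31)P chemical shift of its adjacent 3' phosphate, and any numbers fed to it here are illustrative, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.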
Identification of a member of the catalase multigene family on wheat chromosome 7A associated with flour b* colour and biological significance of allelic variation.

PubMed

Li, Dora A; Walker, Esther; Francki, Michael G

2015-12-01

Carotenoids (especially lutein) are known to be the pigment source for flour b* colour in bread wheat. Flour b* colour variation is controlled by a quantitative trait locus (QTL) on wheat chromosome 7AL, and one gene from the carotenoid pathway, phytoene synthase, was functionally associated with the QTL on 7AL in some, but not all, wheat genotypes. However, a SNP marker within a sequence similar to catalase (Cat3-A1snp), derived from a full-length (FL) cDNA (AK332460), was consistently associated with the QTL on 7AL and was implicated in regulating hydrogen peroxide (H2O2) to control the carotenoid accumulation that affects flour b* colour. In this study, the catalase genes on chromosome 7AL were enumerated to identify which gene may be implicated in flour b* variation. Two genes, each consisting of five exons, were identified by interrogating the draft wheat genome survey sequence, and a further two members, each having eight exons, were identified through comparative analysis with the single catalase gene on rice chromosome 6, PCR amplification and sequencing. The catalase genes on chromosome 7A had evidently duplicated and diverged during evolution relative to their counterpart on rice chromosome 6. The detection of transcripts in seeds, co-location with the Cat3-A1snp marker, and maximised alignment of the FL-cDNA (AK332460) with its cognate genomic sequence indicated that TaCat3-A1 was the member of the catalase gene family associated with flour b* colour variation. Re-sequencing identified three alleles in three wheat varieties, TaCat3-A1a, TaCat3-A1b and TaCat3-A1c, whose predicted proteins differed in the peroxisomal targeting signal tripeptide at the carboxyl terminus, providing new insights into their potential role in regulating the cellular H2O2 that contributes to flour b* colour variation.
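Comparing alleles at the C-terminal targeting tripeptide can be sketched as extracting the last three residues of each predicted protein and checking them against a PTS1 pattern. The pattern below is an assumed, simplified consensus (small residue, basic residue, Leu/Met, as in the classic "SKL"); real PTS1 recognition is more permissive, and some catalases use non-canonical signals, so a non-match does not rule out peroxisomal targeting. The protein strings are hypothetical.

```python
import re

# Assumed simplified PTS1 consensus: [S/A/C][K/R/H][L/M] at the very C-terminus.
PTS1 = re.compile(r"[SAC][KRH][LM]$")

def c_terminal_tripeptide(protein: str) -> str:
    """Last three residues, ignoring a trailing stop symbol."""
    return protein.rstrip("*")[-3:]

def has_canonical_pts1(protein: str) -> bool:
    """True if the C-terminus matches the assumed PTS1 consensus."""
    return bool(PTS1.search(protein.rstrip("*")))

# Hypothetical allele products (toy sequences).
alleles = {"allele_a": "MAGSKL*", "allele_b": "MAGQRL*"}
summary = {name: (c_terminal_tripeptide(p), has_canonical_pts1(p)) for name, p in alleles.items()}
```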
PDBFlex: exploring flexibility in protein structures

PubMed Central

Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam

2016-01-01

The PDBFlex database, freely available with no login requirements at http://pdbflex.org, provides information on the flexibility of protein structures as revealed by analyzing the variation between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects all instances of such depositions, grouping them using a 95% sequence identity threshold, analyzes their structural differences, and clusters them according to their structural similarity for easy analysis. PDBFlex contains tools and viewers enabling in-depth examination of structural variability, including: 2D-scaling visualization of RMSD distances between structures of the same protein; graphs of average local RMSD in the aligned structures of protein chains; graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues); difference distance maps between all sets of coordinates; and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using the JSMol visualization software. PMID:26615193
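Two of the measures listed above can be sketched in a few lines. RMSD assumes the two coordinate sets are already superposed and residue-matched. A difference distance map needs no superposition, because it compares each structure's internal pairwise distances. The coordinates below are toy values, and this is not PDBFlex's own code.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def distance_matrix(coords):
    """All pairwise distances within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def difference_distance_map(coords_a, coords_b):
    """Element-wise difference of the two internal distance matrices."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(da, db)]

# Toy two-atom "structures" differing by a 1 Å shift of the second atom along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
```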
Structures of membrane proteins

PubMed Central

Vinothkumar, Kutti R.; Henderson, Richard

2010-01-01

In reviewing the structures of membrane proteins determined up to the end of 2009, we present in words and pictures the most informative examples from each family. We group the structures together according to their function and architecture to provide an overview of the major principles and variations on the most common themes. The first structures, determined 20 years ago, were those of naturally abundant proteins with limited conformational variability, and each membrane protein structure determined was a major landmark. With the advent of complete genome sequences and efficient expression systems, there has been an explosion in the rate of membrane protein structure determination, with many classes represented. New structures are published every month and more than 150 unique membrane protein structures have been determined. This review analyses the reasons for this success, discusses the challenges that still lie ahead, and presents a concise summary of the key achievements with illustrated examples selected from each class. PMID:20667175
Variation in cooking and eating quality traits in Japanese rice germplasm accessions

PubMed Central

Hori, Kiyosumi; Suzuki, Keitaro; Iijima, Ken; Ebana, Kaworu

2016-01-01

The eating quality of cooked rice is important and determines its market price and consumer acceptance. To comprehensively describe the variation of eating quality in 183 rice germplasm accessions, we evaluated 33 eating-quality traits including amylose and protein contents, pasting properties of rice flour, and texture of cooked rice grains. All eating-quality traits varied widely in the germplasm accessions. Principal-components analysis (PCA) revealed that allelic differences in the Wx gene explained the largest proportion of phenotypic variation of the eating-quality traits. In 146 accessions of non-glutinous temperate japonica rice, PCA revealed that protein content and surface texture of the cooked rice grains significantly explained phenotypic variations of the eating-quality traits. An allelic difference based on simple sequence repeats, which was located near a quantitative trait locus (QTL) on the short arm of chromosome 3, was associated with differences in the eating quality of non-glutinous temperate japonica rice. These results suggest that eating quality is controlled by genetic factors, including the Wx gene and the QTL on chromosome 3, in Japanese rice accessions. These genetic factors have been consciously selected for eating quality during rice breeding programs in Japan. PMID:27162502
Variation in cooking and eating quality traits in Japanese rice germplasm accessions.

PubMed

Hori, Kiyosumi; Suzuki, Keitaro; Iijima, Ken; Ebana, Kaworu

2016-03-01

The eating quality of cooked rice is important and determines its market price and consumer acceptance. To comprehensively describe the variation of eating quality in 183 rice germplasm accessions, we evaluated 33 eating-quality traits including amylose and protein contents, pasting properties of rice flour, and texture of cooked rice grains. All eating-quality traits varied widely in the germplasm accessions. Principal-components analysis (PCA) revealed that allelic differences in the Wx gene explained the largest proportion of phenotypic variation of the eating-quality traits. In 146 accessions of non-glutinous temperate japonica rice, PCA revealed that protein content and surface texture of the cooked rice grains significantly explained phenotypic variations of the eating-quality traits. An allelic difference based on simple sequence repeats, which was located near a quantitative trait locus (QTL) on the short arm of chromosome 3, was associated with differences in the eating quality of non-glutinous temperate japonica rice. These results suggest that eating quality is controlled by genetic factors, including the Wx gene and the QTL on chromosome 3, in Japanese rice accessions. These genetic factors have been consciously selected for eating quality during rice breeding programs in Japan.
The Glaciozyma antarctica genome reveals an array of systems that provide sustained responses towards temperature variations in a persistently cold habitat

PubMed Central

Hashim, Noor Haza Fazlin; Bharudin, Izwan; Abu Bakar, Mohd Faizal; Huang, Kie Kyon; Alias, Halimah; Lee, Bernard K. B.; Mat Isa, Mohd Noor; Mat-Sharani, Shuhaila; Sulaiman, Suhaila; Tay, Lih Jinq; Zolkefli, Radziah; Muhammad Noor, Yusuf; Law, Douglas Sie Nguong; Abdul Rahman, Siti Hamidah; Md-Illias, Rosli; Abu Bakar, Farah Diba; Najimudin, Nazalan; Abdul Murad, Abdul Munir; Mahadi, Nor Muhammad

2018-01-01

Extremely low temperatures present various challenges to life, including ice formation and effects on metabolic capacity. Psychrophilic microorganisms typically have an array of mechanisms that enable survival in cold temperatures. In this study, we sequenced and analysed the genome of Glaciozyma antarctica, a psychrophilic yeast isolated in the Antarctic region. Genome annotation identified 7857 protein-coding sequences. From the genome sequence analysis, we identified genes encoding proteins known to be associated with cold survival, in addition to annotating genes that are unique to G. antarctica. For genes known to be involved in cold adaptation, such as anti-freeze proteins (AFPs), our gene expression analysis revealed that they were differentially transcribed over time and in response to different temperatures. This indicates the presence of an array of adaptation systems that can respond to a changing but persistently cold environment. We also validated the activity of all the annotated AFPs: the recombinant AFPs all demonstrated anti-freeze capacity. This work is an important foundation for further exploration of psychrophilic microbiology; among other possibilities, the genes unique to this species may represent a pool of novel mechanisms for cold survival. PMID:29385175
The NS3 proteins of global strains of bluetongue virus evolve into regional topotypes through negative (purifying) selection.

PubMed

Balasuriya, U B R; Nadler, S A; Wilson, W C; Pritchard, L I; Smythe, A B; Savini, G; Monaco, F; De Santis, P; Zhang, N; Tabachnick, W J; Maclachlan, N J

2008-01-01

Comparison of the deduced amino acid sequences of the genes (S10) encoding the NS3 protein of 137 strains of bluetongue virus (BTV) from Africa, the Americas, Asia, Australia and the Mediterranean Basin showed limited variation. Common to all NS3 sequences were potential glycosylation sites at amino acid residues 63 and 150 and a cysteine at residue 137, whereas a cysteine at residue 181 was not conserved. The PPXY and PS/TAP late-domain motifs were conserved in all but three of the viruses. Phylogenetic analyses of these same sequences yielded two principal clades that grouped the viruses irrespective of their serotype or year of isolation (1900-2003). All viruses from Asia and Australia grouped in one clade, whereas those from the other regions were present in both clades. Each clade segregated into distinct subclades that included viruses from single or multiple regions, and the S10 genes of some field viruses were identical to those of live-attenuated BTV vaccines. There was no evidence of positive selection on the S10 gene, as assessed by reconstruction of ancestral codon states on the phylogeny; rather, the functional constraints on the NS3 protein are expressed through substantial negative (purifying) selection.
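Conservation claims such as "a cysteine at residue 137 in all sequences, but not at residue 181" can be checked column by column in an alignment. This sketch assumes an ungapped alignment, so an alignment column equals a residue number. The toy sequences are hypothetical, and the positions are scaled down to fit them.

```python
from collections import Counter

def column_profile(aligned, position):
    """Residue counts at a 1-based position across aligned sequences."""
    return Counter(seq[position - 1] for seq in aligned if len(seq) >= position)

def conserved_at(aligned, position, residue):
    """True if every aligned sequence carries `residue` at the 1-based position."""
    return all(len(seq) >= position and seq[position - 1] == residue for seq in aligned)

# Hypothetical aligned fragments.
aligned = ["MCNKC", "MCNRA", "MCNKC"]
```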
Protein function in precision medicine: deep understanding with machine learning.

PubMed

Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana

2016-08-01

Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history data, along with other types of personal data, toward a better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Merely skimming low-hanging results from this wealth of data may have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause health- or disease-related phenotypes? Toward this end, deeper understanding will come not simply from deeper machine learning, but from a more explicit focus on understanding protein function, context-specific protein interaction networks, and the impact of variation on both. © 2016 Federation of European Biochemical Societies.
Analysis of 16S-23S rRNA intergenic spacer regions of Vibrio cholerae and Vibrio mimicus.

PubMed

Chun, J; Huq, A; Colwell, R R

1999-05-01

Identification of Vibrio cholerae based on molecular sequence data has been hampered by a lack of sequence variation between it and the closely related Vibrio mimicus. The two species share many protein-coding genes, such as ctxAB, and have almost identical 16S rRNA-coding DNA (rDNA) sequences. Primers targeting conserved sequences flanking the 3' end of the 16S rDNA and the 5' end of the 23S rDNA were used to amplify the 16S-23S rRNA intergenic spacer regions of V. cholerae and V. mimicus. Two major (ca. 580 and 500 bp) amplicons and one minor (ca. 750 bp) amplicon were consistently generated for both species, and their sequences were determined. The largest fragment contains three tRNA genes (tDNAs) coding for tRNAGlu, tRNALys, and tRNAVal, an arrangement not previously found in bacteria examined to date. The 580-bp amplicon contained tDNAIle and tDNAAla, whereas the 500-bp fragment had a single tDNA coding for either tRNAGlu or tRNAAla. Little variation (0 to 0.4%) was found among V. cholerae O1 classical, O1 El Tor, and O139 epidemic strains. Slightly more variation was found relative to the non-O1/non-O139 serotypes (ca. 1% difference) and V. mimicus (2 to 3% difference). A pair of oligonucleotide primers was designed based on the region differentiating all V. cholerae strains from V. mimicus. The PCR system developed was subsequently evaluated using representatives of V. cholerae from environmental and clinical sources and of other taxa, including V. mimicus. This study provides the first molecular tool for identifying the species V. cholerae.
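Before testing a species-specific primer pair at the bench, expected product sizes can be checked in silico. This minimal sketch assumes exact primer matches on one strand of the template and pairs each forward site with the nearest downstream reverse site. The primer and template strings are hypothetical, and real tools also allow mismatches and consider melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def in_silico_pcr(template: str, forward: str, reverse: str) -> list:
    """Product lengths from exact primer matches on the given strand.

    The reverse primer is written 5'->3' as synthesized, so its binding site on
    this strand is its reverse complement.
    """
    template, forward = template.upper(), forward.upper()
    rev_site = reverse_complement(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        end = template.find(rev_site, start + len(forward))
        if end != -1:
            products.append(end + len(rev_site) - start)
        start = template.find(forward, start + 1)
    return products

# Hypothetical template: forward site, 5-nt spacer, reverse-primer site.
template = "AAA" + "ACGTAC" + "TTTTT" + "CCTTGG" + "AAA"
sizes = in_silico_pcr(template, "ACGTAC", "CCAAGG")
```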
Duplication and concerted evolution of MiSp-encoding genes underlie the material properties of minor ampullate silks of cobweb weaving spiders.

PubMed

Vienneau-Hathaway, Jannelle M; Brassfield, Elizabeth R; Lane, Amanda Kelly; Collin, Matthew A; Correa-Garhwal, Sandra M; Clarke, Thomas H; Schwager, Evelyn E; Garb, Jessica E; Hayashi, Cheryl Y; Ayoub, Nadia A

2017-03-14

Orb-web weaving spiders and their relatives use multiple types of task-specific silks. The majority of spider silk studies have focused on the ultra-tough dragline silk synthesized in major ampullate glands, but other silk types have impressive material properties. For instance, minor ampullate silks of orb-web weaving spiders are as tough as draglines, owing to their higher extensibility despite lower strength. Differences in material properties between silk types result from differences in their component proteins, particularly members of the spidroin (spider fibroin) gene family. However, the extent to which variation in material properties within a single silk type can be explained by variation in spidroin sequences is unknown. Here, we compare the minor ampullate spidroins (MiSp) of orb-weavers and cobweb weavers. Orb-web weavers use minor ampullate silk to form the auxiliary spiral of the orb-web, while cobweb weavers use it to wrap prey, suggesting that selection pressures on MiSp may differ between the two groups. We report complete or nearly complete MiSp sequences from five cobweb weaving spider species and measure material properties of minor ampullate silks in a subset of these species. We also compare MiSp sequences and silk properties of our cobweb weavers to published data for orb-web weavers. We demonstrate that all our cobweb weavers possess multiple MiSp loci and that one locus is more highly expressed in at least two species. We also find that the proportion of β-spiral-forming amino acid motifs in MiSp positively correlates with minor ampullate silk extensibility across orb-web and cobweb weavers. MiSp sequences vary dramatically within and among spider species, and have likely been subject to multiple rounds of gene duplication and concerted evolution, which have contributed to the diverse material properties of minor ampullate silks.
Our sequences also provide templates for recombinant silk proteins with tailored properties.
Sequence variation in the env gene of simian immunodeficiency virus recovered from immunized macaques is predominantly in the V1 region.

PubMed

Almond, N; Jenkins, A; Heath, A B; Kitchin, P

1993-05-01

Three cynomolgus macaques were immunized with recombinant envelope protein preparations derived from simian immunodeficiency virus (SIV). Although humoral and cellular responses were elicited by the immunization regime, all macaques became infected upon challenge with 10 MID50 of the 11/88 virus challenge stock of SIVmac251-32H. The polymerase chain reaction was used to amplify proviral SIV gp120 sequences present in the blood of both immunized and control macaques at 2 months post-infection. A comparison of the predominant sequences found in the region from V2 to V5 of gp120 failed to differentiate provirus recovered from either immunized or control animals. A detailed investigation of sequences obtained from the hypervariable V1 region identified a mixture of sequences in both immunized and control macaques. Some sequences were identical to those previously detected in the virus challenge stock, whereas others had not been detected previously. Phenogram analysis of the new V1 sequences found in immunized animals revealed that they were quite distinct from those from the virus challenge stock and that they included alterations to potential N-linked glycosylation sites. In contrast, new sequence variants recovered from the control animals were closely related to sequences from the virus challenge stock. The difference in diversity of new V1 sequences recovered from immunized and control macaques was highly significant (P < 0.001). Thus, the presence of pre-existing immune responses to SIV envelope protein is associated with greater genetic change in the V1 region of gp120. These data are discussed in relation to the epitopes of SIV gp120 that may confer protection from in vivo challenge.
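Changes to potential N-linked glycosylation sites can be detected by scanning each sequence for the N-X-S/T sequon, where X is any residue except proline. This sketch uses a regular-expression lookahead so that overlapping sequons are all reported. Comparing two variants by position assumes they are aligned without gaps; the sequences are hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def sequon_positions(protein: str) -> list:
    """1-based positions of Asn residues that start a potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

def sequon_changes(reference: str, variant: str):
    """(gained, lost) sequon positions in a variant relative to an ungapped reference."""
    ref, var = set(sequon_positions(reference)), set(sequon_positions(variant))
    return sorted(var - ref), sorted(ref - var)
```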
Rapid evolution of cis-regulatory sequences via local point mutations

NASA Technical Reports Server (NTRS)

Stone, J. R.; Wray, G. A.

2001-01-01

Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
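One way to see why new binding sites can arise quickly is to count how many windows of a sequence are a single point substitution away from a site. The sketch below is a static count, not the population simulation used in the study. It assumes an exact-match site model on one strand only, and the site "CGTA" is hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def one_step_sites(sequence: str, site: str) -> list:
    """0-based window starts where one substitution would create `site`."""
    seq, site = sequence.upper(), site.upper()
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if hamming(seq[i : i + k], site) == 1]
```

In a long regulatory region, each such window is a mutational target, so the number of one-step candidates grows roughly in proportion to sequence length.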
The genetic basis of adaptive pigmentation variation in Drosophila melanogaster

PubMed Central

Pool, John E.; Aquadro, Charles F.

2009-01-01

In a broad survey of Drosophila melanogaster population samples, levels of abdominal pigmentation were found to be highly variable and geographically differentiated. A strong positive correlation was found between dark pigmentation and high altitude, suggesting adaptation to specific environments. DNA sequence polymorphism at the candidate gene ebony revealed a clear association with the pigmentation of homozygous third chromosome lines. The darkest lines sequenced had nearly identical haplotypes spanning 14.5 kilobases upstream of the protein-coding exons of ebony. Thus, natural selection may have elevated the frequency of an allele that confers dark abdominal pigmentation by influencing the regulation of ebony. PMID:17614900
Genome-wide Association Study Identifies Loci for the Polled Phenotype in Yak

PubMed Central

Wu, Xiaoyun; Wang, Kun; Ding, Xuezhi; Wang, Mingcheng; Chu, Min; Xie, Xiuyue; Qiu, Qiang; Yan, Ping

2016-01-01

The absence of horns, known as the polled phenotype, is an economically important trait in modern yak husbandry, but the genomic structure and genetic basis of this phenotype have yet to be discovered. Here, we conducted a genome-wide association study with a panel of 10 horned and 10 polled yaks using whole-genome sequencing. We mapped the POLLED locus to a 200-kb interval that contains three protein-coding genes. Further characterization of the candidate region revealed signals of recent artificial selection resulting from the breeding process. We suggest that variation in gene expression, rather than structural variation in the encoded proteins, probably contributes to the polled phenotype. Our results not only represent an important first step in establishing the genomic structure of the polled region in yak, but also add to our understanding of the polled trait in bovid species. PMID:27389700
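With only 20 animals (40 alleles), a per-SNP allelic association test is often done with Fisher's exact test rather than a chi-square approximation. The sketch below implements the two-sided test for a 2x2 table of allele counts (rows: polled and horned; columns: the two alleles). It illustrates the idea under that assumption and is not necessarily the association method used in the study.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))
```

For example, a SNP where all 20 polled alleles carry one variant and all 20 horned alleles carry the other would be the table `[[20, 0], [0, 20]]`.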
Structural test of the parameterized-backbone method for protein design.

PubMed

Plecs, Joseph J; Harbury, Pehr B; Kim, Peter S; Alber, Tom

2004-09-03

Designing new protein folds requires a method for simultaneously optimizing the conformation of the backbone and the side-chains. One approach to this problem is the use of a parameterized backbone, which allows the systematic exploration of families of structures. We report the crystal structure of RH3, a right-handed, three-helix coiled coil that was designed using a parameterized backbone and detailed modeling of core packing. This crystal structure was determined using another rationally designed feature, a metal-binding site that permitted experimental phasing of the X-ray data. RH3 adopted the intended fold, which has not been observed previously in biological proteins. Unanticipated structural asymmetry in the trimer was a principal source of variation within the RH3 structure. The sequence of RH3 differs from that of a previously characterized right-handed tetramer, RH4, at only one position in each 11-residue sequence repeat. This close similarity indicates that the design method is sensitive to the core packing interactions that specify the protein structure. Comparison of the structures of RH3 and RH4 indicates that both steric overlap and cavity formation provide strong driving forces for oligomer specificity.
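Why an 11-residue repeat goes with a right-handed supercoil follows from textbook geometry rather than from the parameterized-backbone method itself. A straight α-helix rotates about 100° per residue (roughly 3.6 residues per turn). An 11-residue repeat spanning 3 turns asks for 360 × 3 / 11 ≈ 98.2° per residue, less than 100°, so the helices wind right-handed to keep core positions facing the axis. The familiar 7-residue heptad (2 turns, ≈ 102.9°) asks for more, giving a left-handed supercoil. The 100° value is the assumed approximation.

```python
ALPHA_HELIX_ANGLE = 100.0  # approximate rotation per residue in a straight alpha-helix (degrees)

def per_residue_angle(residues: int, turns: int) -> float:
    """Average rotation per residue for a repeat of `residues` spanning `turns` helical turns."""
    return 360.0 * turns / residues

def supercoil_handedness(residues: int, turns: int) -> str:
    """Supercoil sense implied by a sequence repeat, relative to a straight alpha-helix."""
    angle = per_residue_angle(residues, turns)
    if angle > ALPHA_HELIX_ANGLE:
        return "left-handed"
    if angle < ALPHA_HELIX_ANGLE:
        return "right-handed"
    return "straight"
```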
Identification of avian wax synthases

PubMed Central

2012-01-01

Background Bird species show a high degree of variation in the composition of their preen gland waxes. For instance, galliform birds like chicken contain fatty acid esters of 2,3-alkanediols, while Anseriformes like goose or Strigiformes like barn owl contain wax monoesters in their preen gland secretions. The final biosynthetic step is catalyzed by wax synthases (WS) which have been identified in pro- and eukaryotic organisms. Results Sequence similarities enabled us to identify six cDNAs encoding putative wax synthesizing proteins in chicken and two from barn owl and goose. Expression studies in yeast under in vivo and in vitro conditions showed that three proteins from chicken performed WS activity while a sequence from chicken, goose and barn owl encoded a bifunctional enzyme catalyzing both wax ester and triacylglycerol synthesis. Mono- and bifunctional WS were found to differ in their substrate specificities especially with regard to branched-chain alcohols and acyl-CoA thioesters. According to the expression patterns of their transcripts and the properties of the enzymes, avian WS proteins might not be confined to preen glands. Conclusions We provide direct evidence that avian preen glands possess both monofunctional and bifunctional WS proteins which have different expression patterns and WS activities with different substrate specificities. PMID:22305293

MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

PubMed

Suckau, Detlev; Resemann, Anja

2009-12-01

The ability to match Top-Down protein sequencing (TDS) results obtained by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing of the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed protein sequences to be matched quickly and efficiently, and recombinant protein structures, in particular protein termini, to be validated at the level of undigested proteins.
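The abstract does not state which fragment-ion types were scored. MALDI in-source decay spectra are commonly read as N-terminal c-ion and C-terminal y-ion ladders, so the sketch below assumes singly protonated c and y ions, computes both ladders for a candidate sequence from monoisotopic residue masses, and counts observed peaks that fall within a tolerance. The function names and the 0.5 Da tolerance are illustrative assumptions, not the authors' software.

```python
# Monoisotopic residue masses in Da (I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # added once for a singly charged ion
NH3 = 17.02655    # c-ions carry an extra NH3 relative to b-ions
H2O = 18.01056    # y-ions carry the C-terminal water


def c_ion_ladder(seq: str) -> list[float]:
    """m/z of singly charged c1..c(n-1), read from the N-terminus."""
    ladder, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        ladder.append(total + NH3 + PROTON)
    return ladder


def y_ion_ladder(seq: str) -> list[float]:
    """m/z of singly charged y1..y(n-1), read from the C-terminus."""
    ladder, total = [], 0.0
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        ladder.append(total + H2O + PROTON)
    return ladder


def count_matches(theoretical: list[float], observed: list[float],
                  tol: float = 0.5) -> int:
    """Number of theoretical ions with an observed peak within tol Da."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)
```

Successive peaks in a matched ladder differ by exactly one residue mass, which is what allows terminal sequence stretches, and de novo reads around a fusion site, to be called directly from the spectrum.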
Distribution and Evolution of Yersinia Leucine-Rich Repeat Proteins

PubMed Central

Hu, Yueming; Huang, He; Hui, Xinjie; Cheng, Xi; White, Aaron P.

2016-01-01

Leucine-rich repeat (LRR) proteins are widely distributed in bacteria, playing important roles in various protein-protein interaction processes. In Yersinia, the well-characterized type III secreted effector YopM also belongs to the LRR protein family and is encoded by virulence plasmids. However, little has been known about other LRR members encoded by Yersinia genomes or their evolution. In this study, the Yersinia LRR proteins were comprehensively screened, categorized, and compared. The LRR proteins encoded by chromosomes (LRR1 proteins) appeared to be more similar to each other and different from those encoded by plasmids (LRR2 proteins) with regard to repeat-unit length, amino acid composition profile, and gene expression regulation circuits. LRR1 proteins were also different from LRR2 proteins in that the LRR1 proteins contained an E3 ligase domain (NEL domain) in the C-terminal region or an NEL domain-encoding nucleotide relic in flanking genomic sequences. The LRR1 protein-encoding genes (LRR1 genes) varied dramatically and were categorized into 4 subgroups (a to d), with the LRR1a to -c genes evolving from the same ancestor and LRR1d genes evolving from another ancestor. The consensus and ancestor repeat-unit sequences were inferred for different LRR1 protein subgroups by use of a maximum parsimony modeling strategy. Structural modeling disclosed very similar repeat-unit structures between LRR1 and LRR2 proteins despite the different unit lengths and amino acid compositions. Structural constraints may serve as the driving force to explain the observed mutations in the LRR regions. This study suggests that there may be functional variation and lays the foundation for future experiments investigating the functions of the chromosomally encoded LRR proteins of Yersinia. PMID:27217422
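The abstract says consensus and ancestral repeat-unit sequences were inferred with a maximum parsimony strategy. A minimal sketch of the general idea, assuming aligned, equal-length repeat units and a fixed binary tree: Fitch's small-parsimony algorithm run column by column, plus a majority-rule consensus. The nested-tuple tree format and the function names are hypothetical; this is not the authors' modeling pipeline.

```python
from collections import Counter

# A tree is either a leaf (an aligned repeat-unit string) or a pair
# (left_subtree, right_subtree).


def fitch_site(tree, i):
    """Fitch bottom-up pass for column i: (candidate states, changes)."""
    if isinstance(tree, str):
        return {tree[i]}, 0
    left_states, left_cost = fitch_site(tree[0], i)
    right_states, right_cost = fitch_site(tree[1], i)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


def fitch(tree, length):
    """Root state sets per column and the total parsimony score."""
    root_sets, score = [], 0
    for i in range(length):
        states, cost = fitch_site(tree, i)
        root_sets.append(states)
        score += cost
    return root_sets, score


def consensus(units):
    """Majority-rule consensus of equal-length aligned repeat units."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))
```

Where a root set contains more than one residue, the most parsimonious ancestor is ambiguous at that position; a top-down pass (or extra outgroup information) is needed to choose one.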
Methods and statistics for combining motif match scores.

PubMed

Bailey, T L; Gribskov, M

1998-01-01

Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
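Method 3) needs the significance of a product of p-values rather than the product itself. If the n motif p-values are independent and uniform on (0, 1], the probability that their product is at most q is q * sum_{i=0}^{n-1} (-ln q)^i / i!. A minimal sketch of that calculation follows; the function name is hypothetical and this is not MAST's source code.

```python
import math


def product_pvalue(pvalues):
    """Significance of the product of n independent p-values.

    Assumes each p-value is uniform on (0, 1] under the null, so
    P(product <= q) = q * sum_{i=0}^{n-1} (-ln q)^i / i!.
    """
    q = math.prod(pvalues)
    if q <= 0.0:
        return 0.0
    neg_ln_q = -math.log(q)
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= neg_ln_q / i  # builds (-ln q)^i / i! incrementally
        total += term
    return min(1.0, q * total)


# Two motifs each matched with p = 0.01: the raw product is 1e-4,
# but the combined p-value is larger because many pairs give such products.
print(product_pvalue([0.01, 0.01]))
```

The correction matters: ranking sequences by the raw product would favour groups with more motifs, whereas this p-value is comparable across groups of different sizes.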
Large-scale whole-genome sequencing of the Icelandic population.

PubMed

Gudbjartsson, Daniel F; Helgason, Hannes; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Gylfason, Arnaldur; Besenbacher, Soren; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Stacey, Simon N; Frigge, Michael L; Holm, Hilma; Saemundsdottir, Jona; Helgadottir, Hafdis Th; Johannsdottir, Hrefna; Sigfusson, Gunnlaugur; Thorgeirsson, Gudmundur; Sverrisson, Jon Th; Gretarsdottir, Solveig; Walters, G Bragi; Rafnar, Thorunn; Thjodleifsson, Bjarni; Bjornsson, Einar S; Olafsson, Sigurdur; Thorarinsdottir, Hildur; Steingrimsdottir, Thora; Gudmundsdottir, Thora S; Theodors, Asgeir; Jonasson, Jon G; Sigurdsson, Asgeir; Bjornsdottir, Gyda; Jonsson, Jon J; Thorarensen, Olafur; Ludvigsson, Petur; Gudbjartsson, Hakon; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Arnar, David O; Magnusson, Olafur Th; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Sulem, Patrick; Stefansson, Kari

2015-05-01

Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
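Two quantities in this abstract are standard per-variant statistics: the minor allele frequency (MAF) used as the imputation cut-off (0.1%) and the homozygosity excess, often summarised as F = 1 - observed/expected heterozygosity under Hardy-Weinberg proportions. A minimal sketch, assuming genotypes coded as alternative-allele counts (0, 1, 2); the function names are hypothetical and this is not the study's pipeline.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from genotypes coded as alternative-allele counts (0, 1 or 2)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def inbreeding_f(genotypes: Sequence[int]) -> float:
    """F = 1 - H_obs / H_exp; positive values mean excess homozygosity."""
    p = sum(genotypes) / (2 * len(genotypes))
    expected_het = 2 * p * (1 - p)  # Hardy-Weinberg expectation
    if expected_het == 0:
        return 0.0  # monomorphic site: F is undefined, report 0
    observed_het = sum(1 for g in genotypes if g == 1) / len(genotypes)
    return 1.0 - observed_het / expected_het


def passes_maf(genotypes: Sequence[int], threshold: float = 0.001) -> bool:
    """Keep variants whose MAF is at least the threshold (0.1% by default)."""
    return minor_allele_frequency(genotypes) >= threshold
```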
Genome-Wide Comparison of Magnaporthe Species Reveals a Host-Specific Pattern of Secretory Proteins and Transposable Elements

PubMed Central

Gowda, Malali

2016-01-01

Blast disease caused by Magnaporthe species is a major factor affecting the productivity of rice, wheat and millets. This study aimed to generate genomic information for rice and non-rice Magnaporthe isolates to understand the extent of their genetic variation. We sequenced the whole genomes of Magnaporthe isolates infecting rice (leaf and neck), finger millet (leaf and neck), foxtail millet (leaf) and buffel grass (leaf). Rice and finger millet isolates infecting both leaf and neck tissues were sequenced because the damage and yield loss caused by neck blast are much greater than those caused by leaf blast. A genome-wide comparison was carried out to study variability in gene content, candidate effectors, repeat element distribution, genes involved in carbohydrate metabolism, and SNPs. The analysis of repeat element footprints revealed that some genes, such as naringenin,2-oxoglutarate 3-dioxygenase, are targeted by Pot2 and Occan in isolates from different host species. Some repeat insertions were host-specific, while others were randomly shared between isolates. The distributions of repeat elements, secretory proteins, CAZymes and SNPs showed significant variation across host-specific lineages of Magnaporthe, indicating independent genome evolution orchestrated by multiple genomic factors. PMID:27658241
Understanding the structural and dynamic consequences of DNA epigenetic modifications: Computational insights into cytosine methylation and hydroxymethylation

PubMed Central

Carvalho, Alexandra T P; Gouveia, Leonor; Kanna, Charan Raju; Wärmländer, Sebastian K T S; Platts, Jamie A; Kamerlin, Shina Caroline Lynn

2014-01-01

We report a series of molecular dynamics (MD) simulations of up to a microsecond combined simulation time designed to probe epigenetically modified DNA sequences. More specifically, by monitoring the effects of methylation and hydroxymethylation of cytosine in different DNA sequences, we show, for the first time, that DNA epigenetic modifications change the molecule's dynamical landscape, increasing the propensity of DNA toward different values of twist and/or roll/tilt angles (in relation to the unmodified DNA) at the modification sites. Moreover, both the extent and position of different modifications have significant effects on the amount of structural variation observed. We propose that these conformational differences, which are dependent on the sequence environment, can provide specificity for protein binding. PMID:25625845
Identification of two novel pathogenic compound heterozygous MYO7A mutations in Usher syndrome by whole exome sequencing.

PubMed

Jia, Ying; Li, Xiaoge; Yang, Dong; Xu, Yi; Guo, Ying; Li, Xin

2018-01-01

The current study aimed to identify the pathogenic sites in a core pedigree with Usher syndrome (USH). The pedigree was analyzed by whole exome sequencing (WES), and mutations were verified by polymerase chain reaction (PCR) amplification and Sanger sequencing. Two pathogenic variants in MYO7A (c.849+2T>C and c.5994G>A) were identified, each inherited separately from one of the parents. One variant (c.849+2T>C) was a nonsense mutation causing premature termination of the protein, and the other (c.5994G>A), located near an exon boundary, could cause aberrant splicing. This study provides a meaningful exploration for the identification of clinical core genetic pedigrees. Copyright © 2017 Elsevier B.V. All rights reserved.
The mechanical design of spider silks: from fibroin sequence to mechanical function.

PubMed

Gosline, J M; Guerette, P A; Ortlepp, C S; Savage, K N

1999-12-01

Spiders produce a variety of silks, and the cloning of genes for silk fibroins reveals a clear link between protein sequence and structure-property relationships. The fibroins produced in the spider's major ampullate (MA) gland, which forms the dragline and web frame, contain multiple repeats of motifs that include an 8-10 residue long poly-alanine block and a 24-35 residue long glycine-rich block. When fibroins are spun into fibres, the poly-alanine blocks form β-sheet crystals that crosslink the fibroins into a polymer network with great stiffness, strength and toughness. As illustrated by a comparison of MA silks from Araneus diadematus and Nephila clavipes, variation in fibroin sequence and properties between spider species provides the opportunity to investigate the design of these remarkable biomaterials.
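The repeat architecture described above (8-10 residue poly-alanine blocks separated by glycine-rich blocks) can be located in a fibroin sequence with a simple scan. A minimal sketch, assuming a one-letter amino acid string; the function names and length bounds are illustrative, and the toy sequence in the test is made up rather than a real MA fibroin.

```python
import re


def poly_ala_blocks(seq, min_len=8, max_len=10):
    """(start, end) spans of maximal alanine runs of min_len..max_len."""
    return [m.span() for m in re.finditer(r"A+", seq)
            if min_len <= len(m.group()) <= max_len]


def glycine_fraction(segment):
    """Fraction of glycine residues in a segment (0.0 if empty)."""
    return segment.count("G") / len(segment) if segment else 0.0


def linker_glycine_fractions(seq, **bounds):
    """Gly fraction of each segment between consecutive poly-Ala blocks."""
    spans = poly_ala_blocks(seq, **bounds)
    return [glycine_fraction(seq[end:start])
            for (_, end), (start, _) in zip(spans, spans[1:])]
```

Matching maximal runs with `A+` and filtering by length, rather than searching for `A{8,10}` directly, avoids reporting a fragment of a longer alanine run as if it were a block.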
Activation of c-jun N-terminal kinase upon influenza A virus (IAV) infection is independent of pathogen-related receptors but dependent on amino acid sequence variations of IAV NS1.

PubMed

Nacken, Wolfgang; Anhlan, Darisuren; Hrincius, Eike R; Mostafa, Ahmed; Wolff, Thorsten; Sadewasser, Anne; Pleschka, Stephan; Ehrhardt, Christina; Ludwig, Stephan

2014-08-01

A hallmark cell response to influenza A virus (IAV) infections is the phosphorylation and activation of c-jun N-terminal kinase (JNK). However, so far it is not fully clear which molecules are involved in the activation of JNK upon IAV infection. Here, we report that the transfection of influenza viral RNA induces JNK in a retinoic acid-inducible gene I (RIG-I)-dependent manner. However, neither RIG-I-like receptors nor MyD88-dependent Toll-like receptors were found to be involved in the activation of JNK upon IAV infection. Viral JNK activation may be blocked by addition of cycloheximide and heat shock protein inhibitors during infection, suggesting that the expression of an IAV-encoded protein is responsible for JNK activation. Indeed, the overexpression of nonstructural protein 1 (NS1) of certain IAV subtypes activated JNK, whereas that of some other subtypes failed to activate JNK. Site-directed mutagenesis experiments using NS1 of the IAV H7N7, H5N1, and H3N2 subtypes identified the amino acid residue phenylalanine (F) at position 103 as decisive for JNK activation. Cleavage- and polyadenylation-specific factor 30 (CPSF30), whose binding to NS1 is stabilized by the amino acids F103 and M106, is not involved in JNK activation. In conclusion, subtype-specific sequence variations in the IAV NS1 protein result in subtype-specific differences in JNK signaling upon IAV infection. Influenza A virus (IAV) infection leads to the activation or modulation of multiple signaling pathways. Here, we demonstrate for the first time that the c-jun N-terminal kinase (JNK), a long-known stress-activated mitogen-activated protein (MAP) kinase, is activated by RIG-I when cells are treated with IAV RNA. However, at the same time, nonstructural protein 1 (NS1) of IAV has an intrinsic JNK-activating property that is dependent on IAV subtype-specific amino acid variations around position 103. Our findings identify two different and independent pathways that result in the activation of JNK in the course of an IAV infection. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Activation of c-jun N-Terminal Kinase upon Influenza A Virus (IAV) Infection Is Independent of Pathogen-Related Receptors but Dependent on Amino Acid Sequence Variations of IAV NS1

PubMed Central

Nacken, Wolfgang; Anhlan, Darisuren; Hrincius, Eike R.; Mostafa, Ahmed; Wolff, Thorsten; Sadewasser, Anne; Pleschka, Stephan; Ehrhardt, Christina

2014-01-01

ABSTRACT A hallmark cell response to influenza A virus (IAV) infections is the phosphorylation and activation of c-jun N-terminal kinase (JNK). However, so far it is not fully clear which molecules are involved in the activation of JNK upon IAV infection. Here, we report that the transfection of influenza viral RNA induces JNK in a retinoic acid-inducible gene I (RIG-I)-dependent manner. However, neither RIG-I-like receptors nor MyD88-dependent Toll-like receptors were found to be involved in the activation of JNK upon IAV infection. Viral JNK activation may be blocked by addition of cycloheximide and heat shock protein inhibitors during infection, suggesting that the expression of an IAV-encoded protein is responsible for JNK activation. Indeed, the overexpression of nonstructural protein 1 (NS1) of certain IAV subtypes activated JNK, whereas that of some other subtypes failed to activate JNK. Site-directed mutagenesis experiments using NS1 of the IAV H7N7, H5N1, and H3N2 subtypes identified the amino acid residue phenylalanine (F) at position 103 as decisive for JNK activation. Cleavage- and polyadenylation-specific factor 30 (CPSF30), whose binding to NS1 is stabilized by the amino acids F103 and M106, is not involved in JNK activation. In conclusion, subtype-specific sequence variations in the IAV NS1 protein result in subtype-specific differences in JNK signaling upon IAV infection. IMPORTANCE Influenza A virus (IAV) infection leads to the activation or modulation of multiple signaling pathways. Here, we demonstrate for the first time that the c-jun N-terminal kinase (JNK), a long-known stress-activated mitogen-activated protein (MAP) kinase, is activated by RIG-I when cells are treated with IAV RNA. However, at the same time, nonstructural protein 1 (NS1) of IAV has an intrinsic JNK-activating property that is dependent on IAV subtype-specific amino acid variations around position 103. Our findings identify two different and independent pathways that result in the activation of JNK in the course of an IAV infection. PMID:24872593
RNA sequencing to study gene expression and SNP variations associated with growth in zebrafish fed a plant protein-based diet.

PubMed

Ulloa, Pilar E; Rincón, Gonzalo; Islas-Trejo, Alma; Araneda, Cristian; Iturra, Patricia; Neira, Roberto; Medrano, Juan F

2015-06-01

The objectives of this study were to measure gene expression in zebrafish and then identify SNPs to be used as potential markers in a growth association study. We developed an approach in which muscle samples collected from low- and high-growth fish were analyzed using RNA sequencing (RNA-seq), and SNPs were chosen from the genes that were differentially expressed between the low and high groups. A population of 24 families was fed a plant protein-based diet from the larval to the adult stage. From a total of 440 males, 5% of the fish from both tails of the weight gain distribution were selected. Total RNA was extracted from individual muscle samples of 8 low-growth and 8 high-growth fish. Two pooled RNA-seq libraries were prepared for each phenotype using 4 fish per library. Libraries were sequenced using the Illumina GAII Sequencer and analyzed using the CLCBio genomic workbench software. One hundred and twenty-four genes were differentially expressed between phenotypes (p value < 0.05 and FDR < 0.2). From these genes, 164 SNPs were selected and genotyped in 240 fish samples. Marker-trait analysis revealed 5 SNPs associated with growth in key genes (Nars, Lmod2b, Cuzd1, Acta1b, and Plac8l1). These genes are good candidates for further growth studies in fish and for the identification of potential SNPs associated with different growth rates in response to a plant protein-based diet.
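The differential-expression call above combines a raw cut-off (p < 0.05) with a false discovery rate cut-off (FDR < 0.2). The abstract does not say how the FDR was computed; the Benjamini-Hochberg adjustment is a common choice, so this minimal sketch assumes it. The function names are hypothetical and this is not the CLCBio implementation.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(pvalues, p_cut=0.05, fdr_cut=0.2):
    """Indices of genes passing both the raw p and the FDR cut-off."""
    fdr = benjamini_hochberg(pvalues)
    return [i for i, (p, q) in enumerate(zip(pvalues, fdr))
            if p < p_cut and q < fdr_cut]
```

A relatively permissive FDR threshold such as 0.2 fits a screening step like this one, where candidate genes are only used to choose SNPs that are then tested again by genotyping.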
Mapping Proteoforms and Protein Complexes From King Cobra Venom Using Both Denaturing and Native Top-down Proteomics.

PubMed

Melani, Rafael D; Skinner, Owen S; Fornelli, Luca; Domont, Gilberto B; Compton, Philip D; Kelleher, Neil L

2016-07-01

Characterizing whole proteins by top-down proteomics avoids a step of inference encountered in the dominant bottom-up methodology, in which peptides are assembled computationally into proteins for identification. The direct interrogation of whole proteins and protein complexes from the venom of Ophiophagus hannah (king cobra) provides a sharply clarified view of toxin sequence variation, transit peptide cleavage sites and post-translational modifications (PTMs) likely critical for venom lethality. A tube-gel format for electrophoresis (called GELFrEE) and solution isoelectric focusing were used for protein fractionation prior to LC-MS/MS analysis, resulting in 131 protein identifications (18 more than bottom-up) and a total of 184 proteoforms characterized from 14 protein toxin families. Operating both GELFrEE and mass spectrometry to preserve non-covalent interactions generated detailed information about two of the largest venom glycoprotein complexes: the homodimeric L-amino acid oxidase (∼130 kDa) and the multichain toxin cobra venom factor (∼147 kDa). The L-amino acid oxidase complex exhibited two clusters of multiproteoform complexes corresponding to the presence of 5 or 6 N-glycan moieties, each consistent with a distribution of N-acetyl hexosamines. Employing top-down proteomics in both native and denaturing modes provides unprecedented characterization of venom proteoforms and their complexes. A precise molecular inventory of venom proteins will propel the study of snake toxin variation and the targeted development of new antivenoms or other biotherapeutics. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Transcription analysis of peloric mutants of Phalaenopsis orchids derived from tissue culture.

PubMed

Chen, Ya Huei; Tsai, Yi Jung; Huang, Jian Zhi; Chen, Fure Chyi

2005-08-01

Tissue culture has been widely used for mass propagation of Phalaenopsis. However, somaclonal variation occurring during micropropagation poses a severe problem by affecting product quality. In this study, wild-type and peloric flower buds of Phalaenopsis hybrids derived from flower stalk nodal culture were used for cDNA-RAPD and cDNA suppression subtractive hybridization analyses in order to study their genetic differences in terms of expressed sequence tags (ESTs). A total of 209 ESTs from normal flower buds and 230 from mutants were sequenced. These EST sequences can be grouped into several functional categories involved in different cellular processes, including metabolism, signal transduction, transcription, cell growth and division, protein synthesis, and protein localization, and into a subcategory of proteins with unknown function. Surprisingly, Cymbidium mosaic virus transcripts were frequently found in the peloric mutant of P. Little Mary. Real-time RT-PCR analysis of selected ESTs showed that in mutant flower buds a bZIP transcription factor (TGA1a-like protein) was down-regulated, while up-regulated genes included auxin-regulated protein kinase, cyclophilin, and TCP-like genes. A retroelement clone was also preferentially expressed in the peloric mutant flowers. On the other hand, ESTs involved in DNA methylation, chromatin remodeling and post-transcriptional regulation, such as DNA methyltransferase, histone acetyltransferase, ERECTA, and DEAD/DEAH RNA helicase, were more abundant in normal flower buds than in the mutants. Transcripts enriched in the wild type indicate down-regulation of these transcripts in the mutants, and vice versa. The potential roles of the analyzed transcripts in the development of Phalaenopsis flowers are discussed.
Genome sequence and virulence variation-related transcriptome profiles of Curvularia lunata, an important maize pathogenic fungus.

PubMed

Gao, Shigang; Li, Yaqian; Gao, Jinxin; Suo, Yujuan; Fu, Kehe; Li, Yingying; Chen, Jie

2014-07-24

Curvularia lunata is an important maize foliar fungal pathogen that is widely distributed in maize-growing areas in China. Genome sequencing of the pathogen will provide important information for a global understanding of its virulence mechanism. We report the genome sequence of a highly virulent C. lunata strain. Phylogenomic analysis indicates that C. lunata evolved from Bipolaris maydis (Cochliobolus heterostrophus). Analyses of transposases and repeat-induced point mutations suggest that the highly virulent strain has a high potential to evolve into other pathogenic strains. Like B. maydis, C. lunata has a smaller proportion of secreted proteins than entomopathogenic fungi. C. lunata and B. maydis have a similar proportion of protein-encoding genes highly homologous to experimentally proven pathogenic genes from the pathogen-host interaction database. However, relative to B. maydis, C. lunata possesses not only many expanded protein families, including MFS transporters, G-protein coupled receptors, protein kinases and proteases for transport, signal transduction or degradation, but also many contracted families, including cytochrome P450, lipases, glycoside hydrolases and polyketide synthases for detoxification, hydrolysis or secondary metabolite biosynthesis, which are expected to be crucial for fungal survival in varied stress environments. Comparative transcriptome analysis between a lowly virulent C. lunata strain and its virulence-increased variant induced by resistant host selection reveals that the increase in virulence is related to the toxin and melanin biosynthesis pathways in stress environments, and that the two pathways probably overlap to some extent. The data will facilitate a full understanding of the pathogenic mechanism and of virulence differentiation in C. lunata.
Sequence variations and protein expression levels of the two immune evasion proteins Gpm1 and Pra1 influence virulence of clinical Candida albicans isolates.

PubMed

Luo, Shanshan; Hipler, Uta-Christina; Münzberg, Christin; Skerka, Christine; Zipfel, Peter F

2015-01-01

Candida albicans, an important human fungal pathogen, uses multiple evasion strategies to control, modulate and inhibit host complement and innate immune attack. Clinical C. albicans strains vary in pathogenicity and in serum resistance. In this work, we analyzed sequence polymorphisms and variations in the expression levels of two central fungal complement evasion proteins, Gpm1 (phosphoglycerate mutase 1) and Pra1 (pH-regulated antigen 1), in thirteen clinical C. albicans isolates. Four nucleotide (nt) exchanges, all synonymous, were identified within the 747-nt long GPM1 gene. For the 900-nt long PRA1 gene, sixteen nucleotide exchanges were identified, which included both synonymous and non-synonymous exchanges. All thirteen clinical isolates had a homozygous exchange (A to G) at position 73 of the PRA1 gene. Across the thirteen isolates, surface levels of Gpm1 varied 8.2-fold and Pra1 levels 3.3-fold, and these differences influenced fungal immune fitness. The high Gpm1/Pra1-expressing Candida strains bound the three human immune regulators more efficiently than the low-expressing strains. The difference was 44% for Factor H binding, 51% for C4BP binding and 23% for plasminogen binding. These higher Gpm1/Pra1-expressing strains showed enhanced survival upon challenge with complement-active, Factor H-depleted human serum (difference 40%). In addition, adhesion to and infection of human endothelial cells was increased (difference 60%), and C3b surface deposition was less effective (difference 27%). Thus, variable expression levels of central immune evasion proteins influence the immune fitness of the human fungal pathogen C. albicans and thereby contribute to fungal virulence.
Comparison of theoretical proteomes: Identification of COGs with conserved and variable pI within the multimodal pI distribution

PubMed Central

Nandi, Soumyadeep; Mehra, Nipun; Lynn, Andrew M; Bhattacharya, Alok

2005-01-01

Background Theoretical proteome analysis, generated by plotting the theoretical isoelectric points (pI) against the molecular masses of all proteins encoded by a genome, shows a multimodal distribution for pI. This multimodal distribution is an effect of allowed combinations of the charged amino acids, and not due to evolutionary causes. The variation in this distribution can be correlated to the organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes. Results The distribution of ortholog pI values showed trimodal distributions for all prokaryotic genomes analyzed, similar to whole-proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional grouping of orthologs, five groups vary significantly from the population of orthologs, which is attributed either to conservation at the sequence level or to a bias for positively or negatively charged residues contributing to the function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and orthologs that best represent the variation in levels of the acidic and basic regions are listed. Conclusion The analysis of pI distribution using orthologs provides a basis for resolving theoretical proteome comparisons at the level of individual proteins. Orthologs identified as varying significantly between the major acidic and basic regions may be used as representatives of the variation of the entire proteome. PMID:16150155
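A theoretical pI is the pH at which the summed Henderson-Hasselbalch charges of a protein's ionizable groups equal zero, which is also why only certain combinations of charged residues, and hence certain pI ranges, are populated. A minimal sketch that finds that pH by bisection, assuming one textbook-style set of pKa values; the pKa table and function names are assumptions, and published tools use different pKa sets and so report slightly different pI values.

```python
# Assumed pKa values for terminal and side-chain groups (tools differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka))
                   for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph))
                   for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Running this over every predicted protein in a genome and plotting pI against molecular mass gives the kind of theoretical proteome map described above.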
Longitudinal and Cross-Sectional Genetic Diversity in the Korean Peninsula Based on the P vivax Merozoite Surface Protein Gene.

PubMed

Kim, Jung-Yeon; Suh, Eun-Jung; Yu, Hyo-Soon; Jung, Hyun-Sik; Park, In-Ho; Choi, Yien-Kyeoug; Choi, Kyoung-Mi; Cho, Shin-Hyeong; Lee, Won-Ja

2011-12-01

Vivax malaria has reemerged and become endemic in Korea. Our study aimed to analyze, both longitudinally and cross-sectionally, the genetic diversity of vivax malaria parasites recently found in the Korean peninsula, based on the P vivax Merozoite Surface Protein (PvMSP) gene. PvMSP-1 gene sequences from P vivax isolates (n = 835) collected during the 1996-2010 period were analyzed longitudinally, and isolates collected in 2008-2010 from across the Korean peninsula, including South Korea, the demilitarized zone and North Korea, were enrolled in an overall analysis of MSP-1 gene diversity. New recombinant subtypes and high rates of multiple-clone infection were observed in recent vivax parasites. Regional variation was also observed among the study sites. This study revealed the great complexity of genetic variation and the rapid dissemination of genes in P vivax. It also showed interesting patterns of diversity depending on the region in the Korean Peninsula. Understanding the parasite's genetic variation may help to analyze trends and assess the extent of endemic malaria in Korea.
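The abstract does not name the diversity statistic used. One standard measure for a longitudinal comparison of subtypes is Nei's unbiased haplotype diversity, Hd = n/(n-1) * (1 - sum p_i^2), computed per collection year. A minimal sketch, assuming each isolate has already been assigned an MSP-1 subtype label; the labels, years and function names are hypothetical.

```python
from collections import Counter, defaultdict


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def diversity_by_year(records):
    """records: iterable of (year, subtype) pairs -> {year: Hd}."""
    by_year = defaultdict(list)
    for year, subtype in records:
        by_year[year].append(subtype)
    return {year: haplotype_diversity(subtypes)
            for year, subtypes in sorted(by_year.items())}
```

Grouping by sampling region instead of year gives the cross-sectional comparison in the same way.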
GENOMIC BASIS OF AGING AND LIFE HISTORY EVOLUTION IN DROSOPHILA MELANOGASTER

PubMed Central

Remolina, Silvia C.; Chang, Peter L.; Leips, Jeff; Nuzhdin, Sergey V.; Hughes, Kimberly A.

2015-01-01

Natural diversity in aging and other life history patterns is a hallmark of organismal variation. Related species, populations, and individuals within populations show genetically based variation in life span and other aspects of age-related performance. Population differences are especially informative because these differences can be large relative to within-population variation and because they occur in organisms with otherwise similar genomes. We used experimental evolution to produce populations divergent for life span and late-age fertility and then used deep genome sequencing to detect sequence variants with nucleotide-level resolution. Several genes and genome regions showed strong signatures of selection, and the same regions were implicated in independent comparisons, suggesting that the same alleles were selected in replicate lines. Genes related to oogenesis, immunity, and protein degradation were implicated as important modifiers of late-life performance. Expression profiling and functional annotation narrowed the list of strong candidate genes to 38, most of which are novel candidates for regulating aging. Life span and early-age fecundity were negatively correlated among populations; therefore the alleles we identified also are candidate regulators of a major life-history trade-off. More generally, we argue that hitchhiking mapping can be a powerful tool for uncovering the molecular bases of quantitative genetic variation. PMID:23106705
Generation and validation of homozygous fluorescent knock-in cells using CRISPR-Cas9 genome editing.

PubMed

Koch, Birgit; Nijmeijer, Bianca; Kueblbeck, Moritz; Cai, Yin; Walther, Nike; Ellenberg, Jan

2018-06-01

Gene tagging with fluorescent proteins is essential for investigations of the dynamic properties of cellular proteins. CRISPR-Cas9 technology is a powerful tool for inserting fluorescent markers into all alleles of the gene of interest (GOI) and allows functionality and physiological expression of the fusion protein. It is essential to evaluate such genome-edited cell lines carefully in order to preclude off-target effects caused by (i) incorrect insertion of the fluorescent protein, (ii) perturbation of the fusion protein by the fluorescent proteins or (iii) nonspecific genomic DNA damage by CRISPR-Cas9. In this protocol, we provide a step-by-step description of our systematic pipeline to generate and validate homozygous fluorescent knock-in cell lines. We have used the paired Cas9D10A nickase approach to efficiently insert tags into specific genomic loci via homology-directed repair (HDR) with minimal off-target effects. It is time-consuming and costly to perform whole-genome sequencing of each cell clone to check for spontaneous genetic variations occurring in mammalian cell lines. Therefore, we have developed an efficient validation pipeline of the generated cell lines consisting of junction PCR, Southern blotting analysis, Sanger sequencing, microscopy, western blotting analysis and live-cell imaging for cell-cycle dynamics. This protocol takes between 6 and 9 weeks. With this protocol, up to 70% of the targeted genes can be tagged homozygously with fluorescent proteins, thus resulting in physiological levels and phenotypically functional expression of the fusion proteins.
Positive selection and propeptide repeats promote rapid interspecific divergence of a gastropod sperm protein.

PubMed

Hellberg, M E; Moy, G W; Vacquier, V D

2000-03-01

Male-specific proteins have increasingly been reported as targets of positive selection and are of special interest because of the role they may play in the evolution of reproductive isolation. We report the rapid interspecific divergence of cDNA encoding a major acrosomal protein of unknown function (TMAP) of sperm from five species of teguline gastropods. A mitochondrial DNA clock (calibrated by congeneric species divided by the Isthmus of Panama) estimates that these five species diverged 2-10 MYA. Inferred amino acid sequences reveal a propeptide that has diverged rapidly between species. The mature protein has diverged faster still due to high nonsynonymous substitution rates (> 25 nonsynonymous substitutions per site per 10⁹ years). cDNA encoding the mature protein (89-100 residues) shows evidence of positive selection (Dn/Ds > 1) for 4 of 10 pairwise species comparisons. cDNA and predicted secondary-structure comparisons suggest that TMAP is neither orthologous nor paralogous to abalone lysin, and thus marks a second, phylogenetically independent, protein subject to strong positive selection in free-spawning marine gastropods. In addition, an internal repeat in one species (Tegula aureotincta) produces a duplicated cleavage site which results in two alternatively processed mature proteins differing by nine amino acid residues. Such alternative processing may provide a mechanism for introducing novel amino acid sequence variation at the amino-termini of proteins. Highly divergent TMAP N-termini from two other tegulines (Tegula regina and Norrisia norrisii) may have originated by such a mechanism.
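The Dn/Ds > 1 criterion compares nonsynonymous and synonymous divergence, each corrected for multiple substitutions at the same site. The sketch below covers only the final step of a Nei-Gojobori-style calculation with the Jukes-Cantor correction. It assumes that the numbers of synonymous and nonsynonymous sites and differences have already been tallied. That counting step is omitted, and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected divergence for an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous (dN) to synonymous (dS) divergence."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for one pairwise comparison (not data from the study).
ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=220, syn_diffs=6, syn_sites=80)
print(f"dN/dS = {ratio:.2f}")  # a value above 1 suggests positive selection
```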

Sequence dependent aggregation of peptides and fibril formation

NASA Astrophysics Data System (ADS)

Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.

2017-09-01

Deciphering the links between amino acid sequence and amyloid fibril formation is key to understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumed physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.
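The study uses its own off-lattice coarse-grained model. As a simpler illustration of how an HP sequence enters a Monte Carlo simulation, the sketch below shows the classic 2D lattice HP energy, which assigns one unit of attraction per non-bonded H-H contact, together with the standard Metropolis acceptance rule. Both are textbook forms chosen for illustration, not the authors' energy function or move set. Temperature is in units where k_B = 1, and chain connectivity is assumed rather than checked.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Coord], epsilon: float = 1.0) -> float:
    """Lattice HP energy: -epsilon for each non-bonded H-H nearest-neighbour contact.

    Assumes coords is a connected chain on the square lattice (not checked here).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have equal length")
    index_at = {c: i for i, c in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index_at.get(neighbour)
            # j > i + 1 counts each pair once and skips the bonded neighbour i + 1
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -epsilon * contacts


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-delta_e / T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


# A 4-residue chain folded into a square: residues 0 and 3 are H and touch.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # prints -1.0
```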
Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project

PubMed Central

Kim, Daniel Seung; Crosslin, David R.; Auer, Paul L.; Suzuki, Stephanie M.; Marsillach, Judit; Burt, Amber A.; Gordon, Adam S.; Meschia, James F.; Nalls, Mike A.; Worrall, Bradford B.; Longstreth, W. T.; Gottesman, Rebecca F.; Furlong, Clement E.; Peters, Ulrike; Rich, Stephen S.; Nickerson, Deborah A.; Jarvik, Gail P.

2014-01-01

HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10⁻³). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10⁻³). Ethnic differences in the association between PON1 variants and stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10⁻³; AA P = 6.52 × 10⁻⁴), a variant found at higher frequency in AA participants (1.16% vs. 0.02%) whose protein is less stable than that of the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted. PMID:24711634
Differential display detects host nucleic acid motifs altered in scrapie-infected brain.

PubMed

Lathe, Richard; Harris, Alyson

2009-09-25

The transmissible spongiform encephalopathies (TSEs) including scrapie have been attributed to an infectious protein or prion. Infectivity is allied to conversion of the endogenous nucleic-acid-binding protein PrP to an infectious modified form known as PrP(sc). The protein-only theory does not easily explain the enigmatic properties of the agent including strain variation. It was previously suggested that a short nucleic acid, perhaps host-encoded, might contribute to the pathoetiology of the TSEs. No candidate host molecules that might explain transmission of strain differences have yet been put forward. Differential display is a robust technique for detecting nucleic acid differences between two populations. We applied this technique to total nucleic acid preparations from scrapie-infected and control brain. Independent RNA preparations from eight normal and eight scrapie-infected (strain 263K) hamster brains were randomly amplified and visualized in parallel. Though the nucleic acid patterns were generally identical in scrapie-infected versus control brain, some rare bands were differentially displayed. Molecular species consistently overrepresented (or underrepresented) in all eight infected brain samples versus all eight controls were excised from the display, sequenced, and assembled into contigs. Only seven ros contigs (RNAs over- or underrepresented in scrapie) emerged, representing <4 kb from the transcriptome. All contained highly stable regions of secondary structure. The most abundant scrapie-only ros sequence was homologous to a repetitive transposable element (LINE, long interspersed nuclear element). Other ros sequences identified cellular RNA 7SL, clathrin heavy chain, visinin-like protein-1, and three highly specific subregions of ribosomal RNA (ros1-3). The ribosomal ros sequences accurately corresponded to LINE retrotransposon insertion sites in ribosomal DNA (p<0.01). 
These differential motifs implicate specific host RNAs in the pathoetiology of the TSEs.
Variations in protein/flavin hydrogen bonding in a LOV domain produce non-Arrhenius kinetics of adduct decay

PubMed Central

Zoltowski, Brian D.; Nash, Abigail I.; Gardner, Kevin H.

2011-01-01

Light Oxygen Voltage (LOV) domains utilize a conserved blue light-dependent mechanism to control a diverse array of effector domains in biological and engineered proteins. Variations in the kinetics and efficiency of LOV photochemistry fine tune various aspects of the photic response. Characterization of the kinetics of a key aspect of this photochemical mechanism in EL222, a blue-light responsive DNA binding protein from Erythrobacter litoralis HTCC2594, reveals unique non-Arrhenius behavior in the rate of dark state cleavage of the photochemically-generated adduct. Sequence analysis and mutagenesis studies establish that this effect stems from a Gln to Ala mutation unique to EL222 and homologous proteins from marine bacteria. Kinetic and spectroscopic analyses reveal that hydrogen bonding interactions between the FMN N1, O2 and ribityl hydroxyls with the surrounding protein regulate photocycle kinetics and stabilize the LOV active site from temperature-induced alteration in local structure. Substitution of residues interacting with the N1-O2 locus modulates adduct stability, structural flexibility and sequestration of the active site from bulk solvent without perturbation of light-activated DNA binding. Together, these variants link non-Arrhenius behavior to specific alteration of an H-bonding network, while affording tunability of photocycle kinetics. PMID:21923139
Variations in protein-flavin hydrogen bonding in a light, oxygen, voltage domain produce non-Arrhenius kinetics of adduct decay.

PubMed

Zoltowski, Brian D; Nash, Abigail I; Gardner, Kevin H

2011-10-18

Light, oxygen, voltage (LOV) domains utilize a conserved blue light-dependent mechanism to control a diverse array of effector domains in biological and engineered proteins. Variations in the kinetics and efficiency of LOV photochemistry fine-tune various aspects of the photic response. Characterization of the kinetics of a key aspect of this photochemical mechanism in EL222, a blue light responsive DNA binding protein from Erythrobacter litoralis HTCC2594, reveals unique non-Arrhenius behavior in the rate of dark-state cleavage of the photochemically generated adduct. Sequence analysis and mutagenesis studies establish that this effect stems from a Gln to Ala mutation unique to EL222 and homologous proteins from marine bacteria. Kinetic and spectroscopic analyses reveal that hydrogen bonding interactions between the FMN N1, O2, and ribityl hydroxyls and the surrounding protein regulate photocycle kinetics and stabilize the LOV active site from temperature-induced alteration in local structure. Substitution of residues interacting with the N1-O2 locus modulates adduct stability, structural flexibility, and sequestration of the active site from bulk solvent without perturbation of light-activated DNA binding. Together, these variants link non-Arrhenius behavior to specific alteration of an H-bonding network, while affording tunability of photocycle kinetics. © 2011 American Chemical Society
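The Arrhenius equation, k = A exp(-Ea/RT), predicts that ln k plotted against 1/T is a straight line, so non-Arrhenius behavior shows up as curvature in that plot. The sketch below assumes rate constants measured at several temperatures. It fits the linear form by ordinary least squares and returns the residuals so that systematic curvature can be inspected. It is not the analysis used in the study, and the data in the example are synthetic.

```python
import math
from typing import List, Sequence, Tuple

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def arrhenius_fit(temps_k: Sequence[float],
                  rates: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Least-squares fit of ln k = ln A - Ea / (R T).

    Returns (Ea in J/mol, pre-exponential factor A, residuals of ln k).
    """
    if len(temps_k) != len(rates) or len(temps_k) < 3:
        raise ValueError("need at least three matching (T, k) pairs")
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -Ea / R
    intercept = mean_y - slope * mean_x  # equals ln A
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return -slope * R_GAS, math.exp(intercept), residuals


# Synthetic, strictly Arrhenius data (Ea = 50 kJ/mol, A = 1e10 s^-1).
temps = [278.0, 288.0, 298.0, 308.0]
k_obs = [1e10 * math.exp(-50_000 / (R_GAS * t)) for t in temps]
ea, a, res = arrhenius_fit(temps, k_obs)
print(f"Ea = {ea / 1000:.1f} kJ/mol")  # prints Ea = 50.0 kJ/mol
```

Residuals that are consistently positive at both temperature extremes and negative in the middle, or the reverse, indicate curvature that a single Ea cannot describe. How large a deviation counts as non-Arrhenius is left to the analyst.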
Amplicon Sequencing of the slpH Locus Permits Culture-Independent Strain Typing of Lactobacillus helveticus in Dairy Products

PubMed Central

Moser, Aline; Wüthrich, Daniel; Bruggmann, Rémy; Eugster-Meier, Elisabeth; Meile, Leo; Irmler, Stefan

2017-01-01

The advent of massively parallel sequencing technologies has opened up possibilities for studying the bacterial diversity of ecosystems without the need for enrichment or single-strain isolation. By exploiting 78 genome datasets from Lactobacillus helveticus strains, we found that the slpH locus, which encodes a putative surface layer protein, displays sufficient genetic heterogeneity to be a suitable target for strain typing. Based on high-throughput slpH gene sequencing and the detection of single-base DNA sequence variations, we established a culture-independent method to assess the biodiversity of the L. helveticus strains present in fermented dairy food. When we applied the method to study the L. helveticus strain composition in 15 natural whey cultures (NWCs) collected at different Gruyère protected designation of origin (PDO) production facilities, we detected a total of 10 sequence types (STs). In addition, we monitored the development of a three-strain mix in raclette cheese for 17 weeks. PMID:28775722
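Culture-independent typing of this kind reduces each amplicon sequence to its bases at the informative single-base positions and compares that profile with known sequence types. The sketch below makes three assumptions: reads are already aligned to full-length slpH reference coordinates, profiles must match exactly, and the ST names and positions are hypothetical. It reports the relative abundance of each ST among the reads that can be assigned.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def profile(read: str, positions: Sequence[int]) -> str:
    """Bases of an aligned read at the informative (0-based) positions."""
    return "".join(read[p] for p in positions)


def assign_st(read: str, positions: Sequence[int],
              st_profiles: Dict[str, str]) -> Optional[str]:
    """Return the sequence type whose profile matches the read exactly, or None."""
    key = profile(read, positions)
    for st, expected in st_profiles.items():
        if expected == key:
            return st
    return None


def st_abundance(reads: List[str], positions: Sequence[int],
                 st_profiles: Dict[str, str]) -> Dict[str, float]:
    """Relative abundance of each ST among the reads that could be assigned."""
    counts = Counter(assign_st(r, positions, st_profiles) for r in reads)
    counts.pop(None, None)  # drop reads that match no known profile
    total = sum(counts.values())
    return {st: n / total for st, n in counts.items()} if total else {}


# Hypothetical informative positions and ST profiles.
profiles = {"ST-A": "AC", "ST-B": "GT"}
print(st_abundance(["TAGCA", "TGGTA", "TAGCA", "TTGTA"], [1, 3], profiles))
```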
Effect of Base Sequence "Defects" on the Electrostatic Potential of Dissolved DNA

NASA Astrophysics Data System (ADS)

Adams, Scott V.; Wagner, Katrina; Kephart, Thomas S.; Edwards, Glenn

1997-11-01

An analytical model of the electrostatic potential surrounding dissolved DNA has been developed. The model consists of an all-atom, mathematically helical structure for DNA, in which the atoms are arranged in infinite lines of discrete point charges on concentric cylindrical surfaces. The surrounding solvent and counterions are treated with the Debye-Hückel approximation (Wagner et al., Biophysical Journal 73, 21-30, 1997). Variation in the electrostatic potential due to structural differences between A, B, and Z conformations and homopolymer base sequence is apparent. The most recent modification to the model exploits the principle of superposition to calculate the potential of DNA with a base sequence containing 'defects.' That is, the base sequence is no longer uniform along the polymer. Differences between the potential of homopolymer DNA and the potential of DNA containing base 'defects' are immediately obvious. These results may aid in understanding the role of electrostatics in base-sequence specificity exhibited by DNA-binding proteins.
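In the Debye-Hückel approximation, each point charge q produces a screened Coulomb potential, φ(r) = q exp(-r/λ_D) / (4π ε0 εr r). Because the approximation is linear, the contributions of many charges can simply be added. The sketch below sums these terms over an arbitrary set of point charges. The caller supplies the relative permittivity (78.5, water near 25 °C, by default) and the Debye length λ_D, which is roughly 0.304 nm/√I for a 1:1 salt in water at 25 °C (I in mol/L). The helical DNA geometry of the original model is not reproduced, and the charge arrangement in the example is hypothetical.

```python
import math
from typing import Iterable, Tuple

K_E = 8.9875517923e9        # Coulomb constant 1/(4*pi*eps0), N m^2 C^-2
E_CHARGE = 1.602176634e-19  # elementary charge, C

Point = Tuple[float, float, float]


def screened_potential(field_point: Point,
                       charges: Iterable[Tuple[Point, float]],
                       debye_length: float,
                       rel_permittivity: float = 78.5) -> float:
    """Debye-Hückel potential (volts) at field_point, summed over point charges.

    charges: ((x, y, z) in metres, charge in units of e). By superposition, each
    charge contributes k_e * q * exp(-r / debye_length) / (rel_permittivity * r).
    """
    phi = 0.0
    for position, z in charges:
        r = math.dist(field_point, position)
        if r == 0.0:
            raise ValueError("field point coincides with a charge")
        phi += K_E * z * E_CHARGE * math.exp(-r / debye_length) / (rel_permittivity * r)
    return phi


# Hypothetical row of unit negative charges 0.34 nm apart, then the same row
# with one charge halved (a crude stand-in for a sequence 'defect').
uniform = [((0.0, 0.0, 0.34e-9 * i), -1.0) for i in range(-5, 6)]
defect = [(pos, -0.5 if i == 5 else q) for i, (pos, q) in enumerate(uniform)]
probe = (1.0e-9, 0.0, 0.0)
debye = 0.96e-9  # about 0.1 M monovalent salt
print(screened_potential(probe, uniform, debye), screened_potential(probe, defect, debye))
```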
In search of actionable targets for agrigenomics and microalgal biofuel production: sequence-structural diversity studies on algal and higher plants with a focus on GPAT protein.

PubMed

Misra, Namrata; Panda, Prasanna Kumar

2013-04-01

The triacylglycerol (TAG) pathway provides several targets for genetic engineering to optimize microalgal lipid productivity. GPAT (glycerol-3-phosphate acyltransferase) is a crucial enzyme that catalyzes the initial step of TAG biosynthesis. Despite many recent biochemical studies, a comprehensive sequence-structure analysis of GPAT across diverse lipid-yielding organisms is lacking. Hence, we performed a comparative genomic analysis of plastid-located GPAT proteins from seven microalgae and three higher plant species. The close evolutionary relationships observed between red algae/diatoms and green algae/plant lineages in the phylogenetic tree were further corroborated by motif and gene structure analysis. The predicted molecular weight, amino acid composition, instability index, and hydropathicity profile gave an overall representation of the biochemical features of the GPAT protein across the species under study. Furthermore, homology models of GPAT from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Glycine max provided insights into the protein architecture and substrate binding sites. Despite the low sequence identity between algal and plant GPATs, the models exhibited a strikingly conserved topology consisting of 14 α-helices and 9 β-sheets arranged in two domains. However, subtle amino acid variations in the fatty acyl binding site were identified that might influence the substrate selectivity of GPAT. Together, the results provide useful resources for understanding the functional and evolutionary relationships of GPAT and may benefit the development of engineered enzymes for augmenting algal biofuel production.
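A hydropathicity profile is commonly computed by averaging the Kyte-Doolittle scale over a sliding window. The grand average of hydropathicity (GRAVY) is the mean over the whole sequence. The sketch below uses the standard Kyte-Doolittle values. The window length of 9 is a common but arbitrary default, and the abstract does not state which tool or settings the authors used.

```python
from typing import Dict, List

# Kyte & Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _scores(protein: str) -> List[float]:
    """Per-residue hydropathy values; unknown letters raise KeyError."""
    if not protein:
        raise ValueError("empty sequence")
    return [KYTE_DOOLITTLE[aa] for aa in protein.upper()]


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    values = _scores(protein)
    return sum(values) / len(values)


def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy; element i covers residues i .. i + window - 1 (0-based)."""
    values = _scores(protein)
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


print(gravy("MKTLLVAAG"), hydropathy_profile("MKTLLVAAGIVL", window=5))
```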
[Comparison of genotype characteristics between the circulating mumps virus strain in Beijing area and the vaccine strain].

PubMed

Chen, Meng; Zhang, Tie-gang; Chen, Li-juan; Wu, Jiang; Yang, Jie; Zhang, Wei

2009-11-01

To compare the genetic characteristics of mumps virus strains circulating in Beijing with those of the vaccine strain, and to make a preliminary analysis of the reasons for vaccine ineffectiveness. The following methods were used: isolation and identification of mumps viruses circulating in Beijing, analysis of immunization history, SH gene sequence analysis with comparison of genotype homology against reference strains, and analysis of the key variable amino acid sites of HN. Among the 38 mumps cases from which virus had been isolated, seven were IgM-negative. In 2007 and 2008, the positivity rates for virus isolation, RT-PCR, and IgM decreased significantly, while the number of cases with a history of immunization increased. Cases without a history of vaccination had higher positivity rates for both virus isolation and IgM. The 38 isolated strains belonged to genotype F, whereas the vaccine strain belongs to genotype A. The circulating viruses showed 5.6% nucleotide sequence divergence among themselves in the SH gene and 16.0%-18.1% divergence from the vaccine strain. Conserved hydrophobic amino acids in the SH protein had changed in some Beijing strains; for example, six strains carried an L→F substitution at position 8. The circulating viruses showed 2.3% divergence among themselves in HN protein amino acid sequence and 4.2%-5.3% divergence from the vaccine strain. The amino acid sites that determine cross-neutralization differed between the Beijing strains and the vaccine strains: at positions 354 and 356, all Beijing strains differed from the vaccine strains. The N-glycosylation sites in HN of the Beijing strains also differed from those of the vaccine strains: positions 464-466 read NCS in the Beijing strains but NCR in the vaccine strains. At a further 18 amino acid sites of unknown function, all Beijing strains differed from the vaccine strains. In recent years, genotype F has become the predominant genotype among strains circulating in Beijing, without genotype variation, but larger differences were found between these strains and the vaccine strain. 
The SH and HN proteins of the Beijing strains differed markedly from those of the vaccine strain, which might explain the ineffectiveness of the vaccine.
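The NCS/NCR difference at positions 464-466 matters because N-linked glycosylation typically requires the sequon N-X-S/T, where X is any residue except proline. NCS fits this motif and NCR does not. The sketch below scans a protein sequence for candidate sequons, reporting 1-based positions and allowing overlapping matches. It flags motifs only and cannot say whether a site is actually glycosylated. The example sequences are arbitrary.

```python
import re
from typing import List, Tuple

# N, then any residue except P, then S or T; the lookahead allows overlapping hits.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, motif) for each candidate N-glycosylation sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


print(find_sequons("GNCSA"), find_sequons("GNCRA"))  # prints [(2, 'NCS')] []
```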
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

PubMed

Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

2013-01-01

Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contributes to the overall AT richness of these cp genomes. Additionally, fewer SSRs are located in the protein-coding sequences than in the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Whole cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis identified the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports the view that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
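The reported region lengths are self-consistent: 82,695 + 17,555 + 2 × 25,539 = 151,328 bp. For the SSR analysis, the sketch below finds mononucleotide repeats and reports what fraction of them are A or T runs. The 10-base threshold is an assumption, since SSR tools let the user set it, and di- and higher-order repeats are left out.

```python
import re
from typing import List, Tuple

# Reported region lengths add up: LSC + SSC + 2 * IR = total genome length.
assert 82_695 + 17_555 + 2 * 25_539 == 151_328


def find_mononucleotide_ssrs(seq: str, min_length: int = 10) -> List[Tuple[int, str, int]]:
    """Return (1-based start, base, run length) for every homopolymer run >= min_length."""
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start() + 1, m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]


def at_rich_fraction(ssrs: List[Tuple[int, str, int]]) -> float:
    """Fraction of the detected mononucleotide SSRs that are A or T runs."""
    if not ssrs:
        return 0.0
    return sum(1 for _, base, _ in ssrs if base in "AT") / len(ssrs)


hits = find_mononucleotide_ssrs("G" + "A" * 10 + "C" + "T" * 12 + "G" * 5)
print(hits, at_rich_fraction(hits))
```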
Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

PubMed

Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis, and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms; it has more often been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the nearly full-length coding region of the 18S rRNA gene from a single reniform nematode (RN), Rotylenchulus reniformis, labeled reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). Three of the four isolates yielded sequences of both RN variants; however, isolate 13B had only the RN variant 2 sequence. Specific variable base sites (96, or 5.5%) were found within the 18S rRNA gene that clearly distinguish the two 18S rDNA variants of RN, which were present in 11 (25.0%) and 33 (75.0%) of the 44 RN clones for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of two major variants of the 18S rRNA gene in a single RN; it documents the specific base variation between the two variants and puts forward a hypothesis on the simultaneous co-existence of these two variants of this gene.
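Distinguishing two rDNA variants comes down to counting the positions at which aligned sequences differ. The sketch below assumes that the two variants are already aligned to equal length with '-' for gaps. Gapped columns are skipped when computing the percentage, which is one of several reasonable conventions, and ambiguity codes are treated as ordinary characters. The example sequences are arbitrary.

```python
from typing import List


def variable_sites(seq_a: str, seq_b: str, gap: str = "-") -> List[int]:
    """1-based alignment columns where both sequences have a base and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i + 1
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != gap and b != gap and a != b
    ]


def percent_variable(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Variable sites as a percentage of the ungapped alignment columns."""
    compared = sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * len(variable_sites(seq_a, seq_b, gap)) / compared


print(variable_sites("ACGT-A", "ACTTAA"), percent_variable("ACGT-A", "ACTTAA"))
```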
Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

PubMed Central

Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis, and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms; it has more often been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the nearly full-length coding region of the 18S rRNA gene from a single reniform nematode (RN), Rotylenchulus reniformis, labeled reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). Three of the four isolates yielded sequences of both RN variants; however, isolate 13B had only the RN variant 2 sequence. Specific variable base sites (96, or 5.5%) were found within the 18S rRNA gene that clearly distinguish the two 18S rDNA variants of RN, which were present in 11 (25.0%) and 33 (75.0%) of the 44 RN clones for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of two major variants of the 18S rRNA gene in a single RN; it documents the specific base variation between the two variants and puts forward a hypothesis on the simultaneous co-existence of these two variants of this gene. PMID:23593343
Origins of genes: "big bang" or continuous creation?

PubMed Central

Keese, P K; Gibbs, A

1992-01-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes. PMID:1329098
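Overprinting means reading the same nucleotides in a different frame. The sketch below translates a DNA sequence in any of the three forward frames using the standard genetic code, with codons containing ambiguous bases rendered as 'X'. That is enough to compare the polypeptides encoded by overlapping frames. The example sequence is arbitrary and not taken from the genes discussed.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


# The same seven nucleotides read in three overlapping frames.
for f in range(3):
    print(f, translate("ATGGCCA", f))
```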
An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.

PubMed

Petrovski, Slavé; Todd, Jamie L; Durheim, Michael T; Wang, Quanli; Chien, Jason W; Kelly, Fran L; Frankel, Courtney; Mebane, Caroline M; Ren, Zhong; Bridgers, Joshua; Urban, Thomas J; Malone, Colin D; Finlen Copeland, Ashley; Brinkley, Christie; Allen, Andrew S; O'Riordan, Thomas; McHutchison, John G; Palmer, Scott M; Goldstein, David B

2017-07-01

Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. We performed a case-control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects drawn from among a set of individuals of European ancestry. Across these genes, we found a study-wide significant (P < 4.5 × 10⁻⁷) case enrichment of qualifying variants in TERT, RTEL1, and PARN. A model qualifying ultrarare, deleterious, nonsynonymous variants implicated TERT and RTEL1, and a model specifically qualifying loss-of-function variants implicated RTEL1 and PARN. A subanalysis of 186 case subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as study-wide significant contributors to sporadic IPF. Collectively, 11.3% of case subjects with sporadic IPF carried a qualifying variant in one of these three genes, compared with the 0.3% carrier rate observed among control subjects (odds ratio, 47.7; 95% confidence interval, 21.5-111.6; P = 5.5 × 10⁻²²). We identified TERT, RTEL1, and PARN, three telomere-related genes previously implicated in familial pulmonary fibrosis, as significant contributors to sporadic IPF. 
These results support the idea that telomere dysfunction is involved in IPF pathogenesis.
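In a gene-based collapsing analysis, each subject is reduced to a carrier or non-carrier of at least one qualifying variant in a gene. Carrier counts in cases and controls are then compared, often with Fisher's exact test. The sketch below performs that comparison for one gene, assuming qualifying variants have already been called for each subject. It does not reproduce the study's definitions of "qualifying" or its study-wide threshold, and the subjects in the example are hypothetical.

```python
from math import comb
from typing import List, Set, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are included
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def collapse_gene(gene: str, cases: List[Set[str]],
                  controls: List[Set[str]]) -> Tuple[int, int, float]:
    """Carrier counts in cases and controls for one gene, plus the Fisher p-value.

    Each subject is the set of genes in which it carries >= 1 qualifying variant.
    """
    case_carriers = sum(gene in s for s in cases)
    control_carriers = sum(gene in s for s in controls)
    p = fisher_exact_two_sided(case_carriers, len(cases) - case_carriers,
                               control_carriers, len(controls) - control_carriers)
    return case_carriers, control_carriers, p


# Hypothetical subjects (not study data).
cases = [{"TERT"}, {"TERT", "PARN"}, {"TERT"}]
controls = [set(), set(), set()]
print(collapse_gene("TERT", cases, controls))
```

If SciPy is available, `scipy.stats.fisher_exact` on the same 2×2 table should give a matching two-sided p-value, up to how near-ties are handled.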
Diversity and population structure of Plasmodium falciparum in Thailand based on the spatial and temporal haplotype patterns of the C-terminal 19-kDa domain of merozoite surface protein-1.

PubMed

Simpalipan, Phumin; Pattaradilokrat, Sittiporn; Siripoon, Napaporn; Seugorn, Aree; Kaewthamasorn, Morakot; Butcher, Robert D J; Harnyuttanakorn, Pongchai

2014-02-12

The 19-kDa C-terminal region of the merozoite surface protein-1 of the human malaria parasite Plasmodium falciparum (PfMSP-1₁₉) constitutes the major component on the surface of merozoites and is considered one of the leading candidates for asexual blood stage vaccines. Because the protein exhibits a level of sequence variation that may compromise the effectiveness of a vaccine, the global sequence diversity of PfMSP-1₁₉ has been the subject of extensive research, especially in malaria-endemic areas. In Thailand, PfMSP-1₁₉ sequences have been derived from a single parasite population in Tak province, located along the Thailand-Myanmar border, since 1995. However, the extent of sequence variation and the spatiotemporal patterns of the MSP-1₁₉ haplotypes along the Thai borders with Laos and Cambodia are unknown. Sixty-three isolates of P. falciparum from five geographically isolated populations along the Thai borders with Myanmar, Laos and Cambodia in three transmission seasons between 2002 and 2008 were collected and culture-adapted. The msp-1 gene block 17 was sequenced and analysed for the allelic diversity, frequency and distribution patterns of PfMSP-1₁₉ haplotypes in individual populations. The PfMSP-1₁₉ haplotype patterns were then compared between parasite populations to infer the population structure and genetic differentiation of the malaria parasite. Five conserved polymorphic positions in PfMSP-1₁₉, accounting for five distinct haplotypes, were identified. Differences in the prevalence of PfMSP-1₁₉ haplotypes were detected in different geographical regions, with the highest levels of genetic diversity being found in the Kanchanaburi and Ranong provinces along the Thailand-Myanmar border and Trat province located at the Thailand-Cambodia border. 
Despite this variability, the distribution patterns of individual PfMSP-1₁₉ haplotypes seemed to be very similar across the country and over the three malarial transmission seasons, suggesting that gene flow may operate between parasite populations circulating in Thailand and the three neighboring countries. The major MSP-1₁₉ haplotypes of P. falciparum in all endemic populations during three transmission seasons in Thailand were identified, providing basic information on the common MSP-1₁₉ haplotypes that is of use for malaria vaccine development and for inferring the population structure of P. falciparum in Thailand.
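Haplotype diversity within a population is commonly summarized with Nei's unbiased estimator, H = n/(n-1) (1 - Σ p_i²), where the p_i are haplotype frequencies among n isolates. The sketch below builds haplotypes from the residues at a list of polymorphic positions and then computes H. It assumes aligned sequences and a list of polymorphic positions, and both the sequences and the positions in the example are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def haplotype(seq: str, positions: Sequence[int]) -> str:
    """Haplotype string: the residues at the given 0-based polymorphic positions."""
    return "".join(seq[p] for p in positions)


def haplotype_diversity(haplotypes: List[str]) -> float:
    """Nei's unbiased haplotype diversity, H = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two isolates")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical aligned fragments from four isolates, polymorphic at positions 1 and 3.
isolates = ["AEKG", "AQKG", "AEKG", "AQKS"]
haps = [haplotype(s, [1, 3]) for s in isolates]
print(haps, haplotype_diversity(haps))
```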
Proteome Characterization of Leaves in Common Bean

PubMed Central

Robison, Faith M.; Heuberger, Adam L.; Brick, Mark A.; Prenni, Jessica E.

2015-01-01

Dry edible bean (Phaseolus vulgaris L.) is a globally relevant food crop. The bean genome was recently sequenced and annotated, allowing for proteomics investigations aimed at characterization of leaf phenotypes important to agriculture. The objective of this study was to use a shotgun proteomics approach to characterize the leaf proteome and to identify protein abundance differences between two bean lines with known variation in their physiological resistance to biotic stresses. Overall, 640 proteins were confidently identified. Among these are proteins known to be involved in a variety of molecular functions, including oxidoreductase activity, binding, peroxidase activity, and hydrolase activity. Twenty-nine proteins were found to vary significantly in abundance (p-value < 0.05) between the two bean lines, including proteins associated with biotic stress. To our knowledge, this work represents the first large-scale shotgun proteomic analysis of beans, and our results lay the groundwork for future studies designed to investigate the molecular mechanisms involved in pathogen resistance. PMID:28248269
Vitamin K epoxide reductase complex subunit 1 (Vkorc1) haplotype diversity in mouse priority strains

PubMed Central

Song, Ying; Vera, Nicole; Kohn, Michael H

2008-01-01

Background: Polymorphisms in the vitamin K epoxide reductase complex subunit 1 gene, Vkorc1, could affect blood coagulation and other vitamin K-dependent proteins, such as osteocalcin (bone Gla protein, BGP). Here we sequenced the Vkorc1 gene in 40 mouse priority strains. We analyzed Vkorc1 haplotypes with respect to prothrombin time (PT) and bone mineral density and composition (BMD and BMC), phenotypes expected to be vitamin K-dependent and represented by data in the Mouse Phenome Database (MPD). Findings: In the commonly used laboratory strains of Mus musculus domesticus we identified only four haplotypes, differing in the intron or 5' region sequence of Vkorc1. Six haplotypes differing by coding and non-coding polymorphisms were identified in the other subspecies of Mus. We detected no significant association of Vkorc1 haplotypes with PT, BMD and BMC within each subspecies of Mus. Vkorc1 haplotype sequence divergence between subspecies was associated with PT, BMD and BMC. Conclusion: Phenotypic variation in PT, BMD and BMC within subspecies of Mus, while substantial, appears to be dominated by genetic variation in genes other than Vkorc1. This was particularly evident for M. m. domesticus, where a single haplotype was observed in conjunction with virtually the entire range of PT, BMD and BMC values of all 5 subspecies of Mus included in this study. Differences in these phenotypes between subspecies also should not be attributed to Vkorc1 variants, but should be viewed as a result of genome-wide genetic divergence. PMID:19046458
Personalized biochemistry and biophysics.

PubMed

Kroncke, Brett M; Vanoye, Carlos G; Meiler, Jens; George, Alfred L; Sanders, Charles R

2015-04-28

Whole human genome sequencing of individuals is becoming rapid and inexpensive, enabling new strategies for using personal genome information to help diagnose, treat, and even prevent human disorders for which genetic variations are causative or are known to be risk factors. Many of the rapidly growing number of newly discovered genetic variations alter the structure, function, dynamics, stability, and/or interactions of specific proteins and RNA molecules. Accordingly, there are a host of opportunities for biochemists and biophysicists to participate in (1) developing tools to allow accurate and sometimes medically actionable assessment of the potential pathogenicity of individual variations and (2) establishing the mechanistic linkage between pathogenic variations and their physiological consequences, providing a rational basis for treatment or preventive care. In this review, we provide an overview of these opportunities and their associated challenges in light of the current status of genomic science and personalized medicine, the latter often termed precision medicine.
Personalized Biochemistry and Biophysics

PubMed Central

2016-01-01

Whole human genome sequencing of individuals is becoming rapid and inexpensive, enabling new strategies for using personal genome information to help diagnose, treat, and even prevent human disorders for which genetic variations are causative or are known to be risk factors. Many of the rapidly growing number of newly discovered genetic variations alter the structure, function, dynamics, stability, and/or interactions of specific proteins and RNA molecules. Accordingly, there are a host of opportunities for biochemists and biophysicists to participate in (1) developing tools to allow accurate and sometimes medically actionable assessment of the potential pathogenicity of individual variations and (2) establishing the mechanistic linkage between pathogenic variations and their physiological consequences, providing a rational basis for treatment or preventive care. In this review, we provide an overview of these opportunities and their associated challenges in light of the current status of genomic science and personalized medicine, the latter often termed precision medicine. PMID:25856502
Mechanisms of Surface Antigenic Variation in the Human Pathogenic Fungus Pneumocystis jirovecii.

PubMed

Schmid-Siegert, Emanuel; Richard, Sophie; Luraschi, Amanda; Mühlethaler, Konrad; Pagni, Marco; Hauser, Philippe M

2017-11-07

Microbial pathogens commonly escape the human immune system by varying surface proteins. We investigated the mechanisms used for that purpose by Pneumocystis jirovecii. This uncultivable fungus is an obligate pulmonary pathogen that in immunocompromised individuals causes pneumonia, a major life-threatening infection. Long-read PacBio sequencing was used to assemble a core of subtelomeres of a single P. jirovecii strain from a bronchoalveolar lavage fluid specimen from a single patient. A total of 113 genes encoding surface proteins were identified, including 28 pseudogenes. These genes formed a subtelomeric gene superfamily, which included five families encoding adhesive glycosylphosphatidylinositol (GPI)-anchored glycoproteins and one family encoding excreted glycoproteins. Numerical analyses suggested that diversification of the glycoproteins relies on mosaic genes created by ectopic recombination and occurs only within each family. DNA motifs suggested that all genes are expressed independently, except those of the family encoding the most abundant surface glycoproteins, which are subject to mutually exclusive expression. PCR analyses showed that exchange of the expressed gene of the latter family occurs frequently, possibly favored by the location of the genes proximal to the telomere because this allows concomitant telomere exchange. Our observations suggest that (i) the P. jirovecii cell surface is made of a complex mixture of different surface proteins, with a majority of a single isoform of the most abundant glycoprotein, (ii) genetic mosaicism within each family ensures variation of the glycoproteins, and (iii) the strategy of the fungus consists of the continuous production of new subpopulations composed of cells that are antigenically different. IMPORTANCE Pneumocystis jirovecii is a fungus causing severe pneumonia in immunocompromised individuals. It is the second most frequent life-threatening invasive fungal infection. 
We have studied the mechanisms of antigenic variation used by this pathogen to escape the human immune system, a strategy commonly used by pathogenic microorganisms. Using a new DNA sequencing technology generating long reads, we could characterize the highly repetitive gene families encoding the proteins that are present on the cellular surface of this pest. These gene families are localized in the regions close to the ends of all chromosomes, the subtelomeres. Such chromosomal localization was found to favor genetic recombination between members of each gene family and to allow continuous diversification of these proteins over time. This pathogen seems to use a strategy of antigenic variation consisting of the continuous production of new subpopulations composed of cells that are antigenically different. Such a strategy is unique among human pathogens. Copyright © 2017 Schmid-Siegert et al.

Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

PubMed

Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

2015-07-01

IMPORTANCE Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. OBJECTIVE To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. DESIGN, SETTING, AND PARTICIPANTS The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. MAIN OUTCOMES AND MEASURES Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). 
RESULTS We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10⁻⁸), implicating an intracellular signal transduction mechanism, and one in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10⁻⁷), implicating fatty acid metabolism; confirmation of PDE4DIP was observed in affected sibpair families with the large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). CONCLUSIONS AND RELEVANCE Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
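An odds ratio such as the 2.15 reported for PDE4DIP summarizes how much more often a variant allele is carried in cases than in controls. The minimal sketch below computes an allelic odds ratio from a 2x2 table of allele counts, with a Woolf (log-scale) 95% confidence interval; the counts are hypothetical, and the study's actual association models (which may include covariates) are not reproduced here.

```python
import math


def allelic_odds_ratio(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio from a 2x2 allele-count table with a Woolf (log-scale) confidence interval."""
    cells = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log = math.sqrt(sum(1 / c for c in cells))  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 case alleles and 25 of 100 control alleles carry the variant.
or_, lo, hi = allelic_odds_ratio(40, 60, 25, 75)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The log scale is used for the interval because the sampling distribution of ln(OR) is much closer to normal than that of the OR itself.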
KinView: A visual comparative sequence analysis tool for integrated kinome research

PubMed Central

McSkimming, Daniel Ian; Dastgheib, Shima; Baffi, Timothy R.; Byrne, Dominic P.; Ferries, Samantha; Scott, Steven Thomas; Newton, Alexandra C.; Eyers, Claire E.; Kochut, Krzysztof J.; Eyers, Patrick A.

2017-01-01

Multiple sequence alignments (MSAs) are a fundamental analysis tool used throughout biology to investigate relationships between protein sequence, structure, function, evolutionary history, and patterns of disease-associated variants. However, their widespread application in systems biology research is currently hindered by the lack of user-friendly tools to simultaneously visualize, manipulate and query the information conceptualized in large sequence alignments, and the challenges in integrating MSAs with multiple orthogonal data such as cancer variants and post-translational modifications, which are often stored in heterogeneous data sources and formats. Here, we present the Multiple Sequence Alignment Ontology (MSAOnt), which represents a profile or consensus alignment in an ontological format. Subsets of the alignment are easily selected through the SPARQL Protocol and RDF Query Language for downstream statistical analysis or visualization. We have also created the Kinome Viewer (KinView), an interactive integrative visualization that places eukaryotic protein kinase cancer variants in the context of natural sequence variation and experimentally determined post-translational modifications, which play central roles in the regulation of cellular signaling pathways. Using KinView, we identified differential phosphorylation patterns between tyrosine and serine/threonine kinases in the activation segment, a major kinase regulatory region that is often mutated in proliferative diseases. We discuss cancer variants that disrupt phosphorylation sites in the activation segment, and show how KinView can be used as a comparative tool to identify differences and similarities in natural variation, cancer variants and post-translational modifications between kinase groups, families and subfamilies. Based on KinView comparisons, we identify and experimentally characterize a regulatory tyrosine (Y177) in the PLK4 C-terminal activation segment region termed the P+1 loop. 
To further demonstrate the application of KinView in hypothesis generation and testing, we formulate and validate a hypothesis explaining a novel predicted loss-of-function variant (D523N) in the regulatory spine of PKCβ, a recently identified tumor suppressor kinase. KinView provides a novel, extensible interface for performing comparative analyses between subsets of kinases and for integrating multiple types of residue-specific annotations in user-friendly formats. PMID:27731453
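Overlaying variants or modification sites on an alignment, as KinView does, requires translating a per-protein residue number (for example, Y177 in PLK4) into an alignment column so that equivalent positions in other kinases can be compared. The sketch below is a minimal, hypothetical illustration of that mapping; it is not KinView's or MSAOnt's implementation (which the abstract describes as SPARQL queries over an ontology), and the toy alignment is invented.

```python
GAP_CHARS = "-."


def residue_to_column(aligned_seq: str, residue_number: int) -> int:
    """Map a 1-based residue number in the ungapped protein to a 0-based alignment column."""
    if residue_number < 1:
        raise ValueError("residue numbers are 1-based")
    count = 0
    for column, char in enumerate(aligned_seq):
        if char in GAP_CHARS:
            continue  # gaps do not advance the residue counter
        count += 1
        if count == residue_number:
            return column
    raise IndexError("residue number exceeds the sequence length")


def residues_at_column(alignment: dict[str, str], column: int) -> dict[str, str]:
    """Return each sequence's residue (or gap) at one alignment column."""
    return {name: seq[column] for name, seq in alignment.items()}


# Hypothetical toy alignment; real kinase alignments are far longer.
alignment = {"kinase_A": "MK-LY-G", "kinase_B": "MKALYSG"}
col = residue_to_column(alignment["kinase_A"], 4)  # 4th residue of kinase_A is Y
print(col)                                 # 4
print(residues_at_column(alignment, col))  # {'kinase_A': 'Y', 'kinase_B': 'Y'}
```

Once residues share a column index, annotations from different sources (variants, phosphosites) can be joined on that index rather than on protein-specific numbering.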
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome

PubMed Central

Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that chloroplast (cp)-derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909
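The Ti/Tv ratio mentioned above is the number of transitions (purine to purine, A<->G, or pyrimidine to pyrimidine, C<->T) divided by the number of transversions (purine <-> pyrimidine). A minimal sketch, assuming single-base substitutions given as (reference, alternate) pairs; the variant list is hypothetical. For orientation only: if every base were equally likely to mutate to each of the other three, 4 of the 12 possible substitutions would be transitions, giving an expected Ti/Tv of 0.5 under that simplest model (the study's own neutral expectation may be defined differently).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine single-base changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(substitutions: list[tuple[str, str]]) -> float:
    """Transitions divided by transversions for a list of (ref, alt) pairs."""
    transitions = sum(is_transition(ref, alt) for ref, alt in substitutions)
    transversions = len(substitutions) - transitions
    if transversions == 0:
        raise ZeroDivisionError("no transversions observed")
    return transitions / transversions


# Hypothetical calls: 3 transitions (A>G, C>T, G>A) and 2 transversions (A>C, G>T).
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(calls))  # 1.5
```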
Receptor-like genes in the major resistance locus of lettuce are subject to divergent selection.

PubMed Central

Meyers, B C; Shen, K A; Rohani, P; Gaut, B S; Michelmore, R W

1998-01-01

Disease resistance genes in plants are often found in complex multigene families. The largest known cluster of disease resistance specificities in lettuce contains the RGC2 family of genes. We compared the sequences of nine full-length genomic copies of RGC2 representing the diversity in the cluster to determine the structure of genes within this family and to examine the evolution of its members. The transcribed regions range from at least 7.0 to 13.1 kb, and the cDNAs contain deduced open reading frames of approximately 5.5 kb. The predicted RGC2 proteins contain a nucleotide binding site and irregular leucine-rich repeats (LRRs) that are characteristic of resistance genes cloned from other species. Unique features of the RGC2 gene products include a bipartite LRR region with >40 repeats. At least eight members of this family are transcribed. The level of sequence diversity between family members varied in different regions of the gene. The ratio of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitutions was lowest in the region encoding the nucleotide binding site, which is the presumed effector domain of the protein. The LRR-encoding region showed an alternating pattern of conservation and hypervariability. This alternating pattern of variation was also found in all comparisons within families of resistance genes cloned from other species. The Ka/Ks ratios indicate that diversifying selection has resulted in increased variation at these codons. The patterns of variation support the predicted structure of LRR regions with solvent-exposed hypervariable residues that are potentially involved in binding pathogen-derived ligands. PMID:9811792
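The Ka/Ks ratio compares nonsynonymous and synonymous divergence per site. One common formulation, Nei-Gojobori site counting with a Jukes-Cantor correction, is shown below as a sketch; the estimator actually used in the study may differ. Here $N$ and $S$ are the numbers of nonsynonymous and synonymous sites, and $N_d$ and $S_d$ the observed nonsynonymous and synonymous differences. Values of $\omega < 1$ are usually read as purifying selection and $\omega > 1$ as diversifying selection, consistent with the interpretation above.

```latex
\begin{aligned}
p_N &= \frac{N_d}{N}, & p_S &= \frac{S_d}{S},\\
K_a &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), & K_s &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right),\\
\omega &= \frac{K_a}{K_s}.
\end{aligned}
```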
A novel selection signature in stearoyl-coenzyme A desaturase (SCD) gene for enhanced milk fat content in Bubalus bubalis.

PubMed

Maryam, J; Babar, M E; Bao, Zhang; Nadeem, A

2016-10-01

Modern molecular interventions are important tools for breeding animals with superior genetic makeup. These scientific efforts lead us toward sustainable dairy herds with improved milk production in terms of yield and quality. Many candidate genes have been dissected at the molecular level, and suitable genetic markers have been identified in cattle, but this work has not yet been validated in buffaloes. Stearoyl-coenzyme A desaturase (SCD) has been a potential candidate gene for milk fat content. Genomic analysis of SCD revealed a total of six variations, identified through DNA sequencing of animals with lower and higher butterfat percentage. After statistical analysis, genotype AB of p.K158I was found to be associated (P value < 0.0001) with higher milk fat percentage (10.5 ± 0.5464). This SNP was validated on a larger data set by cleaved amplified polymorphic sequences (CAPS) using DdeI. To scrutinize the functional consequences of p.K158I, the 3D protein structure of SCD was predicted by homology modeling, and this variation was found to be located in the vicinity of the functional domain and in part of a transmembrane helix of this membrane-integrated protein. This is the first report of genetic screening of the SCD gene at the molecular level in buffalo. This report illustrates the implication of the SCD gene, and in particular the p.K158I variation, in milk fat percentage, which can be targeted in the selection of superior dairy buffaloes.
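The CAPS assay above works because a SNP that creates or destroys a restriction site changes the fragment pattern after digestion. DdeI recognizes 5'-CTNAG-3' and cuts after the first C (C^TNAG); the site is palindromic, so scanning one strand finds every site. The sketch below predicts approximate fragment lengths from top-strand cut positions for a linear amplicon. The two 14-bp alleles are hypothetical and are not the buffalo SCD amplicon.

```python
import re

# DdeI site 5'-CTNAG-3' (N = any base). CTNAG cannot overlap itself,
# so non-overlapping finditer() matches capture every site.
DDEI_SITE = re.compile(r"CT[ACGT]AG")


def ddei_cut_positions(seq: str) -> list[int]:
    """0-based top-strand cut positions (between C and T) for every DdeI site in seq."""
    return [m.start() + 1 for m in DDEI_SITE.finditer(seq.upper())]


def digest_fragment_lengths(seq: str) -> list[int]:
    """Approximate fragment lengths (bp) after complete DdeI digestion of a linear amplicon."""
    bounds = [0] + ddei_cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: an A>T change at index 6 abolishes the site in allele_2.
allele_1 = "GGACTTAGCCATGG"
allele_2 = "GGACTTTGCCATGG"
print(digest_fragment_lengths(allele_1))  # [4, 10]
print(digest_fragment_lengths(allele_2))  # [14]
```

The 3-nt 5' overhang DdeI leaves is ignored here, since it barely affects band sizes on a gel; a heterozygote (such as the AB genotype) would show both patterns together.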
Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication.

PubMed

Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K; Wang, Jun; Ling, Hong-Qing; Wan, Ping

2015-10-27

Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries. The seed of adzuki bean, an important source of starch, digestible protein, mineral elements, and vitamins, is widely used as food by at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of these, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals associated with domestication by population analysis of 50 accessions, including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than to the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and may have played a role in adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean.
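The N50 values quoted above (38 kb for contigs, 1.29 Mb for scaffolds) are the length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal sketch with hypothetical contig lengths; note this uses the assembly total, whereas the related NG50 would use an estimated genome size instead.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L contain at least half the assembly."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns for a non-empty list")


# Hypothetical contig lengths (bp): total 300, and 100 + 80 = 180 >= 150.
contigs = [100, 80, 60, 40, 20]
print(n50(contigs))  # 80
```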
Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter

2011-06-01

The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and 0.8 megabases of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway, in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.
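An average of 7.8 SNPs/kb alongside a maximum of 160 SNPs/kb implies a genome-wide rate plus a local, windowed measure. The sketch below computes both from hypothetical SNP positions, assuming non-overlapping 1 kb windows; the windowing actually used in the study is not stated in the abstract.

```python
import math


def snps_per_kb(n_snps: int, aligned_bp: int) -> float:
    """Average SNP density over aligned sequence, in SNPs per kilobase."""
    return n_snps / (aligned_bp / 1000)


def windowed_snp_density(snp_positions: list[int], seq_length: int,
                         window: int = 1000) -> list[float]:
    """SNPs per kb in consecutive non-overlapping windows (the last window may be shorter)."""
    counts = [0] * math.ceil(seq_length / window)
    for pos in snp_positions:  # 0-based coordinates
        if not 0 <= pos < seq_length:
            raise ValueError(f"position {pos} outside 0..{seq_length - 1}")
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        width = min(window, seq_length - i * window)  # actual width of this window
        densities.append(count / (width / 1000))
    return densities


# Hypothetical SNP positions on a 2.5 kb aligned region.
positions = [10, 500, 999, 1500, 2100, 2200, 2300]
print(snps_per_kb(len(positions), 2500))      # 2.8
print(windowed_snp_density(positions, 2500))  # [3.0, 1.0, 6.0]
```

Normalizing by each window's true width avoids underestimating the short final window, but very short windows can also inflate apparent maxima, so a minimum window width is often imposed in practice.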
Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication

PubMed Central

Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K.; Wang, Jun; Ling, Hong-Qing; Wan, Ping

2015-01-01

Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries. The seed of adzuki bean, an important source of starch, digestible protein, mineral elements, and vitamins, is widely used as food by at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of these, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals associated with domestication by population analysis of 50 accessions, including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than to the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and may have played a role in adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean. PMID:26460024
Diversity in the origins of proteostasis networks: a driver for protein function in evolution

PubMed Central

Powers, Evan T.; Balch, William E.

2013-01-01

Although a protein’s primary sequence largely determines its function, proteins can adopt different folding states in response to changes in the environment, some of which may be deleterious to the organism. All organisms, including Bacteria, Archaea and Eukarya, have evolved a protein homeostasis network, or proteostasis network, that consists of chaperones and folding factors, degradation components, signalling pathways and specialized compartmentalized modules that manage protein folding in response to environmental stimuli and variation. Surveying the origins of proteostasis networks reveals that they have co-evolved with the proteome to regulate the physiological state of the cell, reflecting the unique stresses that different cells or organisms experience, and that they have a key role in driving evolution by closely managing the link between the phenotype and the genotype. PMID:23463216
Multiple intermediates on the energy landscape of a 15-HEAT-repeat protein

PubMed Central

Tsytlonok, Maksym; Craig, Patricio O.; Sivertsson, Elin; Serquera, David; Perrett, Sarah; Best, Robert B.; Wolynes, Peter G.; Itzhaki, Laura S.

2014-01-01

Repeat proteins are a special class of modular, non-globular proteins composed of small structural motifs arrayed to form elongated architectures and stabilised solely by short-range contacts. We find a remarkable complexity in the unfolding of the large HEAT repeat protein PR65/A. In contrast to what has been seen for small repeat proteins in which unfolding propagates from one end, the HEAT array of PR65/A ruptures at multiple distant sites, leading to intermediate states with non-contiguous folded subdomains. Kinetic analysis allows us to define a network of intermediates and to delineate the pathways that connect them. There is a dominant sequence of unfolding, reflecting a non-uniform distribution of stability across the repeat array; however, the unfolding of certain intermediates is competitive, leading to parallel pathways. Theoretical models accounting for the heterogeneous contact density in the folded structure are able to rationalize the variation in stability across the array. This variation in stability also suggests how folding may direct function in a large repeat protein: the stability distribution enables certain regions to present rigid motifs for molecular recognition while affording others flexibility to broaden the search area as in a fly-casting mechanism. Thus PR65/A uses the two ends of the repeat array to bind diverse partners and thereby coordinate the dephosphorylation of many different substrates and of multiple sites within hyperphosphorylated substrates. PMID:24120762
Structural and Functional Insights into WRKY3 and WRKY4 Transcription Factors to Unravel the WRKY–DNA (W-Box) Complex Interaction in Tomato (Solanum lycopersicum L.). A Computational Approach

PubMed Central

Aamir, Mohd; Singh, Vinay K.; Meena, Mukesh; Upadhyay, Ram S.; Gupta, Vijai K.; Singh, Surendra

2017-01-01

The WRKY transcription factors (TFs) play a crucial role in plant defense responses against various abiotic and biotic stresses. The role of WRKY3 and WRKY4 genes in plant defense against necrotrophic pathogens is well reported. However, their functional annotation in tomato is largely unknown. In the present work, we characterized the structural and functional attributes of the two identified tomato WRKY transcription factors, WRKY3 (SlWRKY3) and WRKY4 (SlWRKY4), using computational approaches. Arabidopsis WRKY3 (AtWRKY3: NP_178433) and WRKY4 (AtWRKY4: NP_172849) protein sequences were retrieved from the TAIR database, and a protein BLAST search was performed to find their sequence homologs in tomato. Sequence alignment, phylogenetic classification, and motif composition analysis revealed remarkable sequence variation between these two WRKYs. Tomato WRKY3 and WRKY4 cluster with Solanum pennellii, showing monophyletic origin and evolution from their wild homolog. The functional domain regions responsible for sequence-specific DNA binding in both proteins were modeled [using AtWRKY4 (PDB ID: 1WJ2) and AtWRKY1 (PDB ID: 2AYD) as template protein structures] through homology modeling using Discovery Studio 3.0. The generated models were further evaluated for their accuracy and reliability based on qualitative and quantitative parameters. The modeled proteins were found to satisfy all the crucial energy parameters and showed acceptable Ramachandran statistics when compared to the experimentally resolved NMR solution structures and/or X-ray crystal structures (templates). The superimposition of the functional WRKY domains from SlWRKY3 and SlWRKY4 revealed remarkable structural similarity. The sequence-specific DNA binding of the two WRKYs was explored through DNA-protein interaction analysis using the Hex Docking server. 
The interaction studies found that SlWRKY4 binds the W-box DNA through WRKYGQK, with Tyr408, Arg409, and Lys419, together with the initial flanking sequences, also involved in binding. In contrast, SlWRKY3 interacts through RKYGQK along with residues from the zinc finger motifs. Protein-protein interaction studies were done using STRING version 10.0 to explore all possible protein partners involved in associative functional interaction networks. Gene Ontology enrichment analysis revealed the functional dimension and characterized the identified WRKYs based on their functional annotation. PMID:28611792
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method for protein comparison is more efficient than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as a significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
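The RFT projects a numeric signal onto Ramanujan sums $c_q(n)$, which can be computed exactly as $c_q(n) = \sum_{d \mid \gcd(q,n)} \mu(q/d)\, d$. The sketch below evaluates a truncated, finite-length form of the transform, $X_q = \frac{1}{\varphi(q)\,N}\sum_{n=1}^{N} x(n)\, c_q(n)$, by direct summation; it is not the paper's fast algorithm. It also assumes the protein has already been converted to numbers: the RRM typically maps each amino acid to its electron-ion interaction potential (EIIP), and those values are not reproduced here.

```python
import math


def mobius(n: int) -> int:
    """Mobius function mu(n) by trial division."""
    if n < 1:
        raise ValueError("n must be positive")
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result


def euler_phi(q: int) -> int:
    """Euler's totient: count of 1 <= k <= q with gcd(k, q) = 1."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n) = sum over divisors d of gcd(q, n) of mu(q/d) * d (exact integer form)."""
    g = math.gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)


def rft_coefficients(signal: list[float], q_max: int) -> list[float]:
    """Truncated RFT: X_q = sum_{n=1..N} x(n) c_q(n) / (phi(q) * N) for q = 1..q_max."""
    n_len = len(signal)
    return [
        sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
        / (euler_phi(q) * n_len)
        for q in range(1, q_max + 1)
    ]


# A period-2 toy signal (alternating -1/+1) concentrates its weight at q = 2.
alternating = [(-1.0) ** n for n in range(1, 13)]
print(rft_coefficients(alternating, 4))  # q = 2 coefficient is 1.0; q = 1, 3, 4 are 0
```

Because Ramanujan sums are integer-valued, this form avoids the complex arithmetic of a DFT, which is one reason the RFT is attractive for short, periodic-looking signals.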
Comparison of the Genome Sequence of the Poultry Pathogen Bordetella avium with Those of B. bronchiseptica, B. pertussis, and B. parapertussis Reveals Extensive Diversity in Surface Structures Associated with Host Interaction

PubMed Central

Sebaihia, Mohammed; Preston, Andrew; Maskell, Duncan J.; Kuzmiak, Holly; Connell, Terry D.; King, Natalie D.; Orndorff, Paul E.; Miyamoto, David M.; Thomson, Nicholas R.; Harris, David; Goble, Arlette; Lord, Angela; Murphy, Lee; Quail, Michael A.; Rutter, Simon; Squares, Robert; Squares, Steven; Woodward, John; Parkhill, Julian; Temple, Louise M.

2006-01-01

Bordetella avium is a pathogen of poultry and is phylogenetically distinct from Bordetella bronchiseptica, Bordetella pertussis, and Bordetella parapertussis, which are other species in the Bordetella genus that infect mammals. In order to understand the evolutionary relatedness of Bordetella species and further the understanding of pathogenesis, we obtained the complete genome sequence of B. avium strain 197N, a pathogenic strain that has been extensively studied. With 3,732,255 base pairs of DNA and 3,417 predicted coding sequences, it has the smallest genome and gene complement of the sequenced bordetellae. In this study, the presence or absence of previously reported virulence factors from B. avium was confirmed, and the genetic bases for growth characteristics were elucidated. Over 1,100 genes present in B. avium but not in B. bronchiseptica were identified, and most were predicted to encode surface or secreted proteins that are likely to define an organism adapted to the avian rather than the mammalian respiratory tracts. These include genes coding for the synthesis of a polysaccharide capsule, hemagglutinins, a type I secretion system adjacent to two very large genes for secreted proteins, and unique genes for both lipopolysaccharide and fimbrial biogenesis. Three apparently complete prophages are also present. The BvgAS virulence regulatory system appears to have polymorphisms at a poly(C) tract that is involved in phase variation in other bordetellae. A number of putative iron-regulated outer membrane proteins were predicted from the sequence, and this regulation was confirmed experimentally for five of these. PMID:16885469
CCAAT/enhancer-binding protein β is involved in the breed-dependent transcriptional regulation of 3β-hydroxysteroid dehydrogenase/Δ(5)-Δ(4)-isomerase in adrenal gland of preweaning piglets.

PubMed

Li, Xian; Li, Runsheng; Jia, Yimin; Sun, Zhiyuan; Yang, Xiaojing; Sun, Qinwei; Zhao, Ruqian

2013-11-01

The enzyme 3β-hydroxysteroid dehydrogenase/Δ(5)-Δ(4)-isomerase (3β-HSD) catalyzes the biosynthesis of all steroid hormones. The molecular mechanisms regulating porcine adrenal 3β-HSD expression in different breeds are still poorly understood. In this study, we aimed to compare the expression of 3β-HSD between preweaning purebred Large White (LW) and Erhualian (EHL) piglets and to explore the potential factors regulating 3β-HSD transcription. Compared with LW piglets, EHL piglets had significantly higher serum levels of cortisol (P<0.01) and testosterone (P<0.01), which were associated with significantly higher expression of 3β-HSD mRNA (P<0.01) and protein (P<0.05) in the adrenal gland. The 5' flanking region of the porcine 3β-HSD gene showed significant sequence variations between breeds, and the EHL sequence showed elevated promoter activity (P<0.05) in a luciferase reporter gene assay. Higher adrenal expression of 3β-HSD in EHL was accompanied by higher CCAAT/enhancer binding protein β (C/EBPβ) expression (P<0.05), enriched histone H3 acetylation (P<0.05) and enriched C/EBPβ binding to the 3β-HSD promoter (P<0.05). In addition, higher androgen receptor (AR) (P=0.06) and lower glucocorticoid receptor (GR) (P<0.05) expression were detected in EHL. Co-immunoprecipitation analysis revealed interactions of C/EBPβ with both AR and GR. These results indicate that C/EBPβ binding to the 3β-HSD promoter is responsible, at least in part, for the breed-dependent 3β-HSD expression in the adrenal gland of piglets. The sequence variations of the 3β-HSD promoter and the interactions of AR and/or GR with C/EBPβ may also participate in the regulation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Whole-Genome Resequencing of Holstein Bulls for Indel Discovery and Identification of Genes Associated with Milk Composition Traits in Dairy Cattle.

PubMed

Jiang, Jianping; Gao, Yahui; Hou, Yali; Li, Wenhui; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

2016-01-01

The use of whole-genome resequencing to obtain more information on genetic variation could produce a range of benefits for the dairy cattle industry, especially with regard to increasing milk production and improving milk composition. In this study, we used Illumina sequencing at an average effective depth of 10× to sequence the genomes of eight Holstein bulls from four half- or full-sib families with high and low estimated breeding values (EBVs) for milk protein percentage and fat percentage. Over 0.9 million nonredundant short insertions and deletions (indels) [1-49 base pairs (bp)] were obtained. Among them, 3,625 indels that were polymorphic between the high and low groups of bulls were revealed and subjected to further analysis. The vast majority (76.67%) of these indels were novel. Follow-up validation assays confirmed that most (70%) of the randomly selected indels represented true variations. The indels that were polymorphic between the two groups were annotated based on the cattle genome sequence assembly (UMD3.1.69); as a result, 1,137 of them were found to be located within 767 annotated genes, and only 5 (0.138% of the 3,625 polymorphic indels) were located in exons. Then, by integrated analysis of the 767 genes with known quantitative trait loci (QTL), significant single-nucleotide polymorphisms (SNPs) previously identified by genome-wide association studies (GWASs) to be associated with bovine milk protein and fat traits, and well-known pathways involved in protein and fat synthesis and metabolism, we identified a total of 11 promising candidate genes potentially affecting milk composition traits: FCGR2B, CENPE, RETSAT, ACSBG2, NFKB2, TBC1D1, NLK, MAP3K1, SLC30A2, ANGPT1 and UGDH. Our findings provide a basis for further study and reveal key genes for milk composition traits in dairy cattle.
Val-->Ala mutations selectively alter helix-helix packing in the transmembrane segment of phage M13 coat protein.

PubMed Central

Deber, C M; Khan, A R; Li, Z; Joensson, C; Glibowicka, M; Wang, J

1993-01-01

Val-->Ala mutations within the effective transmembrane segment of a model single-spanning membrane protein, the 50-residue major coat (gene VIII) protein of bacteriophage M13, are shown to have sequence-dependent impacts on stabilization of membrane-embedded helical dimeric structures. Randomized mutagenesis performed on the coat protein hydrophobic segment 21-39 (YIGYAWAMVVVIVGATIGI) produced a library of viable mutants which included those in which each of the four valine residues was replaced by an alanine residue. Significant variations found among these Val-->Ala mutants in the relative populations and thermal stabilities of monomeric and dimeric helical species observed on SDS/PAGE, and in the range of their alpha-helix-->beta-sheet transition temperatures, confirmed that intramembranous valine residues are not simply universal contributors to membrane anchoring. Additional analyses of (i) nonmutatable sites in the mutant protein library, (ii) the properties of the double mutant V29A-V31A obtained by recycling mutant V31A DNA through mutagenesis procedures, and (iii) energy-minimized helical dimer structures of wild-type and mutant V31A transmembrane regions indicated that the transmembrane hydrophobic core helix of the M13 coat protein can be partitioned into alternating pairs of potential protein-interactive residues (V30, V31; G34, A35; G38, I39) and membrane-interactive residues (M28, V29; I32, V33; T36, I37). The overall results constitute an experimental approach to categorizing the distinctive contributions to structure of the residues comprising a protein-protein packing interface vs. those facing lipid, and confirm the sequence-dependent capacity of specific residues within the transmembrane domain to modulate protein-protein interactions which underlie regulatory events in membrane proteins. PMID:8265602
Val-->Ala mutations selectively alter helix-helix packing in the transmembrane segment of phage M13 coat protein.

PubMed

Deber, C M; Khan, A R; Li, Z; Joensson, C; Glibowicka, M; Wang, J

1993-12-15

Val-->Ala mutations within the effective transmembrane segment of a model single-spanning membrane protein, the 50-residue major coat (gene VIII) protein of bacteriophage M13, are shown to have sequence-dependent impacts on stabilization of membrane-embedded helical dimeric structures. Randomized mutagenesis performed on the coat protein hydrophobic segment 21-39 (YIGYAWAMVVVIVGATIGI) produced a library of viable mutants which included those in which each of the four valine residues was replaced by an alanine residue. Significant variations found among these Val-->Ala mutants in the relative populations and thermal stabilities of monomeric and dimeric helical species observed on SDS/PAGE, and in the range of their alpha-helix-->beta-sheet transition temperatures, confirmed that intramembranous valine residues are not simply universal contributors to membrane anchoring. Additional analyses of (i) nonmutatable sites in the mutant protein library, (ii) the properties of the double mutant V29A-V31A obtained by recycling mutant V31A DNA through mutagenesis procedures, and (iii) energy-minimized helical dimer structures of wild-type and mutant V31A transmembrane regions indicated that the transmembrane hydrophobic core helix of the M13 coat protein can be partitioned into alternating pairs of potential protein-interactive residues (V30, V31; G34, A35; G38, I39) and membrane-interactive residues (M28, V29; I32, V33; T36, I37). The overall results constitute an experimental approach to categorizing the distinctive contributions to structure of the residues comprising a protein-protein packing interface vs. those facing lipid, and confirm the sequence-dependent capacity of specific residues within the transmembrane domain to modulate protein-protein interactions which underlie regulatory events in membrane proteins.
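The residue labels used in the two records above can be checked mechanically against the quoted segment. The sketch below assumes, as the numbering in the abstract implies, that the 19-character segment begins at residue 21; the helper names are illustrative only. It labels each residue and reproduces the protein-interactive and membrane-interactive pairs.

```python
# Hydrophobic segment 21-39 of the M13 major coat protein as quoted in the
# abstract; the first character is assumed to be residue 21.
SEGMENT = "YIGYAWAMVVVIVGATIGI"
FIRST_RESIDUE = 21


def residue_label(position: int) -> str:
    """Return a label such as 'V30' for a residue number within 21-39."""
    index = position - FIRST_RESIDUE
    if not 0 <= index < len(SEGMENT):
        raise ValueError(f"position {position} is outside segment 21-39")
    return f"{SEGMENT[index]}{position}"


def label_pairs(pairs):
    """Convert (residue number, residue number) pairs into labelled pairs."""
    return [tuple(residue_label(p) for p in pair) for pair in pairs]


# Pair groupings as listed in the abstract.
PROTEIN_INTERACTIVE = [(30, 31), (34, 35), (38, 39)]
MEMBRANE_INTERACTIVE = [(28, 29), (32, 33), (36, 37)]

# The four valines targeted by the Val-->Ala substitutions.
valines = [residue_label(p) for p in range(21, 40) if SEGMENT[p - FIRST_RESIDUE] == "V"]

print("Protein-interactive:", label_pairs(PROTEIN_INTERACTIVE))
print("Membrane-interactive:", label_pairs(MEMBRANE_INTERACTIVE))
print("Valines:", valines)
```

Running it recovers V29, V30, V31 and V33 as the four valines, and shows that each group's pairs recur every four residues, offset by two residues from the other group.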
DNA methylation Landscape of body size variation in sheep.

PubMed

Cao, Jiaxue; Wei, Caihong; Liu, Dongming; Wang, Huihua; Wu, Mingming; Xie, Zhiyuan; Capellini, Terence D; Zhang, Li; Zhao, Fuping; Li, Li; Zhong, Tao; Wang, Linjie; Lu, Jian; Liu, Ruizao; Zhang, Shifang; Du, Yongfei; Zhang, Hongping; Du, Lixin

2015-10-16

Sub-populations of Chinese Mongolian sheep exhibit significant variance in body mass. In the present study, we profiled genome-wide DNA methylation in these breeds by methylated DNA immunoprecipitation sequencing (MeDIP-seq) to determine whether DNA methylation plays a role in determining the body mass of sheep. A high-quality methylation map of Chinese Mongolian sheep was obtained. We identified 399 differentially methylated regions located in 93 human orthologs that were previously reported as body size-related genes in human genome-wide association studies. We tested three regions in LTBP1, and DNA methylation at two CpG sites showed a significant correlation with its RNA expression. Additionally, a particular set of differentially methylated windows enriched in the "developmental process" category (GO: 0032502) was identified as potential candidates for association with body mass variation. Next, we validated a small subset of these windows in 5 genes; DNA methylation of SMAD1, TSC1 and AKT1 differed significantly across breeds, and six CpG sites were significantly correlated with RNA expression. Interestingly, two CpG sites showed a significant correlation with TSC1 protein expression. This study provides a thorough understanding of body size variation in sheep from an epigenetic perspective.
Mutations in the C-terminus of CDKL5: proceed with caution

PubMed Central

Diebold, Bertrand; Delépine, Chloé; Gataullina, Svetlana; Delahaye, Andrée; Nectoux, Juliette; Bienvenu, Thierry

2014-01-01

Mutations in the cyclin-dependent kinase-like 5 (CDKL5) gene have been described in girls with Rett-like features and early-onset epileptic encephalopathy including infantile spasms. Milder phenotypes have been associated with sequence variations in the 3′-end of the CDKL5 gene. The identification of novel CDKL5 transcripts coding for isoforms with an altered C-terminal region strongly calls into question the pathogenicity of sequence variations located in the 3′-end of the gene. We investigated a group of 30 female patients with a clinically heterogeneous phenotype ranging from nonspecific intellectual disability to a severe neonatal encephalopathy and identified two heterozygous CDKL5 missense mutations, the previously reported p.Val999Met and the novel mutation p.Pro944Thr. However, these mutations were also detected in the patients' healthy fathers. Considering our results and all data from the literature, we suggest that genetic variations beyond codon 938 in the human CDKL5115 protein may have minor or no significance. Screening of exons 19–21 of the CDKL5 gene is probably not useful in practical molecular diagnosis of atypical Rett syndrome. PMID:23756444
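The codon-938 suggestion above amounts to a simple triage rule on variant positions. The sketch below is an illustration only: it parses simple three-letter HGVS protein substitutions such as p.Val999Met and flags those beyond a cut-off codon. The regular expression covers only plain missense notation, not the full HGVS grammar, and the cut-off only makes sense with the same reference isoform numbering the authors used. The flag is a screening heuristic, not a pathogenicity call.

```python
import re

# Simple three-letter HGVS protein substitution such as "p.Val999Met".
# Deliberately ignores the rest of the HGVS grammar (frameshifts, deletions,
# one-letter codes, predicted "p.(...)" forms).
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

# Codon threshold suggested in the abstract for CDKL5 C-terminal variants.
CDKL5_CUTOFF = 938


def parse_substitution(notation: str) -> tuple:
    """Split 'p.Val999Met' into ('Val', 999, 'Met')."""
    match = SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"not a simple substitution: {notation!r}")
    reference, position, alternate = match.groups()
    return reference, int(position), alternate


def beyond_cutoff(notation: str, cutoff: int = CDKL5_CUTOFF) -> bool:
    """True when the substituted codon lies after the cut-off codon."""
    _, position, _ = parse_substitution(notation)
    return position > cutoff


for variant in ["p.Val999Met", "p.Pro944Thr"]:
    print(variant, parse_substitution(variant), beyond_cutoff(variant))
```

Both variants reported in the study fall beyond codon 938, which is the observation the authors' suggestion rests on.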
The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database

PubMed Central

Engel, Stacia R.; Cherry, J. Michael

2013-01-01

The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains has been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186

Folding and Stabilization of Native-Sequence-Reversed Proteins

PubMed Central

Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

2016-01-01

Although the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reversed sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem approach that combines protein structure prediction algorithms with molecular dynamics simulations, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins

NASA Astrophysics Data System (ADS)

Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

2016-04-01

Although the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reversed sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem approach that combines protein structure prediction algorithms with molecular dynamics simulations, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
Amino acid and structural variability of Yersinia pestis LcrV protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anisimov, A P; Dentovskaya, S V; Panfertsev, E A

2009-11-09

The LcrV protein, a multifunctional virulence factor and protective antigen of the plague bacterium, is generally conserved among the epidemic strains of Yersinia pestis. The authors investigated the diversity in LcrV sequences among non-epidemic Y. pestis strains, which have limited virulence in selected animal models and in humans. Sequencing of lcrV genes from ten Y. pestis strains belonging to different phylogenetic groups (subspecies) showed that the LcrV proteins possess four major variable hotspots at positions 18, 72, 273, and 324-326. These major variations, together with other minor substitutions in the amino acid sequences, allowed the authors to classify the LcrV alleles into five sequence types (A-E). Strains of different Y. pestis subspecies can have the same type of LcrV, and different types of LcrV can exist within the same natural plague focus. The LcrV polymorphisms were structurally analyzed by comparing the modeled structures of LcrV from all available strains. All changes except one occurred either in flexible regions or on the surface of the protein, but local chemical properties (i.e. hydrophobic, hydrophilic, amphipathic, or charged character) were conserved across all of the strains. Polymorphisms in flexible and surface regions are likely subject to less selective pressure and have a limited impact on the structure. In contrast, the substitution of tryptophan at position 113 with either glutamic acid or glycine likely has a serious influence on the regional structure of the protein, and these mutations might affect the function of LcrV. The polymorphisms at positions 18, 72 and 273 accounted for differences in oligomerization of LcrV. The importance of this property in the emergence of epidemic strains of Y. pestis during the evolution of this pathogen will need further investigation.
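Grouping alleles by the residues they carry at a fixed set of variable positions is one simple way to picture sequence typing. The sketch below groups aligned sequences by their residues at the four hotspots named above. It is a simplification: the authors also used minor substitutions to define types A-E, and real alleles would first need to be aligned. The sequences are synthetic placeholders, not LcrV data, and the function names are illustrative.

```python
from collections import defaultdict

# 1-based, inclusive hotspot coordinates from the abstract;
# positions 324-326 are treated as one three-residue block.
HOTSPOTS = [(18, 18), (72, 72), (273, 273), (324, 326)]


def hotspot_signature(sequence: str) -> tuple:
    """Return the residues found at each hotspot of an aligned sequence."""
    last = max(end for _, end in HOTSPOTS)
    if len(sequence) < last:
        raise ValueError(f"sequence shorter than position {last}")
    return tuple(sequence[start - 1:end] for start, end in HOTSPOTS)


def group_by_signature(alleles: dict) -> dict:
    """Group allele names that share the same hotspot signature."""
    groups = defaultdict(list)
    for name, sequence in alleles.items():
        groups[hotspot_signature(sequence)].append(name)
    return dict(groups)


def substitute(sequence: str, position: int, residue: str) -> str:
    """Replace the residue at a 1-based position (helper for the toy data)."""
    return sequence[:position - 1] + residue + sequence[position:]


# Hypothetical placeholder sequences, NOT real LcrV alleles.
BACKBONE = "A" * 330
toy_alleles = {
    "strain_1": BACKBONE,
    "strain_2": substitute(BACKBONE, 18, "K"),
    "strain_3": substitute(BACKBONE, 18, "K"),
    "strain_4": substitute(BACKBONE, 273, "N"),
}

groups = group_by_signature(toy_alleles)
for signature, names in groups.items():
    print(signature, names)
```

Each distinct signature corresponds to one candidate type. Strains that differ only outside the hotspots fall into the same group, which is why the published typing also considered minor substitutions.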
Adaptive Covariation between the Coat and Movement Proteins of Prunus Necrotic Ringspot Virus

PubMed Central

Codoñer, Francisco M.; Fares, Mario A.; Elena, Santiago F.

2006-01-01

The relative functional and/or structural importance of different amino acid sites in a protein can be assessed by evaluating the selective constraints to which they have been subjected during the course of evolution. Here we explore such constraints at the linear and three-dimensional levels for the movement protein (MP) and coat protein (CP) encoded by RNA 3 of prunus necrotic ringspot ilarvirus (PNRSV). Using a maximum-parsimony approach, we analyzed the nucleotide sequences of 46 PNRSV isolates varying in symptomatology, host tree, and geographic origin, and identified sites under different selective pressures in both proteins. We also performed covariation analyses to explore whether changes at certain amino acid sites condition subsequent variation at other sites of the same protein or of the other protein. These covariation analyses shed light on which particular amino acids may be involved in the physical and functional interaction between MP and CP. Finally, we discuss these findings in light of what is already known about the involvement of certain sites and domains in structure and in protein-protein and RNA-protein interactions. PMID:16731922
Adaptive covariation between the coat and movement proteins of prunus necrotic ringspot virus.

PubMed

Codoñer, Francisco M; Fares, Mario A; Elena, Santiago F

2006-06-01

The relative functional and/or structural importance of different amino acid sites in a protein can be assessed by evaluating the selective constraints to which they have been subjected during the course of evolution. Here we explore such constraints at the linear and three-dimensional levels for the movement protein (MP) and coat protein (CP) encoded by RNA 3 of prunus necrotic ringspot ilarvirus (PNRSV). Using a maximum-parsimony approach, we analyzed the nucleotide sequences of 46 PNRSV isolates varying in symptomatology, host tree, and geographic origin, and identified sites under different selective pressures in both proteins. We also performed covariation analyses to explore whether changes at certain amino acid sites condition subsequent variation at other sites of the same protein or of the other protein. These covariation analyses shed light on which particular amino acids may be involved in the physical and functional interaction between MP and CP. Finally, we discuss these findings in light of what is already known about the involvement of certain sites and domains in structure and in protein-protein and RNA-protein interactions.
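The records do not say which statistic the covariation analyses used. As a hedged illustration only, the sketch below scores covariation between two alignment columns with mutual information, a common choice that is not necessarily the authors' method. The column strings are hypothetical. Real analyses also need to correct for shared phylogenetic history among isolates, which raw mutual information ignores.

```python
import math
from collections import Counter


def mutual_information(col_x: str, col_y: str) -> float:
    """Mutual information (bits) between two alignment columns.

    Each character is the residue of one isolate; both columns must list
    the isolates in the same order.
    """
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))
    px = Counter(col_x)
    py = Counter(col_y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical columns: one character per isolate.
mp_site = "AAAASSSS"
cp_site = "KKKKRRRR"   # changes in lockstep with mp_site
cp_other = "KRKRKRKR"  # unrelated to mp_site

print(mutual_information(mp_site, cp_site))   # 1.0
print(mutual_information(mp_site, cp_other))  # 0.0
```

A pair of sites that always change together scores the maximum for two-state columns (1 bit), while independent sites score zero.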
A single Ala139-to-Glu substitution in the Renibacterium salmoninarum virulence-associated protein p57 results in antigenic variation and is associated with enhanced p57 binding to Chinook salmon leukocytes

USGS Publications Warehouse

Wiens, Gregory D.; Pascho, Ron; Winton, James R.

2002-01-01

The gram-positive bacterium Renibacterium salmoninarum produces relatively large amounts of a 57-kDa protein (p57) implicated in the pathogenesis of salmonid bacterial kidney disease. Antigenic variation in p57 was identified by using monoclonal antibody 4C11, which exhibited severely decreased binding to R. salmoninarum strain 684 p57 and bound robustly to the p57 proteins of seven other R. salmoninarum strains. This difference in binding was not due to alterations in p57 synthesis, secretion, or bacterial cell association. The molecular basis of the 4C11 epitope loss was determined by amplifying and sequencing the two identical genes encoding p57, msa1 and msa2. The 5′ and coding sequences of the 684 msa1 and msa2 genes were identical to those of the ATCC 33209 msa1 and msa2 genes except for a single C-to-A nucleotide mutation. This mutation was identified in both the msa1 and msa2 genes of strain 684 and resulted in an Ala139-to-Glu substitution in the amino-terminal region of p57. We examined whether this mutation in p57 altered salmonid leukocyte and rabbit erythrocyte binding activities. R. salmoninarum strain 684 extracellular protein exhibited a twofold increase in agglutinating activity for Chinook salmon leukocytes and rabbit erythrocytes compared to the activity of the ATCC 33209 extracellular protein. A specific and quantitative p57 binding assay confirmed the increased binding activity of 684 p57. Monoclonal antibody 4C11 blocked the agglutinating activity of the ATCC 33209 extracellular protein but not the agglutinating activity of the 684 extracellular protein. These results indicate that the Ala139-to-Glu substitution altered immune recognition and was associated with enhanced biological activity of R. salmoninarum 684 p57.
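The abstract does not say which alanine codon sits at position 139. Under the standard genetic code, though, only some alanine codons can become a glutamate codon through a single C-to-A change. The sketch below, an illustration added here rather than part of the study, enumerates those cases; the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for all 64 codons listed in TCAG order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_changes(codon: str, ref: str, alt: str):
    """Yield each codon made by changing exactly one `ref` base to `alt`."""
    for i, base in enumerate(codon):
        if base == ref:
            yield codon[:i] + alt + codon[i + 1:]


def converting_codons(from_aa: str, to_aa: str, ref: str, alt: str) -> list:
    """List (original, mutant) codon pairs where one ref->alt change turns from_aa into to_aa."""
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for mutant in single_base_changes(codon, ref, alt):
            if CODON_TABLE[mutant] == to_aa:
                pairs.append((codon, mutant))
    return sorted(pairs)


print(converting_codons("A", "E", "C", "A"))
# [('GCA', 'GAA'), ('GCG', 'GAG')]
```

The C must be the second base of a GCA or GCG codon. The same C-to-A change in GCT or GCC would instead produce aspartate (GAT, GAC).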
Variant discovery in the sheep milk transcriptome using RNA sequencing.

PubMed

Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José

2017-02-15

The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.
Genetic variation and population structure in Jamunapari goats using microsatellites, mitochondrial DNA, and milk protein genes.

PubMed

Rout, P K; Thangraj, K; Mandal, A; Roy, R

2012-01-01

Jamunapari, a dairy goat breed of India, has been gradually declining in numbers in its home tract over the years. We analysed genetic variation and population history in Jamunapari goats based on 17 microsatellite loci, 2 milk protein loci, sequencing of the mitochondrial hypervariable region I (HVRI), and sequencing of three Y-chromosomal genes. We used the mitochondrial DNA (mtDNA) mismatch distribution, microsatellite data, and bottleneck tests to infer the population history and demography. The mean number of alleles per locus was 9.0, indicating high allelic variation across loci, and the mean heterozygosity at nuclear loci was 0.769. Although the population numbers fewer than 8,000 individuals, variability in terms of both allelic richness and gene diversity was high at all microsatellite loci except ILST 005. The gene diversity and effective number of alleles at the milk protein loci were higher than in the 10 other Indian goat breeds used for comparison. Mismatch analysis revealed a unimodal distribution, indicating population expansion. The genetic diversity of Y-chromosome genes was low. The observed mean M ratio in the population was above the critical value (Mc) and close to one, indicating that the population has maintained a slowly changing size. The mode-shift test did not detect any distortion of allele frequencies, and the heterozygosity excess method showed no significant departure from mutation-drift equilibrium. However, signatures of genetic bottlenecks were observed at some loci, as indicated by decreased heterozygosity and lower M ratios. Two genetic subdivisions were observed in the population, supporting the observations of farmers in different areas.
This baseline information on genetic diversity, bottleneck analysis, and mismatch analysis was obtained to inform conservation decisions and management of the breed.
Genetic Variation and Population Structure in Jamunapari Goats Using Microsatellites, Mitochondrial DNA, and Milk Protein Genes

PubMed Central

Rout, P. K.; Thangraj, K.; Mandal, A.; Roy, R.

2012-01-01

Jamunapari, a dairy goat breed of India, has been gradually declining in numbers in its home tract over the years. We analysed genetic variation and population history in Jamunapari goats based on 17 microsatellite loci, 2 milk protein loci, sequencing of the mitochondrial hypervariable region I (HVRI), and sequencing of three Y-chromosomal genes. We used the mitochondrial DNA (mtDNA) mismatch distribution, microsatellite data, and bottleneck tests to infer the population history and demography. The mean number of alleles per locus was 9.0, indicating high allelic variation across loci, and the mean heterozygosity at nuclear loci was 0.769. Although the population numbers fewer than 8,000 individuals, variability in terms of both allelic richness and gene diversity was high at all microsatellite loci except ILST 005. The gene diversity and effective number of alleles at the milk protein loci were higher than in the 10 other Indian goat breeds used for comparison. Mismatch analysis revealed a unimodal distribution, indicating population expansion. The genetic diversity of Y-chromosome genes was low. The observed mean M ratio in the population was above the critical value (Mc) and close to one, indicating that the population has maintained a slowly changing size. The mode-shift test did not detect any distortion of allele frequencies, and the heterozygosity excess method showed no significant departure from mutation-drift equilibrium. However, signatures of genetic bottlenecks were observed at some loci, as indicated by decreased heterozygosity and lower M ratios. Two genetic subdivisions were observed in the population, supporting the observations of farmers in different areas.
This baseline information on genetic diversity, bottleneck analysis, and mismatch analysis was obtained to inform conservation decisions and management of the breed. PMID:22606053
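Two of the per-locus statistics mentioned in the Jamunapari records can be computed directly from allele data. In its usual Garza-Williamson form, the M ratio is M = k/(r + 1), where k is the number of distinct alleles and r is the allele size range in repeat units. A bottleneck tends to remove alleles faster than it shrinks the range, which lowers M. Gene diversity is taken here as He = 1 - Σ p_i². The sketch below uses hypothetical single-locus data, not values from the study, and omits the small-sample correction that published estimates usually apply.

```python
from collections import Counter


def m_ratio(allele_sizes_in_repeats):
    """Garza-Williamson M = k / (r + 1) for one microsatellite locus.

    k is the number of distinct alleles and r the allelic size range,
    both measured in repeat units.
    """
    sizes = set(allele_sizes_in_repeats)
    if not sizes:
        raise ValueError("at least one allele is required")
    k = len(sizes)
    r = max(sizes) - min(sizes)
    return k / (r + 1)


def expected_heterozygosity(allele_counts):
    """Gene diversity He = 1 - sum(p_i^2), without small-sample correction."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("allele counts must not all be zero")
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


# Hypothetical genotyped allele sizes (in repeat units) at a single locus.
sizes = [10, 10, 12, 14]
counts = Counter(sizes)

print(m_ratio(sizes))                   # 3 alleles over a range of 4 repeats -> 0.6
print(expected_heterozygosity(counts))  # frequencies 0.5, 0.25, 0.25 -> 0.625
```

The study reports a mean M ratio across loci, so this per-locus value would be averaged over all microsatellites before being compared with Mc.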
Nearly neutral evolution in IFNL3 gene retains the immune function to detect and clear the viral infection in HCV.

PubMed

Singh, Pratichi; Dass, J Febin Prabhu

2018-05-07

The IFNL3 gene plays a crucial role in immune defense against viruses. It induces interferon-stimulated genes (ISGs) with antiviral properties by activating the JAK-STAT pathway. In this study, we investigated the evolutionary forces that shaped the IFNL3 gene to perform its downstream function as a regulatory gene in HCV clearance. We selected 25 IFNL3 coding sequences, using the human gene as the reference sequence, and constructed a phylogeny. Furthermore, the rate of variation, substitution saturation, phylogenetic informativeness and differential selection were also analysed. The codon evolution results suggest that nearly neutral mutation is the key pattern shaping IFNL3 evolution. The results were validated by comparing human IFNL3 protein variants with the native protein in a molecular dynamics simulation study. The molecular dynamics simulation indicates a negative impact of the reported variants on the human IFNL3 protein. However, these detrimental mutations (R157Q and R157W) were shown to be negatively selected in the evolutionary study of mammals. Hence, the variants showed a mild impact on IFNL3 function and may be removed from the population through negative selection owing to its high functional constraints. In short, our study may contribute to the overall evidence on phylotyping and on the structural changes that accompany non-synonymous substitutions in the IFNL3 protein. Our theoretical findings should pave the way for experimental validation in the context of HCV clearance. Copyright © 2018 Elsevier Ltd. All rights reserved.
Trade-offs with stability modulate innate and mutationally acquired drug-resistance in bacterial dihydrofolate reductase enzymes.

PubMed

Matange, Nishad; Bodkhe, Swapnil; Patel, Maitri; Shah, Pooja

2018-06-05

Structural stability is a major constraint on the evolution of protein sequences. However, under strong directional selection, mutations that confer novel phenotypes but compromise the structural stability of proteins may be permissible. During the evolution of antibiotic resistance, mutations that confer drug resistance often have pleiotropic effects on the structure and function of antibiotic-target proteins, which are usually essential metabolic enzymes. In this study, we show that trimethoprim (TMP)-resistant alleles of dihydrofolate reductase from Escherichia coli (EcDHFR) harbouring the Trp30Gly, Trp30Arg or Trp30Cys mutations are significantly less stable than the wild type, making them prone to aggregation and proteolysis. This destabilization is associated with a lower expression level, resulting in a fitness cost and negative epistasis with other TMP-resistance mutations in EcDHFR. Using structure-based mutational analysis, we show that perturbation of critical stabilizing hydrophobic interactions in the wild-type EcDHFR enzyme explains the phenotypes of the Trp30 mutants. Surprisingly, although this site is crucial for the stability of EcDHFR, it shows significant sequence variation among bacterial DHFRs. Mutational and computational analyses in EcDHFR as well as in DHFR enzymes from Staphylococcus aureus and Mycobacterium tuberculosis demonstrate that natural variation at this site and its interacting hydrophobic residues modulates TMP resistance in other bacterial DHFRs as well, and may explain the different susceptibilities of bacterial pathogens to trimethoprim. Our study demonstrates that trade-offs between structural stability and function can influence innate drug resistance as well as the potential for mutationally acquired drug resistance of an enzyme. ©2018 The Author(s).
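In the standard genetic code, tryptophan has a single codon, TGG. The sketch below is an illustration added here, not part of the study. It enumerates the single-nucleotide neighbours of TGG to show that Gly, Arg and Cys, the three substitutions reported at Trp30, are each one base change away. It assumes the standard code and says nothing about why these particular changes are selected.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base slowest, third fastest);
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def point_mutants(codon: str) -> dict:
    """Map every codon one base substitution away from `codon` to its amino acid."""
    mutants = {}
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                mutant = codon[:i] + alt + codon[i + 1:]
                mutants[mutant] = CODON_TABLE[mutant]
    return mutants


trp_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "W"]
neighbours = point_mutants(trp_codons[0])

print(trp_codons)                # ['TGG']
for target in ("G", "R", "C"):  # Gly, Arg, Cys as in Trp30Gly/Arg/Cys
    print(target, sorted(c for c, aa in neighbours.items() if aa == target))
```

The nine neighbours of TGG encode only Arg, Gly, Leu, Ser, Cys or a stop. This set includes all three resistance substitutions.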
Epidemiological survey of idiopathic scoliosis and sequence alignment analysis of multiple candidate genes.

PubMed

Yang, Tao; Jia, Quanzhang; Guo, Hong; Xu, Jianzhong; Bai, Yun; Yang, Kai; Luo, Fei; Zhang, Zehua; Hou, Tianyong

2012-06-01

To investigate the effects of genetic factors on idiopathic scoliosis (IS) and its mode of inheritance through a genetic epidemiological survey of IS in Chongqing City, China, and to determine through genetic sequence analysis whether SH3GL1, GADD45B, and FGF22 on chromosome 19p13.3 are pathogenic genes for IS. A total of 214 nuclear families were investigated to analyse age of onset, familial aggregation, and heritability. SH3GL1, GADD45B, and FGF22 were chosen as candidate genes for mutation screening in 56 IS patients from the 214 families. Sequence alignment analysis was performed to identify mutations and predict protein structure. The average age of onset of 10.8 years suggests that IS is an early-onset disease. The incidences of IS in first-, second-, and third-degree relatives and the overall incidence in families (5.68%) were also significantly higher than that of the general population (1.04%). The U test indicated a significant difference, suggesting that IS shows familial aggregation. The heritability estimates for first-degree relatives (77.68 ± 10.39%), second-degree relatives (69.89 ± 3.14%), and third-degree relatives (62.14 ± 11.92%) indicate that genetic factors play an important role in IS pathogenesis. The incidences in first-degree (10.01%), second-degree (2.55%), and third-degree relatives (1.76%) indicate that IS does not follow simple monogenic Mendelian inheritance but shows the traits of a multifactorial hereditary disease. Sequence alignment of exons of SH3GL1, GADD45B, and FGF22 revealed 17 base mutations, 16 of which cause neither an open reading frame (ORF) shift nor amino acid changes, whereas one mutation (C→T) in SH3GL1 creates a termination codon that alters the protein's reading frame. Prediction analysis of the protein sequence showed that the SH3GL1 mutant encodes a truncated protein, thus affecting protein structure.
IS is a multifactorial genetic disease and SH3GL1 may be one of the pathogenic genes for IS.
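The abstract reports heritability by degree of relationship but does not name the estimation method. One standard way to derive such figures from incidence data is Falconer's liability-threshold regression, and the sketch below assumes that method; the function name and arguments are illustrative, not taken from the study. Plugging in the reported general-population incidence (1.04%) and first-degree incidence (10.01%) gives about 0.78, in line with the reported 77.68%.

```python
from statistics import NormalDist


def falconer_h2(q_general: float, q_relatives: float, r: float) -> float:
    """Heritability of liability under Falconer's threshold model (illustrative).

    q_general   -- incidence of the trait in the general population
    q_relatives -- incidence among relatives of affected probands
    r           -- coefficient of relationship (1/2, 1/4, 1/8 for
                   first-, second-, third-degree relatives)
    """
    nd = NormalDist()                  # liability assumed standard normal
    x_g = nd.inv_cdf(1 - q_general)    # threshold (SD units) in the population
    x_r = nd.inv_cdf(1 - q_relatives)  # threshold relative to relatives' mean
    a_g = nd.pdf(x_g) / q_general      # mean liability of affected individuals
    b = (x_g - x_r) / a_g              # regression of relatives on probands
    return b / r


if __name__ == "__main__":
    # Incidences from the abstract: 1.04% general, 10.01% first-degree.
    print(round(falconer_h2(0.0104, 0.1001, 0.5), 3))
```

The model treats IS as a threshold trait on an underlying normal liability, which matches the abstract's conclusion that IS behaves as a multifactorial rather than monogenic disease.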
MHC class I loci of the Bar-Headed goose (Anser indicus)

PubMed Central

2010-01-01

MHC class I proteins mediate functions in anti-pathogen defense. MHC diversity has already been investigated in many studies of model avian species; here we chose the bar-headed goose, a worldwide migrant bird, as a non-model avian species. Sequences from exons encoding the peptide-binding region (PBR) of MHC class I molecules were isolated from liver genomic DNA to investigate variation in these genes. These are the first MHC class I partial sequences of the bar-headed goose to be reported. A preliminary analysis suggests the presence of at least four MHC class I genes, which share great similarity with those of the goose and duck. A phylogenetic analysis of bar-headed goose, goose and duck MHC class I sequences using the neighbor-joining (NJ) method supports the idea that they all cluster within the Anseriformes clade. PMID:21637434
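As a rough illustration of how a neighbor-joining tree is built, the sketch below computes uncorrected p-distances between aligned sequences and repeatedly joins the pair that minimizes the NJ Q criterion. The distance model, function names and Newick output are assumptions; the abstract does not state which software or settings the study used.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Dist = Dict[FrozenSet[str], float]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: Dict[str, str]) -> Dist:
    return {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(seqs, 2)}


def neighbor_joining(names: List[str], dist: Dist) -> str:
    """Return an unrooted NJ tree in Newick format (branch lengths to 3 d.p.)."""
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r_i: summed distance from node i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                  # branch length to j
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node u to each remaining node
                d[frozenset((u, k))] = (d[frozenset((i, k))]
                                        + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.3f});"
```

Published analyses usually add a substitution model and bootstrap support; both are omitted here for brevity.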
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
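The abstract describes assembling de novo peptide tags into a consensus sequence with the authors' "Peptide Tag Assembler", whose algorithm is not given here. As a generic illustration only, the sketch below performs greedy suffix-prefix overlap assembly of tags and reports isoleucine as leucine, since the two are isobaric (the same I/L ambiguity noted above). The function names and the minimum-overlap parameter are hypothetical; this is not the DiPS method.

```python
from typing import List


def normalize(tag: str) -> str:
    # Leucine and isoleucine have identical masses, so report both as L.
    return tag.upper().replace("I", "L")


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(tags: List[str], min_overlap: int = 3) -> List[str]:
    """Repeatedly merge the contig pair with the largest overlap until none remain."""
    unique = list(dict.fromkeys(normalize(t) for t in tags))
    # Drop tags that are fully contained in another tag.
    contigs = [t for t in unique if not any(t != o and t in o for o in unique)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, the tags `MKTAYIAK`, `AYIAKQRQ` and `KQRQISFV` collapse into the single contig `MKTAYLAKQRQLSFV`. Greedy merging is simple but can misjoin repeated stretches, which is one reason dedicated assemblers weigh tag quality and coverage.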
Heterogeneity of the Epstein-Barr Virus (EBV) Major Internal Repeat Reveals Evolutionary Mechanisms of EBV and a Functional Defect in the Prototype EBV Strain B95-8.

PubMed

Ba Abdullah, Mohammed M; Palermo, Richard D; Palser, Anne L; Grayson, Nicholas E; Kellam, Paul; Correia, Samantha; Szymula, Agnieszka; White, Robert E

2017-12-01

Epstein-Barr virus (EBV) is a ubiquitous pathogen of humans that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified through both coevolution with its host and genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging because of the large number and lengths of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat 1 of EBV (IR1; also known as the BamW repeats) for more than 70 strains. The diversity of the latency protein EBV nuclear antigen leader protein (EBNA-LP) resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 open reading frame (ORF) is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp), one zone upstream of BWRF1 and two zones within it. IR1 is heterogeneous in 70% of strains, and this heterogeneity arises from sequence exchange between strains as well as from spontaneous mutation, with interstrain recombination being more common in tumor-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat but not close to the flanks. These observations suggest that IR1 (and, by extension, EBV) diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four nonconsensus variants within a single IR1 repeat unit, including a stop codon in the EBNA-LP gene. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 bacterial artificial chromosome (BAC). IMPORTANCE Epstein-Barr virus (EBV) infects the majority of the world population but causes illness in only a small minority of people.
Nevertheless, over 1% of cancers worldwide are attributable to EBV. Recent sequencing projects investigating virus diversity to see if different strains have different disease impacts have excluded regions of repeating sequence, as they are more technically challenging. Here we analyze the sequence of the largest repeat in EBV (IR1). We first characterized the variations in protein sequences encoded across IR1. In studying variations within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function, and we suggest that tumor-associated viruses may be more likely to contain DNA mixed from two strains. The patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) by copying sequence from another strain (or repeat unit) to repair DNA damage. Copyright © 2017 Ba abdullah et al.
Exploration of sequence space as the basis of viral RNA genome segmentation.

PubMed

Moreno, Elena; Ojosnegros, Samuel; García-Arriaza, Juan; Escarmís, Cristina; Domingo, Esteban; Perales, Celia

2014-05-06

The mechanisms of viral RNA genome segmentation are unknown. On extensive passage of foot-and-mouth disease virus in baby hamster kidney-21 cells, the virus accumulated multiple point mutations and underwent a transition akin to genome segmentation. The standard single RNA genome molecule was replaced by genomes harboring internal in-frame deletions affecting the L- or capsid-coding region. These genomes were infectious and killed cells by complementation. Here we show that the point mutations in the nonstructural protein-coding region (P2, P3) that accumulated in the standard genome before segmentation increased the fitness of the segmented version relative to the standard genome. The fitness increase was documented by intracellular expression of virus-coded proteins and by infectious progeny production from RNAs with the internal deletions placed in the sequence context of the parental and evolved genomes. The complementation activity involved several viral proteins, one of them being the leader proteinase L. Thus, a history of genetic drift with accumulation of point mutations was needed to allow a major variation in the structure of a viral genome. More generally, exploration of sequence space by a viral genome (in this case an unsegmented RNA) can reach a point in that space at which a totally different genome structure (in this case, a segmented RNA) is favored over the form that performed the exploration.
Functional characterization of recombinant bromelain of Ananas comosus expressed in a prokaryotic system.

PubMed

George, Susan; Bhasker, Salini; Madhav, Harish; Nair, Archana; Chinnamma, Mohankumar

2014-02-01

Bromelain (BRM) is a defense protein present in the fruit and stem of pineapple (Ananas comosus); it is classified as a cysteine protease and has diverse medicinal uses. Because of its therapeutic applications, bromelain has received considerable attention from the pharmaceutical industry. In the present study, the full coding gene of bromelain from pineapple stem (1,093 bp) was amplified by RT-PCR. The PCR product was cloned, sequenced, and characterized. Sequence analysis of the gene revealed single nucleotide polymorphisms and its phylogenetic relatedness. The peptide sequence deduced from the gene showed the amino acid variations, physicochemical properties, and secondary and tertiary structural features of the protein. The full BRM gene was cloned into the prokaryotic vector pET32b and successfully expressed in Escherichia coli BL21(DE3)pLysS host cells. The identity of the recombinant bromelain (rBRM) protein was confirmed by Western blot analysis using an anti-BRM rabbit IgG antibody. The activity of recombinant bromelain was compared with that of purified native bromelain by protease assay. The antibacterial sensitivity test showed an inhibitory effect of rBRM, compared with native BRM, on the growth of Gram-positive Streptococcus agalactiae and Gram-negative Escherichia coli O111 strains. To the best of our knowledge, this is the first report showing the bactericidal property of rBRM expressed in a prokaryotic system.
Evaluation of the Contributions of Individual Viral Genes to Newcastle Disease Virus Virulence and Pathogenesis

PubMed Central

Paldurai, Anandan; Kim, Shin-Hee; Nayak, Baibaswata; Xiao, Sa; Shive, Heather; Collins, Peter L.

2014-01-01

Naturally occurring Newcastle disease virus (NDV) strains vary greatly in virulence. The presence of multibasic residues at the proteolytic cleavage site of the fusion (F) protein has been shown to be a primary determinant differentiating virulent from avirulent strains. However, there is wide variation in virulence among virulent strains, and there are also examples of incongruity between cleavage site sequence and virulence. These observations suggest that additional viral factors contribute to virulence. In this study, we evaluated the contribution of each viral gene to virulence individually and in different combinations by exchanging genes between velogenic (highly virulent) strain GB Texas (GBT) and mesogenic (moderately virulent) strain Beaudette C (BC). These two strains are phylogenetically closely related, and their F proteins contain identical cleavage site sequences, 112RRQKR↓F117. A total of 20 chimeric viruses were constructed and evaluated in vitro, in 1-day-old chicks, and in 2-week-old chickens. The results showed that both the envelope-associated and polymerase-associated proteins contribute to the difference in virulence between rBC and rGBT, with the envelope-associated proteins playing the greater role. The F protein was the major individual contributor and was sometimes augmented by the homologous M and HN proteins. The dramatic effect of F was independent of its cleavage site sequence, since that was identical in the two strains. The polymerase L protein was the next major individual contributor and was sometimes augmented by the homologous N and P proteins. The leader and trailer regions did not appear to contribute to the difference in virulence between BC and GBT. IMPORTANCE This study is the first comprehensive and systematic study of NDV virulence and pathogenesis.
Genetic exchanges between a mesogenic and a velogenic strain revealed that the fusion glycoprotein is the major virulence determinant regardless of the identical virulence protease cleavage site sequence present in both strains. The contribution of the large polymerase protein to NDV virulence is second only to that of the fusion glycoprotein. The identification of virulence determinants is of considerable importance, because of the potential to generate better live attenuated NDV vaccines. It may also be possible to apply these findings to other paramyxoviruses. PMID:24850737
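Because BC and GBT share the cleavage site 112RRQKR↓F117, a rule based on the cleavage site alone cannot separate them. The sketch below encodes an OIE-style definition of a virulent F cleavage site (at least three arginine or lysine residues at positions 113 to 116, plus phenylalanine at 117); using that definition here is an assumption, and the function name is hypothetical. Both strains pass the check, which is why the study had to look beyond the cleavage site.

```python
def has_multibasic_cleavage_site(site: str) -> bool:
    """site: F-protein residues 112-117 as one-letter codes, e.g. 'RRQKRF'."""
    if len(site) != 6:
        raise ValueError("expected exactly six residues (positions 112-117)")
    site = site.upper()
    basic_113_116 = sum(aa in "RK" for aa in site[1:5])  # positions 113-116
    return basic_113_116 >= 3 and site[5] == "F"          # position 117


if __name__ == "__main__":
    # Both strains carry 112RRQKR|F117, so the rule cannot tell them apart.
    for name, site in [("BC", "RRQKRF"), ("GBT", "RRQKRF")]:
        print(name, has_multibasic_cleavage_site(site))
```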
FOXP2 variation in great ape populations offers insight into the evolution of communication skills.

PubMed

Staes, Nicky; Sherwood, Chet C; Wright, Katharine; de Manuel, Marc; Guevara, Elaine E; Marques-Bonet, Tomas; Krützen, Michael; Massiah, Michael; Hopkins, William D; Ely, John J; Bradley, Brenda J

2017-12-04

The gene coding for the forkhead box protein P2 (FOXP2) is associated with human language disorders. Evolutionary changes in this gene are hypothesized to have contributed to the emergence of speech and language in the human lineage. Although FOXP2 is highly conserved across most mammals, humans differ from chimpanzees, bonobos and gorillas by two functional amino acid substitutions, with an additional fixed substitution found in orangutans. However, FOXP2 has been characterized in only a small number of apes, and no publication to date has examined the degree of natural variation in large samples of unrelated great apes. Here, we analyzed the genetic variation in the FOXP2 coding sequence in 63 chimpanzees, 11 bonobos, 48 gorillas, 37 orangutans and 2 gibbons and observed previously undescribed variation in great apes. We identified two variable polyglutamine microsatellites in chimpanzees and orangutans and found three nonsynonymous single nucleotide polymorphisms, one each in chimpanzees, gorillas and orangutans, with derived allele frequencies of 0.01, 0.26 and 0.29, respectively. Structural and functional protein modeling indicates a biochemical effect of the substitution in orangutans, and because it is present solely in the Sumatran orangutan species, the mutation may be associated with reported population differences in vocalizations.
Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

PubMed Central

Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

2014-01-01

An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of the human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. As protein disorder decreased, the AR content of the protein sequence also decreased. A probability density distribution analysis and a discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs and that these regions were on average 8 residues long. Residues in ARs were high in sequence complexity, and ARs seldom overlapped with low-complexity regions (LCRs), which are abundant in disordered proteins. AR sequences showed mixed conformational adaptability toward α-helix, β-sheet/strand and coil conformations. PMID:24594841
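Low-complexity regions are commonly defined by the compositional (Shannon) entropy of residues in a sliding window, the idea behind tools such as SEG. The sketch below flags low-entropy windows; the 12-residue window, the 2.2-bit cutoff and the function names are illustrative choices, not the settings used in the study.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of a segment's amino acid composition."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_windows(seq: str, window: int = 12,
                           max_bits: float = 2.2) -> List[Tuple[int, float]]:
    """(start, entropy) for every window whose entropy is at or below max_bits."""
    hits = []
    for i in range(len(seq) - window + 1):
        h = shannon_entropy(seq[i:i + window])
        if h <= max_bits:
            hits.append((i, h))
    return hits
```

A poly-Q stretch scores 0 bits, while a 12-residue window of all-distinct residues scores log2(12) ≈ 3.58 bits. This is the sense in which high-complexity ARs rarely coincide with flagged LCRs.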

Whole exome sequencing of rare variants in EIF4G1 and VPS35 in Parkinson disease

PubMed Central

Nuytemans, Karen; Bademci, Guney; Inchausti, Vanessa; Dressen, Amy; Kinnamon, Daniel D.; Mehta, Arpit; Wang, Liyong; Züchner, Stephan; Beecham, Gary W.; Martin, Eden R.; Scott, William K.

2013-01-01

Objective: Recently, vacuolar protein sorting 35 (VPS35) and eukaryotic translation initiation factor 4 gamma 1 (EIF4G1) have been identified as 2 causal Parkinson disease (PD) genes. We used whole exome sequencing for rapid, parallel analysis of variations in these 2 genes. Methods: We performed whole exome sequencing in 213 patients with PD and 272 control individuals. Those rare variants (RVs) with <5% frequency in the exome variant server database and our own control data were considered for analysis. We performed joint gene-based tests for association using RVASSOC and SKAT (Sequence Kernel Association Test) as well as single-variant test statistics. Results: We identified 3 novel VPS35 variations that changed the coded amino acid (nonsynonymous) in 3 cases. Two variations were in multiplex families and neither segregated with PD. In EIF4G1, we identified 11 (9 nonsynonymous and 2 small indels) RVs including the reported pathogenic mutation p.R1205H, which segregated in all affected members of a large family, but also in 1 unaffected 86-year-old family member. Two additional RVs were found in isolated patients only. Whereas initial association studies suggested an association (p = 0.04) with all RVs in EIF4G1, subsequent testing in a second dataset for the driving variant (p.F1461) suggested no association between RVs in the gene and PD. Conclusions: We confirm that the specific EIF4G1 variation p.R1205H seems to be a strong PD risk factor, but is nonpenetrant in at least one 86-year-old. A few other select RVs in both genes could not be ruled out as causal. However, there was no evidence for an overall contribution of genetic variability in VPS35 or EIF4G1 to PD development in our dataset. PMID:23408866
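The study uses RVASSOC and SKAT, which are not reproduced here. To give some intuition for gene-based rare-variant testing, the sketch below substitutes a simpler collapsing (CAST-style) approach. Each person counts as a carrier if they have any variant below the 5% frequency threshold, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. Function names and input formats are hypothetical.

```python
from math import comb
from typing import Dict, List, Set


def count_carriers(samples: List[Set[str]], maf: Dict[str, float],
                   cutoff: float = 0.05) -> int:
    """Number of samples carrying at least one variant with frequency below cutoff."""
    rare = {v for v, f in maf.items() if f < cutoff}
    return sum(1 for variants in samples if variants & rare)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def burden_test(cases: List[Set[str]], controls: List[Set[str]],
                maf: Dict[str, float], cutoff: float = 0.05) -> float:
    """Carrier vs non-carrier counts in cases and controls, compared by Fisher's test."""
    a = count_carriers(cases, maf, cutoff)
    c = count_carriers(controls, maf, cutoff)
    return fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)
```

Collapsing tests lose power when a gene contains both risk and protective variants. SKAT is designed to handle that situation, which is one reason studies often report both kinds of test.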
An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

PubMed Central

Du, Ruofei; Mercante, Donald; Fang, Zhide

2013-01-01

In functional metagenomics, BLAST homology search is a common method for classifying metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate changes in abundance with environmental perturbation, clinical variation, and so on. However, the short reads produced by next-generation sequencing technologies pose a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. No method is available to address this problem, and this paper aims to fill that gap. We found that BLAST similarity scores of homologues for short reads derived from the coding sequences of COG protein members are distributed differently from the scores of reads derived elsewhere. We showed that, by choosing an appropriate score cutoff, we can filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combining BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after score-cutoff filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method was robust to read coverage and consistently outperformed the E-value cutoff methods currently used in the literature. PMID:23516532
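Here is a minimal sketch of the score-cutoff idea described above. It assumes hits come from BLAST tabular output (`-outfmt 6`, where the bit score is the 12th column) and that a lookup from subject IDs to COG identifiers is already available. The lookup table, the function names and any particular cutoff value are assumptions. Each read keeps only its best-scoring hit, and reads whose best hit falls below the cutoff are discarded before COG counts are tallied.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_outfmt6(lines: Iterable[str],
                  subject_to_cog: Dict[str, str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (read_id, cog_id, bitscore) from BLAST tabular lines with a known COG."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or fields[1] not in subject_to_cog:
            continue  # malformed line or subject without a COG assignment
        yield fields[0], subject_to_cog[fields[1]], float(fields[11])


def cog_profile(hits: Iterable[Tuple[str, str, float]],
                min_bitscore: float) -> Dict[str, int]:
    """Reads per COG using each read's best hit; low-scoring reads are dropped."""
    best: Dict[str, Tuple[float, str]] = {}
    for read_id, cog, bitscore in hits:
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, cog)
    return dict(Counter(cog for score, cog in best.values() if score >= min_bitscore))
```

In the paper the cutoff is chosen from the observed score distributions. A single fixed threshold, as here, is the simplest stand-in.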
New enzymes from environmental cassette arrays: Functional attributes of a phosphotransferase and an RNA-methyltransferase

PubMed Central

Nield, Blair S.; Willows, Robert D.; Torda, Andrew E.; Gillings, Michael R.; Holmes, Andrew J.; Nevalainen, K.M. Helena; Stokes, H.W.; Mabbutt, Bridget C.

2004-01-01

By targeting gene cassettes by polymerase chain reaction (PCR) directly from environmentally derived DNA, we are able to amplify entire open reading frames (ORFs) independently of prior sequence knowledge. Approximately 10% of the mobile genes recovered by these means can be attributed to known protein families. Here we describe the characterization of two ORFs which show moderate homology to known proteins: (1) an aminoglycoside phosphotransferase displaying 25% sequence identity with APH(7″) from Streptomyces hygroscopicus, and (2) an RNA methyltransferase sharing 25%–28% identity with a group of recently defined bacterial RNA methyltransferases distinct from the SpoU enzyme family. Our novel genes were expressed as recombinant products and assayed for appropriate enzyme activity. The aminoglycoside phosphotransferase displayed ATPase activity, consistent with the presence of characteristic Mg2+-binding residues. Unlike related APH(4) or APH(7″) enzymes, however, this activity was not enhanced by hygromycin B or kanamycin, suggesting the normal substrate to be a different aminoglycoside. The RNA methyltransferase contains sequence motifs of the RNA methyltransferase superfamily, and our recombinant version showed methyltransferase activity with RNA. Our data confirm that gene cassettes present in the environment encode folded enzymes with novel sequence variation and demonstrable catalytic activity. Our PCR approach (cassette PCR) may be used to identify a diverse range of ORFs from any environmental sample, as well as to directly access the gene pool found in mobile gene cassettes commonly associated with integrons. This gene pool can be accessed from both cultured and uncultured microbial samples as a source of new enzymes and proteins. PMID:15152095
New enzymes from environmental cassette arrays: functional attributes of a phosphotransferase and an RNA-methyltransferase.

PubMed

Nield, Blair S; Willows, Robert D; Torda, Andrew E; Gillings, Michael R; Holmes, Andrew J; Nevalainen, K M Helena; Stokes, H W; Mabbutt, Bridget C

2004-06-01

By targeting gene cassettes by polymerase chain reaction (PCR) directly from environmentally derived DNA, we are able to amplify entire open reading frames (ORFs) independently of prior sequence knowledge. Approximately 10% of the mobile genes recovered by these means can be attributed to known protein families. Here we describe the characterization of two ORFs which show moderate homology to known proteins: (1) an aminoglycoside phosphotransferase displaying 25% sequence identity with APH(7") from Streptomyces hygroscopicus, and (2) an RNA methyltransferase sharing 25%-28% identity with a group of recently defined bacterial RNA methyltransferases distinct from the SpoU enzyme family. Our novel genes were expressed as recombinant products and assayed for appropriate enzyme activity. The aminoglycoside phosphotransferase displayed ATPase activity, consistent with the presence of characteristic Mg(2+)-binding residues. Unlike related APH(4) or APH(7") enzymes, however, this activity was not enhanced by hygromycin B or kanamycin, suggesting the normal substrate to be a different aminoglycoside. The RNA methyltransferase contains sequence motifs of the RNA methyltransferase superfamily, and our recombinant version showed methyltransferase activity with RNA. Our data confirm that gene cassettes present in the environment encode folded enzymes with novel sequence variation and demonstrable catalytic activity. Our PCR approach (cassette PCR) may be used to identify a diverse range of ORFs from any environmental sample, as well as to directly access the gene pool found in mobile gene cassettes commonly associated with integrons. This gene pool can be accessed from both cultured and uncultured microbial samples as a source of new enzymes and proteins.
The Small Acid Soluble Proteins (SASP α and SASP β) of Bacillus weihenstephanensis and B. mycoides Group 2 Are the Most Distinct among the B. cereus Group

PubMed Central

Callahan, Courtney; Fox, Karen; Fox, Alvin

2009-01-01

The Bacillus cereus group includes Bacillus anthracis, Bacillus cereus, Bacillus thuringiensis, Bacillus mycoides and Bacillus weihenstephanensis. The small acid-soluble spore protein (SASP) β has previously been shown to be among the biomarkers differentiating B. anthracis and B. cereus; SASP β of B. cereus most commonly exhibits one or two amino acid substitutions compared with B. anthracis. SASP α is conserved in sequence between these two species. Neither SASP α nor SASP β of B. thuringiensis, B. mycoides and B. weihenstephanensis has previously been characterized as a taxonomic discriminator. In the current work, molecular weight (MW) variation of these SASPs was determined by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) for representative strains of the 5 species within the B. cereus group. The measured MWs also correlate with calculated MWs of translated amino acid sequences generated from whole genome sequencing projects. SASP α and β showed consistent MW among B. cereus, B. thuringiensis, and B. mycoides strains (group 1). However, B. mycoides (group 2) and B. weihenstephanensis SASP α and β were quite distinct, making them unique among the B. cereus group. Limited sequence changes were observed in SASP α (at most 3 substitutions and 2 deletions), indicating that it is a more conserved protein than SASP β (up to 6 substitutions and a deletion). Another, even more conserved SASP, the SASP α-β type, was described here for the first time. PMID:19616612
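The "calculated MWs of translated amino acid sequences" mentioned above can be approximated by summing average residue masses and adding one water molecule for the free termini. The sketch below uses commonly tabulated average residue masses. It ignores post-translational modifications and any N-terminal processing, and the function names are hypothetical. MALDI peaks for such small proteins are typically singly protonated [M+H]+ ions, about 1 Da heavier than the neutral mass computed here.

```python
AVERAGE_RESIDUE_MASS = {  # Da; average mass of each residue (amino acid minus H2O)
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def average_mw(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def substitution_shift(wild: str, mutant: str) -> float:
    """MW change (Da) caused by a single amino acid substitution, e.g. ('A', 'V')."""
    return AVERAGE_RESIDUE_MASS[mutant] - AVERAGE_RESIDUE_MASS[wild]
```

For instance, an Ala→Val substitution shifts the mass by about 28 Da, which is readily resolved by MALDI-TOF at SASP sizes. This is why one or two substitutions are enough to act as taxonomic discriminators. Note that Leu/Ile swaps are invisible to this approach.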
The evolution of subtype B HIV-1 tat in the Netherlands during 1985-2012.

PubMed

van der Kuyl, Antoinette C; Vink, Monique; Zorgdrager, Fokla; Bakker, Margreet; Wymant, Chris; Hall, Matthew; Gall, Astrid; Blanquart, François; Berkhout, Ben; Fraser, Christophe; Cornelissen, Marion

2018-05-02

For the production of viral genomic RNA, HIV-1 depends on an early viral protein, Tat, which is required for high-level transcription. The quantity of viral RNA detectable in the blood of HIV-1-infected individuals varies dramatically, and one factor involved could be the efficiency with which Tat protein variants stimulate RNA transcription. HIV-1 virulence, measured by set-point viral load, has been observed to increase over time in the Netherlands and elsewhere. Investigation of tat gene evolution in clinical isolates could reveal a role for Tat in this changing virulence. A dataset of 291 Dutch HIV-1 subtype B tat genes, derived from full-length HIV-1 genome sequences from samples obtained between 1985 and 2012, was used to analyse the evolution of Tat. Twenty-two patient-derived tat genes and the control Tat HXB2 were analysed for their capacity to stimulate expression of an LTR-luciferase reporter gene construct in diverse cell lines, as well as for their ability to complement a tat-defective HIV-1 LAI clone. Analysis of 291 historical tat sequences from the Netherlands showed ample amino acid (aa) variation between isolates, although no specific mutations were selected for over time. Of note, however, the length of the encoded protein varied over the years through the loss or gain of stop codons in the second exon. In transmission clusters, selection against the shorter Tat86 ORF in favour of the more common Tat101 version was apparent, likely due to negative selection against Tat86 itself, although random drift, transmission bottlenecks, or linkage to other variants could also explain the observation. There was no correlation between Tat length and set-point viral load; however, the number of non-intermediate variants in our study was small. In addition, variation in the length of Tat did not significantly change its capacity to stimulate transcription.
From 1985 to 2012, variation in the length of the HIV-1 subtype B tat gene was increasingly found in the Dutch epidemic. However, as Tat proteins did not differ significantly in their capacity to stimulate transcription elongation in vitro, the increased HIV-1 virulence seen in recent years could not be linked to an evolving viral Tat protein. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

PubMed

Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

2017-12-01

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies in the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides to prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli, together with the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
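To illustrate what a "modified six-frame translation considering alternative start codons" involves, here is a toy version, not the iPtgxDB implementation. It translates both strands in all three frames with the standard genetic code and reports each start-to-stop ORF. ATG, GTG and TTG are assumed as start codons, and the initiator is read as methionine. The minimum length and all names are hypothetical. For each stop codon only the longest ORF (earliest in-frame start) is kept; nested starts and ORFs that run off a contig end are ignored.

```python
from typing import List, Tuple

# Standard genetic code, indexed in TCAG order: index = 16*b1 + 4*b2 + b3.
_BASES = "TCAG"
_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: _AA[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}
# Assumption: ATG plus the common bacterial alternative starts GTG and TTG.
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna: str, min_aa: int = 30) -> List[Tuple[str, int, str]]:
    """(strand, frame, protein) for each start-to-stop ORF in all six frames."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            codons = [seq[p:p + 3] for p in range(frame, len(seq) - 2, 3)]
            start = None  # index of the earliest open start codon in this frame
            for idx, codon in enumerate(codons):
                aa = CODON_TABLE.get(codon, "X")  # "X" for codons containing N etc.
                if start is None and codon in START_CODONS:
                    start = idx
                elif aa == "*" and start is not None:
                    # The initiator codon is read as Met whatever its sequence.
                    body = "".join(CODON_TABLE.get(c, "X") for c in codons[start + 1:idx])
                    protein = "M" + body
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, protein))
                    start = None
    return orfs
```

A search database built this way deliberately overpredicts, and peptide evidence then decides which candidate ORFs and start sites are real.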
Novel Sequence-Based Mapping of Recently Emerging H5NX Influenza Viruses Reveals Pandemic Vaccine Candidates

PubMed Central

Anderson, Christopher S.; DeDiego, Marta L.; Thakar, Juilee; Topham, David J.

2016-01-01

Recently, an avian influenza virus, H5NX subclade 2.3.4.4, emerged and spread to North America. This subclade has frequently reassorted, leading to multiple novel viruses capable of human infection. Four cases of human infection, three of them fatal, have already occurred. Existing vaccine strains do not protect against these new viruses, raising a need to identify new vaccine candidate strains. We have developed a novel sequence-based mapping (SBM) tool capable of visualizing complex protein sequence data sets using a single intuitive map. We applied SBM to the complete set of avian H5 viruses in order to better understand hemagglutinin protein variance among H5 viruses and identify any patterns associated with this variation. The analysis successfully identified the original reassortments that led to the emergence of this new subclade of H5 viruses, as well as their known unusual ability to reassort among neuraminidase subtypes. In addition, our analysis revealed distinct clusters of 2.3.4.4 variants that would not be covered by existing strains in the H5 vaccine stockpile. The results suggest that our method may be useful for pandemic candidate vaccine virus selection. PMID:27494186
Phyloscan: locating transcription-regulating binding sites in mixed aligned and unaligned sequence data.

PubMed

Palumbo, Michael J; Newberg, Lee A

2010-07-01

The transcription of a gene from its DNA template into an mRNA molecule is the first, and most heavily regulated, step in gene expression. Especially in bacteria, regulation is typically achieved via the binding of a transcription factor (protein) or small RNA molecule to the chromosomal region upstream of a regulated gene. The protein or RNA molecule recognizes a short, approximately conserved sequence within a gene's promoter region and, by binding to it, either enhances or represses expression of the nearby gene. Since the sought-for motif (pattern) is short and accommodating to variation, computational approaches that scan for binding sites have trouble distinguishing functional sites from look-alikes. Many computational approaches are unable to find the majority of experimentally verified binding sites without also finding many false positives. Phyloscan overcomes this difficulty by exploiting two key features of functional binding sites: (i) these sites are typically more conserved evolutionarily than are non-functional DNA sequences; and (ii) these sites often occur two or more times in the promoter region of a regulated gene. The website is free and open to all users, and there is no login requirement. Address: http://bayesweb.wadsworth.org/phyloscan/
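The look-alike problem described above can be made concrete with a plain single-sequence scan using a log-odds position weight matrix (PWM). This is a generic illustration, not Phyloscan's algorithm: it uses neither cross-species conservation nor repeated site occurrence, which are exactly the two features Phyloscan adds. The training sites, pseudocount, uniform background and score threshold are all hypothetical, the input is assumed to be uppercase ACGT, and only the forward strand is scanned.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds position weight matrix from equal-length, aligned binding sites."""
    background = background or {b: 0.25 for b in "ACGT"}  # uniform background (assumption)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in "ACGT"})
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[k][seq[i + k]] for k in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


sites = ["TTGACA", "TTGACA", "TTGATA", "TTGACT"]  # hypothetical training sites
pwm = build_pwm(sites)
print(scan("GGGTTGACAGGG", pwm, threshold=5.0))  # [(3, 8.78)]  consensus match
print(scan("GGGTTTACAGGG", pwm, threshold=5.0))  # [(3, 5.61)]  one-mismatch look-alike
```

Both the consensus site and a one-mismatch variant clear the same threshold, which is why sequence score alone cannot separate functional sites from look-alikes; raising the threshold removes the look-alike but would also discard genuine degenerate sites.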
Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion.

PubMed

Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An

2017-09-11

The chloroplast genome (CPG) of Pinus massoniana (genus Pinus, Pinaceae), a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh gene loss, and the contraction and expansion of short inverted repeats (IRs). The P. massoniana CPG has a typical quadripartite structure comprising a large single-copy (LSC) region (65,563 bp), a small single-copy (SSC) region (53,230 bp) and two IRs (IRa and IRb, 485 bp). A total of 108 unique genes were identified, including 73 protein-coding genes, 31 tRNA genes and 4 rRNA genes. Most of the 81 simple sequence repeats (SSRs) identified in the CPG were mononucleotide A/T motifs located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region. The P. massoniana CPG lacks intact copies of all 11 ndh genes: four ndh genes were lost completely, five remain as truncated pseudogenes, and the other two are pseudogenes because of short insertions or deletions. A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, resulting from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". Phylogenetic analyses based on the whole CPG sequences of 16 conifers indicated that whole CPG sequences could be a powerful tool for phylogenetic analysis.
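For the mononucleotide class that dominates the SSRs reported here, repeat detection reduces to finding runs of a single base. The sketch below does this with a regular expression backreference. The minimum run length of 10 is an assumed threshold, not necessarily the one used in the study, and dinucleotide or longer motifs are not searched.

```python
import re


def mononucleotide_ssrs(seq: str, min_len: int = 10) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for each run of a single base at least min_len long."""
    # ([ACGT]) captures one base; \1{n,} requires that same base to repeat n more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]


def at_share(ssrs: list[tuple[int, str, int]]) -> float:
    """Fraction of detected mononucleotide SSRs whose repeated base is A or T."""
    return sum(base in "AT" for _, base, _ in ssrs) / len(ssrs) if ssrs else 0.0


toy = "GC" + "A" * 12 + "GC" + "T" * 9 + "G" + "T" * 10  # hypothetical sequence
hits = mononucleotide_ssrs(toy)  # the 9-base T run falls below the threshold
```

Here `hits` is `[(2, "A", 12), (26, "T", 10)]` and `at_share(hits)` is `1.0`. Because the regex is greedy and matches do not overlap, each run is reported once at its full length.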
The attachment of α-synuclein to a fiber: A coarse-grain approach

NASA Astrophysics Data System (ADS)

Ilie, Ioana M.; den Otter, Wouter K.; Briels, Wim J.

2017-03-01

We present simulations of the amyloidogenic core of α-synuclein, the protein causing Parkinson's disease, as a short chain of coarse-grain patchy particles. Each particle represents a sequence of about a dozen amino acids. The fluctuating secondary structure of this intrinsically disordered protein is modelled by dynamic variations of the shape and interaction characteristics of the patchy particles, ranging from spherical with weak isotropic attractions for the disordered state to spherocylindrical with strong directional interactions for a β-sheet. Flexible linkers between the particles enable sampling of the tertiary structure. This novel model is applied here to study the growth of an amyloid fibril, by calculating the free energy profile of a protein attaching to the end of a fibril. The simulation results suggest that the attaching protein readily becomes trapped in a misfolded state, thereby inhibiting further growth of the fibril until the protein has readjusted to conform to the fibril structure, in line with experimental findings and previous simulations on small fragments of other proteins.
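A free energy profile along a reaction coordinate x (for example, the distance between the attaching protein and the fibril end) is commonly related to the equilibrium probability density by F(x) = -kT ln P(x) + const. The sketch below applies this Boltzmann inversion to a histogram of sampled values. It is a generic illustration assuming unbiased equilibrium sampling; the abstract does not say how the study computed its profile, and the actual method may well differ (for example, biased sampling or force integration). For a radial distance r in three dimensions, a Jacobian term of 2kT ln r would also have to be added; it is omitted here.

```python
import math
import random


def free_energy_profile(samples: list[float], bins: int = 50, kT: float = 1.0):
    """Boltzmann inversion F(x) = -kT ln P(x), shifted so the lowest value is 0.

    Bins that were never visited get F = +inf. Assumes unbiased equilibrium sampling.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1  # x == hi goes in the last bin
    total = len(samples)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    energies = [-kT * math.log(c / total) if c else math.inf for c in counts]
    f_min = min(energies)
    return centers, [e - f_min for e in energies]


# Synthetic check: a harmonic well sampled as a unit Gaussian, for which
# F(x) should approach x**2 / 2 in units of kT.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
centers, profile = free_energy_profile(samples, bins=40)
```

The additive constant (including the bin width) cancels when the profile is shifted to its minimum, so only free energy differences are meaningful.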
Protein design on computers. Five new proteins: Shpilka, Grendel, Fingerclasp, Leather, and Aida.

PubMed

Sander, C; Vriend, G; Bazan, F; Horovitz, A; Nakamura, H; Ribas, L; Finkelstein, A V; Lockhart, A; Merkl, R; Perry, L J

1992-02-01

What is the current state of the art in protein design? This question was approached in a recent two-week protein design workshop sponsored by EMBO and held at the EMBL in Heidelberg. The goals were to test available design tools and to explore new design strategies. Five novel proteins were designed: Shpilka, a sandwich of two four-stranded beta-sheets, a scaffold on which to explore variations in loop topology; Grendel, a four-helical membrane anchor, ready for fusion to water-soluble functional domains; Finger-clasp, a dimer of interdigitating beta-beta-alpha units, the simplest variant of the "handshake" structural class; Aida, an antibody binding surface intended to be specific for flavodoxin; and Leather, a minimal NAD-binding domain extracted from a larger protein. Each design is available as a set of three-dimensional coordinates, the corresponding amino acid sequence and a set of analytical results. The designs are placed in the public domain for scrutiny, improvement, and possible experimental verification.
Genetic stability of a dengue vaccine based on chimeric yellow fever/dengue viruses.

PubMed

Mantel, N; Girerd, Y; Geny, C; Bernard, I; Pontvianne, J; Lang, J; Barban, V

2011-09-02

A tetravalent dengue vaccine based on four live, attenuated, chimeric viruses (CYD1-4), constructed by replacing the genes coding for the premembrane (prM) and envelope (E) proteins of the yellow fever (YF)-17D vaccine strain with those of the four serotypes of dengue virus, is in phase III clinical evaluation. We assessed the vaccine's genetic stability by fully sequencing each vaccine virus throughout the development and manufacturing process. The four viruses displayed complete genetic stability, with no change from premaster seed lots to bulk lots. When virus growth was continued beyond bulk lots, a few genetic variations were observed. Usually both the original and the new nucleotide persisted, and mutations appeared after a relatively high number of virus duplication cycles (65-200, depending on position). Variations were concentrated in the prM-E and non-structural (NS)4B regions. PrM-E variations had no impact on lysis-plaque size or neurovirulence in mice. None of the variations located in the YF-17D-derived genes corresponded to reversion to the wild-type yellow fever sequence. Variations in NS4B likely reflect virus adaptation to growth in Vero cells. Low to undetectable viremia has been reported previously [1-3] in vaccinated non-human and human primates. Combined with the data reported here on the genetic stability of the vaccine strains, the probability of in vivo emergence of mutant viruses appears very low. Copyright © 2011 Elsevier Ltd. All rights reserved.
Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans

PubMed Central

Romeo, Stefano; Yin, Wu; Kozlitina, Julia; Pennacchio, Len A.; Boerwinkle, Eric; Hobbs, Helen H.; Cohen, Jonathan C.

2008-01-01

The relative activity of lipoprotein lipase (LPL) in different tissues controls the partitioning of lipoprotein-derived fatty acids between sites of fat storage (adipose tissue) and oxidation (heart and skeletal muscle). Here we used a reverse genetic strategy to test the hypothesis that four angiopoietin-like proteins (ANGPTL3, -4, -5, and -6) play key roles in triglyceride (TG) metabolism in humans. We re-sequenced the coding regions of the genes encoding these proteins and identified multiple rare nonsynonymous (NS) sequence variations that were associated with low plasma TG levels but not with other metabolic phenotypes. Functional studies revealed that all mutant alleles of ANGPTL3 and ANGPTL4 that were associated with low plasma TG levels interfered either with the synthesis or secretion of the protein or with the ability of the ANGPTL protein to inhibit LPL. Overall, 1% of the Dallas Heart Study population, and 4% of participants with plasma TG in the lowest quartile, carried a rare loss-of-function mutation in ANGPTL3, ANGPTL4, or ANGPTL5. Thus, ANGPTL3, ANGPTL4, and ANGPTL5, but not ANGPTL6, play nonredundant roles in TG metabolism, and multiple alleles at these loci cumulatively contribute to variability in plasma TG levels in humans. PMID:19075393
MPIC: a mitochondrial protein import components database for plant and non-plant species.

PubMed

Murcha, Monika W; Narsai, Reena; Devenish, James; Kubiszewski-Jakubiak, Szymon; Whelan, James

2015-01-01

In the 2 billion years since the endosymbiotic event that gave rise to mitochondria, variations in mitochondrial protein import have evolved across different species. With the genomes of an increasing number of plant species sequenced, it is possible to gain novel insights into mitochondrial protein import pathways. We have generated the Mitochondrial Protein Import Components (MPIC) Database (DB; http://www.plantenergy.uwa.edu.au/applications/mpic) providing searchable information on the protein import apparatus of plant and non-plant mitochondria. An in silico analysis was carried out, comparing the mitochondrial protein import apparatus from 24 species representing various lineages, from Saccharomyces cerevisiae (yeast) and algae to Homo sapiens (human) and higher plants, including Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice) and other more recently sequenced plant species. Each of these species was extensively searched, and its import components were manually assembled for analysis in the MPIC DB. The database presents an interactive diagram that allows users to select their import component of interest. The MPIC DB is an extensive resource that facilitates detailed investigation of the mitochondrial protein import machinery and allows patterns of conservation and divergence to be recognized that would otherwise have been missed. To demonstrate the usefulness of the MPIC DB, we present a comparative analysis of the mitochondrial protein import machinery in plants and non-plant species, revealing plant-specific features that have evolved. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Temperature inducible β-sheet structure in the transactivation domains of retroviral regulatory proteins of the Rev family

NASA Astrophysics Data System (ADS)

Thumb, Werner; Graf, Christine; Parslow, Tristram; Schneider, Rainer; Auer, Manfred

1999-11-01

The interaction of the human immunodeficiency virus type 1 (HIV-1) regulatory protein Rev with cellular cofactors is crucial for the viral life cycle. The HIV-1 Rev transactivation domain is functionally interchangeable with analogous regions of Rev proteins of other retroviruses, suggesting common folding patterns. To obtain experimental evidence for similar structural features mediating protein-protein contacts, we investigated activation domain peptides from HIV-1, HIV-2, VISNA virus, feline immunodeficiency virus (FIV) and equine infectious anemia virus (EIAV) by CD spectroscopy, secondary structure prediction and sequence analysis. Although they differ in polarity and hydrophobicity, all peptides behaved similarly with respect to solution conformation, concentration dependence, and variations in ionic strength and pH. Temperature studies revealed an unusual induction of β-structure with rising temperature in all activation domain peptides. The high stability of β-structure in this region was demonstrated for three different peptides of the HIV-1 Rev activation domain in solutions containing 40% hexafluoropropanol, a reagent usually known to induce α-helical structure in amino acid sequences. Sequence alignments revealed similarities between the polar effector domains of FIV and EIAV and the leucine-rich (hydrophobic) effector domains of HIV-1, HIV-2 and VISNA. Studies on activation domain peptides of two dominant-negative HIV-1 Rev mutants, M10 and M32, pointed to different causes of their biological behavior. Whereas the peptide containing the M10 mutation (L78E79→D78L79) showed wild-type structure, the M32 mutant peptide (L78L81L83→A78A81A83) revealed a different protein fold as the reason for its disturbed binding to cellular cofactors.
From our data, we conclude that the activation domains of Rev proteins from different viral origins adopt a similar fold and that a β-structural element is involved in binding to a cellular cofactor.
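The mutation notation used in this abstract (e.g. L78L81L83→A78A81A83 for M32) and the comparison of peptides by hydrophobicity can be illustrated together. The sketch applies numbered substitutions to a peptide and computes its mean Kyte-Doolittle hydropathy (GRAVY). The peptide `LEGLGL`, numbered from residue 78, is a hypothetical toy sequence and not the Rev activation domain; GRAVY is only one of several hydrophobicity measures, and the study itself relied on CD spectroscopy, secondary structure prediction and sequence alignment.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_substitutions(peptide: str, subs: list[tuple[str, int, str]],
                        first_residue: int = 1) -> str:
    """Apply substitutions like ("L", 78, "A"); positions are numbered from first_residue."""
    residues = list(peptide)
    for wild_type, position, mutant in subs:
        i = position - first_residue
        # Check that the stated wild-type residue is really at that position.
        if not 0 <= i < len(residues) or residues[i] != wild_type:
            raise ValueError(f"{wild_type}{position} does not match the peptide")
        residues[i] = mutant
    return "".join(residues)


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


wild_type = "LEGLGL"  # hypothetical toy peptide, residues 78-83
m32_like = apply_substitutions(wild_type, [("L", 78, "A"), ("L", 81, "A"), ("L", 83, "A")],
                               first_residue=78)
m10_like = apply_substitutions(wild_type, [("L", 78, "D"), ("E", 79, "L")], first_residue=78)
```

In this toy, the M32-style substitutions give `AEGAGA` and lower the mean hydropathy from about 1.18 to about 0.18, whereas the M10-style swap gives `DLGLGL` and leaves it unchanged, because D and E share the same Kyte-Doolittle value. A composition-based score like this cannot capture the structural differences the study reports; it only shows how the notation maps onto a sequence.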
High-Throughput Sequencing and Copy Number Variation Detection Using Formalin Fixed Embedded Tissue in Metastatic Gastric Cancer

PubMed Central

Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

2014-01-01

In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287
The Lyssavirus glycoprotein: A key to cross-immunity.

PubMed

Buthelezi, Sindisiwe G; Dirr, Heini W; Chakauya, Ereck; Chikwamba, Rachel; Martens, Lennart; Tsekoa, Tsepo L; Stoychev, Stoyan H; Vandermarliere, Elien

2016-11-01

Rabies is an acute viral encephalomyelitis of warm-blooded vertebrates caused by viruses of the genus Lyssavirus in the family Rhabdoviridae. Although rabies is categorised as a neglected disease, rabies virus (RABV) is the most studied of the lyssaviruses, which show nearly identical infection patterns. In an effort to improve post-exposure prophylaxis, several anti-rabies monoclonal antibodies (mAbs) targeting glycoprotein (G protein) antigenic sites I, II, III and G5 have been characterized. To explore the cross-neutralization capacity of available mAbs and discover possible new B-cell epitopes, we analyzed all available lyssavirus glycoprotein sequences with a focus on sequence variation and conservation. This information was mapped onto the structure of a representative G protein. We propose several possible cross-neutralizing B-cell epitopes (GUVTTTF, WLRTV, REECLD and EHLVVEEL) that complement the already well-characterized antigenic sites. This research could facilitate the development of novel cross-reactive mAbs against RABV and, more broadly, against possibly all Lyssavirus members. Copyright © 2016 Elsevier Inc. All rights reserved.
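Sequence conservation across an alignment, of the kind mapped onto the G protein structure in this study, is often quantified per column with Shannon entropy: 0 bits for a fully conserved position and higher values for variable ones. The sketch below is a generic version of that measure, not necessarily the one used by the authors; the toy alignment and the 0.5-bit cutoff for "conserved" are hypothetical, and gap characters are simply skipped.

```python
import math
from collections import Counter


def column_entropy(alignment: list[str]) -> list[float]:
    """Shannon entropy (bits) of each column in an equal-length alignment; gaps are skipped."""
    entropies = []
    for pos in range(len(alignment[0])):
        residues = [seq[pos] for seq in alignment if seq[pos] != "-"]
        n = len(residues)
        if n == 0:
            entropies.append(float("nan"))  # all-gap column: entropy undefined
            continue
        counts = Counter(residues)
        # H = sum p * log2(1/p); a fully conserved column gives exactly 0.0.
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


toy = ["ACDEF", "ACDEF", "ACKEF", "ACDSF"]  # hypothetical alignment
scores = column_entropy(toy)
conserved = [i for i, h in enumerate(scores) if h < 0.5]  # assumed cutoff
```

Here `conserved` is `[0, 1, 4]`, while columns 2 and 3 (three identical residues and one substitution) score about 0.81 bits. Plain entropy ignores residue similarity and sequence redundancy; weighted or substitution-aware scores are common refinements.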
Evolution of sfbI Encoding Streptococcal Fibronectin-Binding Protein I: Horizontal Genetic Transfer and Gene Mosaic Structure

PubMed Central

Towers, Rebecca J.; Fagan, Peter K.; Talay, Susanne R.; Currie, Bart J.; Sriprakash, Kadaba S.; Walker, Mark J.; Chhatwal, Gursharan S.

2003-01-01

Streptococcal fibronectin-binding protein is an important virulence factor involved in colonization and invasion of epithelial cells and tissues by Streptococcus pyogenes. In order to investigate the mechanisms involved in the evolution of sfbI, the sfbI genes from 54 strains were sequenced. Thirty-four distinct alleles were identified. Three principal mechanisms appear to have been involved in the evolution of sfbI. The amino-terminal aromatic amino acid-rich domain is the most variable region and is apparently generated by intergenic recombination of horizontally acquired DNA cassettes, resulting in a genetic mosaic in this region. Two distinct and divergent sequence types that shared only 61 to 70% identity were identified in the central proline-rich region, while variation at the 3′ end of the gene is due to deletion or duplication of defined repeat units. Potential antigenic and functional variability in SfbI implies significant selective pressure in vivo, with direct implications for the microbial pathogenesis of S. pyogenes. PMID:14662917
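The 61 to 70% figure is a pairwise identity between aligned sequences. Percent identity has several conventions that differ mainly in the denominator; the sketch below assumes two already-aligned sequences of equal length and divides the number of identical residues by the number of columns in which at least one sequence has a residue. The input sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences ('-' = gap).

    Denominator (an assumed convention): columns where at least one sequence has a residue.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


example = percent_identity("PAPEPK", "PAPQPK")  # hypothetical aligned fragments
```

Here `example` is about 83.3 (5 identical residues over 6 columns). Dividing instead by the length of the shorter sequence, or by ungapped columns only, can give noticeably different values for gapped alignments, so the convention should be stated alongside any reported identity.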
Phylogenetic analysis of VP2 gene of canine parvovirus and comparison with Indian and world isolates.

PubMed

Kaur, G; Chandra, M; Dwivedi, P N

2016-03-01

Canine parvovirus (CPV) causes hemorrhagic enteritis, especially in young dogs, leading to high morbidity and mortality. It has four main antigenic types: CPV-2, CPV-2a, CPV-2b and CPV-2c. Virus protein 2 (VP2) is the main capsid protein, and mutations affecting the VP2 gene are responsible for the evolution of the various antigenic types of CPV. The full-length VP2 gene from field isolates was amplified and cloned for sequence analysis. The sequences were submitted to GenBank and assigned accession numbers KP406928.1 (P12), KP406927.1 (P15), KP406930.1 (P32), KP406926.1 (Megavac-6) and KP406929.1 (NobivacDHPPi). Phylogenetic analysis indicated that the samples formed a separate clade with the vaccine strains. When the samples were compared with world and Indian isolates, they formed a separate node, indicating regional genetic variation in CPV.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.