sequence motifs shared: Topics by Science.gov

Sample records for sequence motifs shared

Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

PubMed Central

Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

1995-01-01

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
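The sampling idea in this abstract can be illustrated with a minimal sketch of a basic one-site-per-sequence Gibbs sampler. This is not the authors' algorithm (which also partitions sites into several motif models and applies a significance test); it assumes exactly one motif occurrence per sequence, a uniform background, and a fixed motif width, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter

def gibbs_site_sampler(seqs, width, iters=200, alphabet="ACGT", pseudo=1.0, seed=0):
    """Toy Gibbs site sampler: returns one motif start position per sequence.

    Assumptions (not from the abstract): one site per sequence, uniform
    background, fixed width, every sequence at least `width` long.
    """
    rng = random.Random(seed)
    # Start from a random motif position in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Build per-column residue counts from all sequences except i.
            cols = [Counter() for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        cols[k][other[pos[j] + k]] += 1
            total = len(seqs) - 1 + pseudo * len(alphabet)
            # Weight every candidate window in sequence i under the current model.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= (cols[k][seq[start + k]] + pseudo) / total
                weights.append(w)
            # Resample the site in sequence i in proportion to its weight.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Synthetic DNA sequences, each containing the planted word TGACGTCA.
seqs = ["TTGACGTCATT", "GGTGACGTCAG", "CATGACGTCAA"]
print(gibbs_site_sampler(seqs, width=8))
```

Because the procedure is stochastic, the returned positions depend on the seed and the number of iterations; in practice several restarts are usually compared.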
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
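A minimal sketch of ordered-dipeptide counting is given here to make the idea concrete. It is not AAPAIR.TAB; in particular, the "bias" below is assumed to be an observed/expected ratio computed from single-residue frequencies, which the abstract does not specify, and the function names are hypothetical.

```python
from collections import Counter

def dipeptide_counts(sequences):
    """Count ordered dipeptides (adjacent residue pairs) across sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a + b] += 1
    return counts

def dipeptide_bias(sequences):
    """Observed/expected ratio per dipeptide (assumed definition of bias).

    Expected frequency is the product of the two single-residue frequencies.
    """
    pair = dipeptide_counts(sequences)
    single = Counter("".join(sequences))
    n_pair = sum(pair.values())
    n_single = sum(single.values())
    return {
        dp: (c / n_pair) / ((single[dp[0]] / n_single) * (single[dp[1]] / n_single))
        for dp, c in pair.items()
    }

# Synthetic toy peptides, not real EGF/Sushi/Laminin sequences.
toy = ["GYGF", "NGY"]
print(dipeptide_counts(toy).most_common(2))
print(dipeptide_bias(toy)["GY"])
```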
Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

PubMed

Busk, Peter Kamp; Lange, Lene

2013-06-01

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns.

PubMed

Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie

2011-09-12

Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

PubMed Central

2011-01-01

Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

PubMed

Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

2011-06-20

One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

PubMed Central

2011-01-01

Background: One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388
Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

PubMed

Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

2013-02-01

Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
CompariMotif: quick and easy comparisons of sequence motifs.

PubMed

Edwards, Richard J; Davey, Norman E; Shields, Denis C

2008-05-15

CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

PubMed Central

2012-01-01

Background: Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. Results: We have developed a Space-Related Pharmamotifs method (called SRPmotif) to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions: SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery. PMID:23281852
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

PubMed

Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

2012-01-01

Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs method (called SRPmotif) to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and for drug discovery.
A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

PubMed

Karaboga, D; Aslan, S

2016-04-27

The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
Alignment-free sequence comparison (II): theoretical power of comparison statistics.

PubMed

Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S

2010-11-01

Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy to use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2* approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer, the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration can be recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version), verifying that D2* is generally more powerful than D2. The program to calculate the power of D2, D2* and D2S can be downloaded from http://meta.cmb.usc.edu/d2. Supplementary Material is available at www.liebertonline.com/cmb.
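As a small illustration of the plain D2 statistic described above (the number of matching k-tuples between two sequences), the sketch below sums the products of k-mer counts over all words. It does not implement D2* or D2S, which additionally need a background model, and it is unrelated to the authors' downloadable program; the function names are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count all overlapping k-tuples (k-mers) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """D2: number of matching k-tuple pairs between two sequences.

    Equivalent to summing count_a(w) * count_b(w) over every word w.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())

# Synthetic examples.
print(d2("ACGT", "ACGA", 2))  # shared 2-mers: AC and CG
print(d2("AAAA", "AAA", 2))   # AA occurs 3 times and 2 times
```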
Discovery of T Cell Receptor β Motifs Specific to HLA-B27-Positive Ankylosing Spondylitis by Deep Repertoire Sequence Analysis.

PubMed

Faham, Malek; Carlton, Victoria; Moorhead, Martin; Zheng, Jianbiao; Klinger, Mark; Pepin, Francois; Asbury, Thomas; Vignali, Marissa; Emerson, Ryan O; Robins, Harlan S; Ireland, James; Baechler-Gillespie, Emily; Inman, Robert D

2017-04-01

Ankylosing spondylitis (AS), a chronic inflammatory disorder, has a notable association with HLA-B27. One hypothesis suggests that a common antigen that binds to HLA-B27 is important for AS disease pathogenesis. This study was undertaken to determine sequences and motifs that are shared among HLA-B27-positive AS patients, using T cell repertoire next-generation sequencing. To identify motifs enriched among B27-positive AS patients, we performed T cell receptor β (TCRβ) repertoire sequencing on samples from 191 B27-positive AS patients, 43 B27-negative AS patients, and 227 controls, and we obtained >77 million TCRβ clonotype sequences. First, we assessed whether any of 50 previously published sequences were enriched in B27-positive AS patients. We then used training and test cohorts to identify discovered motifs that were enriched in B27-positive AS patients versus controls. Six previously published and 11 discovered motifs were enriched in the B27-positive AS samples as compared to controls. After combining motifs related by sequence, we identified a total of 15 independent motifs. Both the full set of 15 motifs and a set of 6 published motifs were enriched in the B27-positive AS patients as compared to B27-positive healthy individuals (P = 0.049 and P = 0.001, respectively). Using an independent cohort, we validated that at least some of these motifs were associated with AS, and not simply with B27-positive status. We identified TCRβ motifs that are enriched in B27-positive AS patients as compared to B27-positive healthy controls. This suggests that a common antigen, presented by HLA-B27 and detected by CD8+ T cells, may be associated with AS disease pathogenesis. © 2016, American College of Rheumatology.
Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds.

PubMed

Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L

2017-09-27

Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggests the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.
A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi

PubMed Central

2013-01-01

Background: Fungal pathogens cause devastating losses in economically important cereal crops by utilising pathogen proteins to infect host plants. Secreted pathogen proteins are referred to as effectors and have thus far been identified by selecting small, cysteine-rich peptides from the secretome despite increasing evidence that not all effectors share these attributes. Results: We take advantage of the availability of sequenced fungal genomes and present an unbiased method for finding putative pathogen proteins and secreted effectors in a query genome via comparative hidden Markov model analyses followed by unsupervised protein clustering. Our method returns experimentally validated fungal effectors in Stagonospora nodorum and Fusarium oxysporum as well as the N-terminal Y/F/WxC-motif from the barley powdery mildew pathogen. Application to the cereal pathogen Fusarium graminearum reveals a secreted phosphorylcholine phosphatase that is characteristic of hemibiotrophic and necrotrophic cereal pathogens and shares an ancient selection process with bacterial plant pathogens. Three F. graminearum protein clusters are found with an enriched secretion signal. One of these putative effector clusters contains proteins that share a [SG]-P-C-[KR]-P sequence motif in the N-terminal and show features not commonly associated with fungal effectors. This motif is conserved in secreted pathogenic Fusarium proteins and a prime candidate for functional testing. Conclusions: Our pipeline has successfully uncovered conservation patterns, putative effectors and motifs of fungal pathogens that would have been overlooked by existing approaches that identify effectors as small, secreted, cysteine-rich peptides. It can be applied to any pathogenic proteome data, such as microbial pathogen data of plants and other organisms. PMID:24252298
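A motif written in this bracketed, hyphen-separated style, such as [SG]-P-C-[KR]-P, can be scanned for with an ordinary regular expression. The sketch below is a hypothetical helper, not part of the authors' pipeline; it assumes a simplified PROSITE-like syntax (single residues, bracketed classes, `x` wildcards, optional `(n)` or `(n,m)` repeats), and the test sequence is synthetic.

```python
import re

# One pattern element: x, a residue, or a [class], with an optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        core = "." if core == "x" else core  # x means any residue
        if lo is None:
            repeat = ""
        elif hi is None:
            repeat = "{%s}" % lo
        else:
            repeat = "{%s,%s}" % (lo, hi)
        parts.append(core + repeat)
    return "".join(parts)

regex = prosite_to_regex("[SG]-P-C-[KR]-P")
print(regex)
# Synthetic sequence with the motif planted, not a real Fusarium protein.
print(re.search(regex, "MKLSPCKPTT"))
```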
Characterization of a chimeric foot-and-mouth disease virus bearing bovine rhinitis B virus leader proteinase

USDA-ARS's Scientific Manuscript database

Our recent study has shown that bovine rhinovirus type 2 (BRV2), a new member of the Aphthovirus genus, shares many motifs and sequence similarities with foot-and-mouth disease virus (FMDV). Despite low sequence conservation (36% amino acid identity) and N- and C-terminus folding differences,...
Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

PubMed Central

Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

2016-01-01

Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
A structural-alphabet-based strategy for finding structural motifs across protein families

PubMed Central

Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

2010-01-01

Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
First complete genome sequence of vanilla mosaic strain of Dasheen mosaic virus isolated from the Cook Islands.

PubMed

Puli'uvea, Christopher; Khan, Subuhi; Chang, Wee-Leong; Valmonte, Gardette; Pearson, Michael N; Higgins, Colleen M

2017-02-01

We present the first complete genome of vanilla mosaic virus (VanMV). The VanMV genomic structure is consistent with that of a potyvirus, containing a single open reading frame (ORF) encoding a polyprotein of 3139 amino acids. Motif analyses indicate the polyprotein can be cleaved into the expected ten individual proteins; other recognised potyvirus motifs are also present. As expected, the VanMV genome shows high sequence similarity to the published Dasheen mosaic virus (DsMV) genome sequences; comparisons with DsMV continue to support VanMV as a vanilla infecting strain of DsMV. Phylogenetic analyses indicate that VanMV and DsMV share a common ancestor, with VanMV having the closest relationship with DsMV strains from the South Pacific.

The combinatorial PP1-binding consensus Motif (R/K)x(0,1)V/IxFxx(R/K)x(R/K) is a new apoptotic signature.

PubMed

Godet, Angélique N; Guergnon, Julien; Maire, Virginie; Croset, Amélie; Garcia, Alphonse

2010-04-01

Previous studies established that PP1 is a target for Bcl-2 proteins and an important regulator of apoptosis. The two distinct functional PP1 consensus docking motifs, R/Kx(0,1)V/IxF and FxxR/KxR/K, involved in PP1 binding and cell death were previously characterized in the BH1 and BH3 domains of some Bcl-2 proteins. In this study, we demonstrate that DPT-AIF(1), a peptide containing the AIF(562-571) sequence located in a C-terminal domain of AIF, is a new PP1-interacting and cell-penetrating molecule. We also showed that DPT-AIF(1) provoked apoptosis in several human cell lines. Furthermore, DPT-APAF(1), a bipartite cell-penetrating peptide containing APAF-1(122-131), a non-penetrating sequence from the APAF-1 protein, linked to our previously described DPT-sh1 peptide shuttle, is also a PP1-interacting death molecule. Both AIF(562-571) and APAF-1(122-131) sequences contain a common R/Kx(0,1)V/IxFxxR/KxR/K motif, shared by several proteins involved in control of cell survival pathways. This motif combines the two distinct PP1c consensus docking motifs initially identified in some Bcl-2 proteins. Interestingly, DPT-AIF(2) and DPT-APAF(2), which carry an F to A mutation within this combinatorial motif, no longer exhibited any PP1c binding or apoptotic effects. Moreover, the F to A mutation in DPT-AIF(2) also suppressed cell penetration. These results indicate that the combinatorial PP1c docking motif R/Kx(0,1)V/IxFxxR/KxR/K, deduced from AIF(562-571) and APAF-1(122-131) sequences, is a new PP1c-dependent Apoptotic Signature. This motif is also a new tool for drug design that could be used to characterize potential anti-tumour molecules.
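The combinatorial motif in this abstract translates fairly directly into a regular expression, which may help when screening other sequences for it. The sketch below is illustrative only: it reads "x" as any residue and the x(0,1) element as zero or one arbitrary residue, and the two test peptides are hypothetical strings, not the AIF(562-571) or APAF-1(122-131) sequences from the study.

```python
import re

# R/K, optional x, V/I, x, F, x, x, R/K, x, R/K
# "." stands for any residue; ".?" encodes the x(0,1) element.
PP1_COMBINATORIAL = re.compile(r"[RK].?[VI].F..[RK].[RK]")

def find_pp1_motifs(sequence):
    """Return (start, matched_text) for each non-overlapping motif match."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in PP1_COMBINATORIAL.finditer(seq)]

# Hypothetical peptides for illustration (not taken from the paper).
wild_type = "GGRAVAFAAKARGG"
f_to_a = "GGRAVAAAAKARGG"   # same peptide with the central F replaced by A

print(find_pp1_motifs(wild_type))  # [(2, 'RAVAFAAKAR')]
print(find_pp1_motifs(f_to_a))     # []
```

The F to A substitution removes the match, which parallels the loss of the motif in the mutated peptides described above; a regex match says nothing about actual PP1 binding.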
Detection and Preliminary Analysis of Motifs in Promoters of Anaerobically Induced Genes of Different Plant Species

PubMed Central

MOHANTY, BIJAYALAXMI; KRISHNAN, S. P. T.; SWARUP, SANJAY; BAJIC, VLADIMIR B.

2005-01-01

• Background and Aims Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Krebs cycle respiration to fermentation in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions. • Methods Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic genes. The selected motifs were common for the majority of analysed promoters. Their significance was evaluated by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the anaerobic group. • Key Results Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5′-AAACAAA-3′, 5′-AGCAGC-3′, 5′-TCATCAC-3′, 5′-GTTT(A/C/T)GCAA-3′ and 5′-TTCCCTGTT-3′. • Conclusions It is believed that the promoter motifs identified could be functional by conferring anaerobic sensitivity to the genes that possess them. This proposal now requires experimental verification. PMID:16027132
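As a rough illustration of how the five consensus sequences listed above could be located in a promoter, the sketch below converts each consensus into a regular expression and reports hit positions. It is not the in silico detection method used in the study; the function names and the promoter fragment are hypothetical, and only the given strand is scanned.

```python
import re

# Consensus sequences of the five novel motifs, as listed in the abstract.
NOVEL_MOTIFS = ["AAACAAA", "AGCAGC", "TCATCAC", "GTTT(A/C/T)GCAA", "TTCCCTGTT"]

def consensus_to_regex(consensus):
    """Turn alternatives such as '(A/C/T)' into a character class '[ACT]'."""
    pattern = re.sub(r"\(([ACGT/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus)
    # A lookahead lets overlapping occurrences be reported as separate hits.
    return re.compile("(?=(" + pattern + "))")

def scan_promoter(promoter):
    """Map each consensus to the 0-based start positions of its hits."""
    seq = promoter.upper()
    return {c: [m.start() for m in consensus_to_regex(c).finditer(seq)]
            for c in NOVEL_MOTIFS}

# Hypothetical promoter fragment, for illustration only.
hits = scan_promoter("CCAAACAAACAAAGGTTTCGCAATT")
print(hits["AAACAAA"])          # [2, 6]
print(hits["GTTT(A/C/T)GCAA"])  # [14]
```

Scanning the reverse complement as well would be a natural extension, since regulatory motifs can occur on either strand.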
Identification and analysis of pig chimeric mRNAs using RNA sequencing data

PubMed Central

2012-01-01

Background Gene fusion is ubiquitous over the course of evolution. It is expected to increase the diversity and complexity of transcriptomes and proteomes through chimeric sequence segments or altered regulation. However, chimeric mRNAs in pigs remain poorly characterized. Here we identified chimeric mRNAs in pigs and analyzed their expression across individuals and breeds using RNA-sequencing data. Results The present study identified 669 putative chimeric mRNAs in pigs, of which 251 chimeric candidates were detected in a set of RNA-sequencing data. A total of 618 candidates had clear trans-splicing sites, 537 of which obeyed the canonical GU-AG splice rule. Only two putative pig chimera variants whose fusion junction overlapped with that of a known human chimeric mRNA were found. A set of unique chimeric events was considered to have moderate variance in expression across individuals and breeds, and showed non-significant variance between sexes. Furthermore, the genomic region of the 5′ partner gene shares a similar DNA sequence with that of the 3′ partner gene for 458 putative chimeric mRNAs. Of those shared DNA sequences, 81 significantly matched known DNA-binding motifs in the JASPAR CORE database. Four DNA motifs shared in parental genomic regions had significant similarity with known human CTCF binding sites. Conclusions The present study provided detailed information on some pig chimeric mRNAs. We proposed a model in which trans-acting factors, such as CTCF, induce the spatial organisation of parental genes into the same transcription factory so that parental genes are coordinately transcribed to give rise to chimeric mRNAs. PMID:22925561
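The canonical GU-AG rule mentioned above can be checked with a few lines of code. This is a minimal sketch under the usual reading of the rule (the intron side of the donor site begins with GU, and the intron side of the acceptor site ends with AG); the function name and the example flanks are hypothetical and do not come from the study's pipeline.

```python
def obeys_gu_ag(donor_intron_start, acceptor_intron_end):
    """True if the donor-side intron flank starts with GU/GT and the
    acceptor-side intron flank ends with AG (RNA or DNA letters accepted)."""
    donor = donor_intron_start.upper().replace("U", "T")
    acceptor = acceptor_intron_end.upper().replace("U", "T")
    return donor.startswith("GT") and acceptor.endswith("AG")

# Hypothetical junction flanks, for illustration only.
print(obeys_gu_ag("GUAAGU", "UUUCAG"))  # True
print(obeys_gu_ag("GCAAGT", "TTTCAG"))  # False
```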
Classification of proteins with shared motifs and internal repeats in the ECOD database

PubMed Central

Kinch, Lisa N.; Liao, Yuxing

2016-01-01

Abstract Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain‐like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade. PMID:26833690
Definition of the HLA-A29 peptide ligand motif allows prediction of potential T-cell epitopes from the retinal soluble antigen, a candidate autoantigen in birdshot retinopathy.

PubMed Central

Boisgerault, F; Khalil, I; Tieng, V; Connan, F; Tabary, T; Cohen, J H; Choppin, J; Charron, D; Toubert, A

1996-01-01

The peptide-binding motif of HLA-A29, the predisposing allele for birdshot retinopathy, was determined after acid-elution of endogenous peptides from purified HLA-A29 molecules. Individual and pooled HPLC fractions were sequenced by Edman degradation. Major anchor residues could be defined as glutamate at the second position of the peptide and as tyrosine at the carboxyl terminus. In vitro binding of polyglycine synthetic peptides to purified HLA-A29 molecules also revealed the need for an auxiliary anchor residue at the third position, preferably phenylalanine. By using this motif, we synthesized six peptides from the retinal soluble antigen, a candidate autoantigen in autoimmune uveoretinitis. Their in vitro binding was tested on HLA-A29 and also on HLA-B44 and HLA-B61, two alleles sharing close peptide-binding motifs. Two peptides derived from the carboxyl-terminal sequence of the human retinal soluble antigen bound efficiently to HLA-A29. This study could contribute to the prediction of T-cell epitopes from retinal autoantigens implicated in birdshot retinopathy. PMID:8622959
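The anchor positions described above lend themselves to a simple sliding-window scan. The sketch below lists peptides with glutamate at position 2 and tyrosine at the carboxyl terminus, and flags whether the auxiliary phenylalanine anchor at position 3 is present. The peptide length of 9 is an assumption (the abstract does not state it), and the protein string is hypothetical rather than the retinal soluble antigen sequence.

```python
def hla_a29_candidates(protein, length=9):
    """Return (start, peptide, has_auxiliary_anchor) for every window with
    E at position 2 and Y at the C-terminus; the flag marks F at position 3."""
    seq = protein.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        peptide = seq[start:start + length]
        if peptide[1] == "E" and peptide[-1] == "Y":
            hits.append((start, peptide, peptide[2] == "F"))
    return hits

# Hypothetical protein fragment, for illustration only.
print(hla_a29_candidates("MAEFGHIKLYQQSEALNPRTY"))
# [(1, 'AEFGHIKLY', True), (12, 'SEALNPRTY', False)]
```

A motif match only nominates candidates; binding still has to be tested experimentally, as was done in the study.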
A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

PubMed Central

Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

2016-01-01

The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized by the number of repeats present and their overall length, and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which a genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830
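To make the notions of a perfect SSR and of SSR density concrete, here is a small regex-based sketch. It is not the search tool used in the study: the minimum repeat counts in MIN_REPEATS are assumed values chosen for illustration, and the test sequence and counts are hypothetical.

```python
import re

# Minimum number of repeats per unit length (assumed thresholds,
# not the parameters used in the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(unit):
    """True unless the unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return not any(len(unit) % k == 0 and unit[:k] * (len(unit) // k) == unit
                   for k in range(1, len(unit)))

def find_perfect_ssrs(sequence):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    found = []
    for k, min_rep in MIN_REPEATS.items():
        # One unit of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                found.append((m.start(), unit, len(m.group()) // k))
    return sorted(found)

def ssr_density_per_mbp(n_ssrs, sequence_length_bp):
    """SSR count divided by sequence length expressed in megabase pairs."""
    return n_ssrs / (sequence_length_bp / 1_000_000)

# Hypothetical sequence: (AT)7 and (AAG)5 embedded in short flanks.
seq = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "TT"
print(find_perfect_ssrs(seq))                  # [(3, 'AT', 7), (20, 'AAG', 5)]
print(ssr_density_per_mbp(2445, 10_000_000))   # 244.5 (hypothetical counts)
```

Imperfect and complex SSRs, which the abstract also reports, would need mismatch-tolerant matching and are outside this sketch.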
A reciprocal HLA-Disease Association in Rheumatoid Arthritis and Pemphigus Vulgaris

PubMed Central

van Drongelen, Vincent; Holoshitz, Joseph

2017-01-01

Human leukocyte antigens (HLA) have been extensively studied as being antigen presenting receptors, but many aspects of their function remain elusive, especially their association with various autoimmune diseases. Here we discuss an illustrative case of the reciprocal relationship between certain HLA-DRB1 alleles and two diseases, rheumatoid arthritis (RA) and pemphigus vulgaris (PV). RA is strongly associated with HLA-DRB1 alleles that encode a five amino acid sequence motif in the 70-74 region of the DRβ chain, called the shared epitope (SE), while PV is associated with the HLA-DRB1*04:02 allele that encodes a different sequence motif in the same region. Interestingly, while HLA-DRB1*04:02 confers susceptibility to PV, this and other alleles that encode the same sequence motif in the 70-74 region of the DRβ chain are protective against RA. Currently, no convincing explanation for this antagonistic effect is present. Here we briefly review the immunology and immunogenetics of both diseases, identify remaining gaps in our understanding of their association with HLA, and propose the possibility that the 70-74 DRβ epitope may contribute to disease risk by mechanisms other than antigen presentation. PMID:27814654
Functional Analysis of Light-harvesting-like Protein 3 (LIL3) and Its Light-harvesting Chlorophyll-binding Motif in Arabidopsis*

PubMed Central

Takahashi, Kaori; Takabayashi, Atsushi; Tanaka, Ayumi; Tanaka, Ryouichi

2014-01-01

The light-harvesting complex (LHC) constitutes the major light-harvesting antenna of photosynthetic eukaryotes. LHC contains a characteristic sequence motif, termed LHC motif, consisting of 25–30 mostly hydrophobic amino acids. This motif is shared by a number of transmembrane proteins from oxygenic photoautotrophs that are termed light-harvesting-like (LIL) proteins. To gain insights into the functions of LIL proteins and their LHC motifs, we functionally characterized a plant LIL protein, LIL3. This protein has been shown previously to stabilize geranylgeranyl reductase (GGR), a key enzyme in phytol biosynthesis. It is hypothesized that LIL3 functions to anchor GGR to membranes. First, we conjugated the transmembrane domain of LIL3 or that of ascorbate peroxidase to GGR and expressed these chimeric proteins in an Arabidopsis mutant lacking LIL3 protein. As a result, the transgenic plants restored phytol-synthesizing activity. These results indicate that GGR is active as long as it is anchored to membranes, even in the absence of LIL3. Subsequently, we addressed the question why the LHC motif is conserved in the LIL3 sequences. We modified the transmembrane domain of LIL3, which contains the LHC motif, by substituting its conserved amino acids (Glu-171, Asn-174, and Asp-189) with alanine. As a result, the Arabidopsis transgenic plants partly recovered the phytol-biosynthesizing activity. However, in these transgenic plants, the LIL3-GGR complexes were partially dissociated. Collectively, these results indicate that the LHC motif of LIL3 is involved in the complex formation of LIL3 and GGR, which might contribute to the GGR reaction. PMID:24275650
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

PubMed

Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

2013-01-01

Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/
A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

PubMed Central

2012-01-01

Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesized 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. PMID:23181585
Active diuretic peptidomimetic insect kinin analogs that contain Beta-turn mimetic motif 4-aminopyroglutamate and lack native peptide bonds

USDA-ARS's Scientific Manuscript database

The multifunctional arthropod 'insect kinins' share the evolutionarily conserved C-terminal pentapeptide core sequence Phe-X1-X2-Trp-Gly-NH2, where X1 = His, Asn, Ser, or Tyr and X2 = Ser, Pro, or Ala. Insect kinins regulate diuresis in many species of insects, including the cricket. Insect kinins...
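The conserved pentapeptide core described above can be written as a regular expression anchored at the C-terminus. This is a minimal sketch: the C-terminal amide (NH2) cannot be represented in a plain one-letter sequence and is ignored here, and the test peptides are hypothetical.

```python
import re

# Phe-X1-X2-Trp-Gly at the C-terminus, with X1 in {H, N, S, Y} and X2 in {S, P, A}.
KININ_CORE = re.compile(r"F[HNSY][SPA]WG\Z")

def has_kinin_core(peptide):
    """True if the peptide ends with the insect kinin pentapeptide core."""
    return KININ_CORE.search(peptide.upper()) is not None

# Hypothetical peptides, for illustration only.
print(has_kinin_core("AAFHSWG"))   # True
print(has_kinin_core("AAFHSWGA"))  # False: core is not C-terminal
print(has_kinin_core("AAFGSWG"))   # False: G is not an allowed X1 residue
```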
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

PubMed Central

Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

2017-01-01

Abstract RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. PMID:28977546
Rapid motif compliance scoring with match weight sets.

PubMed

Venezia, D; O'Hara, P J

1993-02-01

Most current implementations of motif matching in biological sequences have sacrificed the generality of weight matrix scoring for shorter runtimes. The program MOTIF incorporates a weight matrix and a rapid, backtracking tree-search algorithm to score motif compliance with greatly enhanced performance while placing no constraints on the motif. In addition, any positions within a motif can be marked as 'inviolate', thereby requiring an exact match. MOTIF allows a choice of regular expression formats and can use both motif and sequence libraries as either targets or queries. Nucleic acid sequences can optionally be translated by MOTIF in any frame(s) and used against peptide motifs.
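The idea of weight matrix scoring combined with 'inviolate' positions can be sketched as follows. This is not the MOTIF program's algorithm: it uses a plain sliding window rather than a backtracking tree search, and the data layout (a list of per-position weight dictionaries plus a dictionary of required residues), the weights and the threshold are all assumptions made for illustration.

```python
def score_window(window, weights, inviolate):
    """Sum per-position weights; return None if an inviolate position mismatches."""
    total = 0.0
    for i, residue in enumerate(window):
        if i in inviolate and residue != inviolate[i]:
            return None  # exact match required at this position
        total += weights[i].get(residue, 0.0)
    return total

def scan(sequence, weights, inviolate, threshold):
    """Return (start, score) for windows that satisfy every inviolate
    position and reach the score threshold."""
    seq = sequence.upper()
    width = len(weights)
    hits = []
    for start in range(len(seq) - width + 1):
        score = score_window(seq[start:start + width], weights, inviolate)
        if score is not None and score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical three-position motif; position 1 must be exactly 'C'.
weights = [{"A": 2, "G": 1}, {"C": 2}, {"T": 2, "A": 1}]
inviolate = {1: "C"}
print(scan("GACTGCAACA", weights, inviolate, threshold=4))
# [(1, 6.0), (4, 4.0), (7, 5.0)]
```

Checking inviolate positions before accumulating the score lets most windows be rejected after a single comparison, which is one simple way to trade generality against runtime.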
BayesMotif: de novo protein sorting motif discovery from impure datasets.

PubMed

Hu, Jianjun; Zhang, Fan

2010-01-18

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
Discovery of novel antimicrobial peptides with unusual cysteine motifs in dandelion Taraxacum officinale Wigg. flowers.

PubMed

Astafieva, A A; Rogozhin, E A; Odintsova, T I; Khadeeva, N V; Grishin, E V; Egorov, Ts A

2012-08-01

Three novel antimicrobial peptides designated ToAMP1, ToAMP2 and ToAMP3 were purified from Taraxacum officinale flowers. Their amino acid sequences were determined. The peptides are cationic and cysteine-rich and consist of 38, 44 and 42 amino acid residues for ToAMP1, ToAMP2 and ToAMP3, respectively. Importantly, according to cysteine motifs, the peptides are representatives of two novel previously unknown families of plant antimicrobial peptides. ToAMP1 and ToAMP2 share high sequence identity and belong to 6-Cys-containing antimicrobial peptides, while ToAMP3 is a member of a distinct 8-Cys family. The peptides were shown to display high antimicrobial activity both against fungal and bacterial pathogens, and therefore represent new promising molecules for biotechnological and medicinal applications.
Statistical tests to compare motif count exceptionalities

PubMed Central

Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

2007-01-01

Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
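One common way to build an exact binomial test from Poisson-distributed counts is to condition on the total: if the two counts are independent Poisson variables, the first count given the total is binomial. The sketch below follows that textbook construction and should be read as an assumption about the general approach, not as the authors' exact test; in particular it ignores the correction for overlapping motifs mentioned in the abstract, and the expected counts are placeholders that would come from a sequence composition model.

```python
from math import comb

def binomial_upper_tail(n1, n2, expected1, expected2):
    """One-sided p-value P(X >= n1) with X ~ Binomial(n1 + n2, p), where
    p = expected1 / (expected1 + expected2). Under the null hypothesis the
    observed/expected ratio of the motif is the same in both sequences."""
    n = n1 + n2
    p = expected1 / (expected1 + expected2)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n1, n + 1))

# Hypothetical counts: 3 occurrences in sequence 1, none in sequence 2,
# with equal expected counts in both sequences.
print(binomial_upper_tail(3, 0, 1.0, 1.0))  # 0.125
```

A small p-value suggests the motif is more exceptional in the first sequence than in the second; the exact sum is cheap for small counts, which is the regime the abstract recommends for this test.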
DMINDA: an integrated web server for DNA motif identification and analyses

PubMed Central

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-01-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
Identification of the sequence motif of glycoside hydrolase 13 family members

PubMed Central

Kumar, Vikash

2011-01-01

A bioinformatics analysis of sequences of enzymes of the glycoside hydrolase (GH) 13 family, such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase, has been carried out using a hidden Markov model (HMM) profile in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs, with residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, it has been observed that the α-amylase sequence motifs have lower sequence conservation than the rest of the motifs of the GH13 family members. PMID:21544166
Transcriptional regulation of human eosinophil RNases by an evolutionary- conserved sequence motif in primate genome

PubMed Central

Wang, Hsiu-Yu; Chang, Hao-Teng; Pai, Tun-Wen; Wu, Chung-I; Lee, Yuan-Hung; Chang, Yen-Hsin; Tai, Hsiu-Ling; Tang, Chuan-Yi; Chou, Wei-Yao; Chang, Margaret Dah-Tsyr

2007-01-01

Background Human eosinophil-derived neurotoxin (edn) and eosinophil cationic protein (ecp) are members of a subfamily of primate ribonuclease (rnase) genes. Although they were generated by a gene duplication event, distinct edn and ecp expression profiles in various tissues have been reported. Results In this study, we obtained the upstream promoter sequences of several representative primate eosinophil rnases. Bioinformatic analysis revealed the presence of a shared 34-nucleotide (nt) sequence stretch located at -81 to -48 in all edn promoters and the macaque ecp promoter. This unique sequence motif constituted a region essential for transactivation of human edn in hepatocellular carcinoma cells. Gel electrophoretic mobility shift assay, transient transfection and scanning mutagenesis experiments allowed us to identify binding sites for two transcription factors, Myc-associated zinc finger protein (MAZ) and SV-40 protein-1 (Sp1), within the 34-nt segment. Subsequent in vitro and in vivo binding assays demonstrated a direct molecular interaction between this 34-nt region and MAZ and Sp1. Interestingly, overexpression of MAZ and Sp1 respectively repressed and enhanced edn promoter activity. The regulatory transactivation motif was mapped to the evolutionarily conserved -74/-65 region of the edn promoter, which was guanidine-rich and critical for recognition by both transcription factors. Conclusion Our results provide the first direct evidence that MAZ and Sp1 play important roles in the transcriptional activation of the human edn promoter through specific binding to a 34-nt segment present in representative primate eosinophil rnase promoters. PMID:17927842
BlockLogo: visualization of peptide and sequence motif conservation

PubMed Central

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L.; Zhang, Guang Lan; Brusic, Vladimir

2013-01-01

BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/ PMID:24001880
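A block entropy calculation over selected alignment columns might look like the sketch below. It assumes that "block entropy" means the Shannon entropy of the distribution of whole blocks (the residues at the chosen positions taken together), which is one plausible reading of the abstract rather than a confirmed description of the BlockLogo implementation. The alignment is hypothetical.

```python
from collections import Counter
from math import log2

def block_entropy(alignment, positions):
    """Return (entropy_in_bits, block_frequencies) for the given 0-based
    alignment columns; positions need not be contiguous."""
    blocks = ["".join(seq[p] for p in positions) for seq in alignment]
    counts = Counter(blocks)
    total = len(blocks)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    frequencies = {block: c / total for block, c in counts.most_common()}
    return entropy, frequencies

# Hypothetical aligned sequences; columns 0 and 4 form a discontinuous motif.
alignment = ["ACDEF", "ACDEY", "GCDEF", "ACDEF"]
entropy, freqs = block_entropy(alignment, [0, 4])
print(entropy)  # 1.5
print(freqs)    # {'AF': 0.5, 'AY': 0.25, 'GF': 0.25}
```

Treating the selected columns as one unit keeps co-variation between distant positions visible, which a per-column sequence logo would hide.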

MHC class II genes in European wolves: a comparison with dogs.

PubMed

Seddon, Jennifer M; Ellegren, Hans

2002-10-01

The genome of the grey wolf, one of the most widely distributed land mammal species, has been subjected to both stochastic factors, including biogeographical subdivision and population fragmentation, and strong selection during the domestication of the dog. To explore the effects of drift and selection on the partitioning of MHC variation in the diversification of species, we present nine DQA, 10 DQB, and 17 DRB1 sequences of the second exon for European wolves and compare them with sequences of North American wolves and dogs. The relatively large number of class II alleles present in both European and North American wolves attests to their large historical population sizes, yet there are few alleles shared between these regions at DQB and DRB1. Similarly, the dog has an extensive array of class II MHC alleles, a consequence of a genetically diverse origin, but allelic overlap with wolves only at DQA. Although we might expect a progression from shared alleles to shared allelic lineages during differentiation, the partitioning of diversity between wolves and dogs at DQB and DRB1 differs from that at DQA. Furthermore, an extensive region of nucleotide sequence shared between DRB1 and DQB alleles and a shared motif suggests intergenic recombination may have contributed to MHC diversity in the Canidae.
Identifying novel sequence variants of RNA 3D motifs

PubMed Central

Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

2015-01-01

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Comparative Analysis of P450 Signature Motifs EXXR and CXG in the Large and Diverse Kingdom of Fungi: Identification of Evolutionarily Conserved Amino Acid Patterns Characteristic of P450 Family

PubMed Central

Syed, Khajamohiddin; Mashele, Samson Sitheni

2014-01-01

Cytochrome P450 monooxygenases (P450s) are heme-thiolate proteins distributed across the biological kingdoms. P450s are catalytically versatile and play key roles in organisms' primary and secondary metabolism. Identification of P450s across the biological kingdoms depends largely on the identification of two P450 signature motifs, EXXR and CXG, in the protein sequence. Once a putative protein has been identified as P450, it will be assigned to a family and subfamily based on the criteria that P450s within a family share more than 40% homology and members of subfamilies share more than 55% homology. However, to date, no evidence has been presented that can distinguish members of a P450 family. Here, for the first time we report the identification of EXXR- and CXG-motifs-based amino acid patterns that are characteristic of the P450 family. Analysis of P450 signature motifs in the under-explored fungal P450s from four different phyla, Ascomycota, Basidiomycota, Zygomycota and Chytridiomycota, indicated that the EXXR motif is highly variable and the CXG motif is somewhat variable. The amino acids threonine and leucine are preferred as second and third amino acids in the EXXR motif and proline and glycine are preferred as second and third amino acids in the CXG motif in fungal P450s. Analysis of 67 P450 families from biological kingdoms such as plants, animals, bacteria and fungi showed conservation of a set of amino acid patterns characteristic of a particular P450 family in EXXR and CXG motifs. This suggests that during the divergence of P450 families from a common ancestor these amino acid patterns evolved and were retained in each P450 family as a signature of that family. The role of amino acid patterns characteristic of a P450 family in the structural and/or functional aspects of members of the P450 family is a topic for future research. PMID:24743800
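The two signature motifs named in the P450 record above can be written as simple patterns, with X standing for any residue. The sketch below scans a protein sequence for EXXR and CXG with regular expressions. It is an illustrative assumption, not the authors' pipeline: the helper `find_motif` and the toy sequence are hypothetical, and a bare pattern match will also report chance occurrences outside the true P450 signature regions.

```python
import re


def find_motif(sequence, pattern):
    """Return (0-based start, matched text) for every match, including overlaps.

    pattern uses regex syntax, so the EXXR motif is written "E..R"
    and the CXG motif is written "C.G".
    """
    # A lookahead with a capture group reports overlapping matches too.
    scanner = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    toy_protein = "MAETLRLFCPGAA"  # hypothetical fragment, not a real P450
    print(find_motif(toy_protein, "E..R"))
    print(find_motif(toy_protein, "C.G"))
```

Tabulating the matched text across family members is one way to examine which residues fill the variable X positions in each family.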
The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

PubMed

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes

2018-05-28

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.
Molecular population dynamics of DNA structures in a bcl-2 promoter sequence is regulated by small molecules and the transcription factor hnRNP LL

PubMed Central

Cui, Yunxi; Koirala, Deepak; Kang, HyunJin; Dhakal, Soma; Yangyuoru, Philip; Hurley, Laurence H.; Mao, Hanbin

2014-01-01

Minute differences in the free energy change of unfolding among structures in an oligonucleotide sequence can lead to a complex population equilibrium, which is rather challenging for ensemble techniques to decipher. Herein, we introduce a new method, molecular population dynamics (MPD), to describe the intricate equilibrium among non-B deoxyribonucleic acid (DNA) structures. Using mechanical unfolding in laser tweezers, we identified six DNA species in a cytosine (C)-rich bcl-2 promoter sequence. Population patterns of these species with and without a small molecule (IMC-76 or IMC-48) or the transcription factor hnRNP LL are compared to reveal the MPD of different species. With a pattern recognition algorithm, we found that IMC-48 and hnRNP LL share 80% similarity in stabilizing i-motifs with 60 s incubation. In contrast, IMC-76 demonstrates an opposite behavior, preferring flexible DNA hairpins. With 120–180 s incubation, IMC-48 and hnRNP LL destabilize i-motifs, which has been previously proposed to activate bcl-2 transcription. These results provide strong support, from the population equilibrium perspective, that small molecules and hnRNP LL can modulate bcl-2 transcription through interaction with i-motifs. The excellent agreement with biochemical results firmly validates the MPD analyses, which, we expect, can be widely applicable to investigate the complex equilibria of biomacromolecules. PMID:24609386
The Malarial Host-Targeting Signal Is Conserved in the Irish Potato Famine Pathogen

PubMed Central

Liolios, Konstantinos; Win, Joe; Kanneganti, Thirumala-Devi; Young, Carolyn; Kamoun, Sophien; Haldar, Kasturi

2006-01-01

Animal and plant eukaryotic pathogens, such as the human malaria parasite Plasmodium falciparum and the potato late blight agent Phytophthora infestans, are widely divergent eukaryotic microbes. Yet they both produce secretory virulence and pathogenic proteins that alter host cell functions. In P. falciparum, export of parasite proteins to the host erythrocyte is mediated by leader sequences shown to contain a host-targeting (HT) motif centered on an RxLx (E, D, or Q) core: this motif appears to signify a major pathogenic export pathway with hundreds of putative effectors. Here we show that a secretory protein of P. infestans, which is perceived by plant disease resistance proteins and induces hypersensitive plant cell death, contains a leader sequence that is equivalent to the Plasmodium HT-leader in its ability to export fusion of green fluorescent protein (GFP) from the P. falciparum parasite to the host erythrocyte. This export is dependent on an RxLR sequence conserved in P. infestans leaders, as well as in leaders of all ten secretory oomycete proteins shown to function inside plant cells. The RxLR motif is also detected in hundreds of secretory proteins of P. infestans, Phytophthora sojae, and Phytophthora ramorum and has high value in predicting host-targeted leaders. A consensus motif further reveals E/D residues enriched within ~25 amino acids downstream of the RxLR, which are also needed for export. Together the data suggest that in these plant pathogenic oomycetes, a consensus HT motif may reside in an extended sequence of ~25–30 amino acids, rather than in a short linear sequence. Evidence is presented that although the consensus is much shorter in P. falciparum, information sufficient for vacuolar export is contained in a region of ~30 amino acids, which includes sequences flanking the HT core. Finally, positional conservation between Phytophthora RxLR and P. falciparum RxLx (E, D, Q) is consistent with the idea that the context of their presentation is constrained. These studies provide the first evidence to our knowledge that eukaryotic microbes share equivalent pathogenic HT signals and thus conserved mechanisms to access host cells across plant and animal kingdoms that may present unique targets for prophylaxis across divergent pathogens. PMID:16733545
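The record above describes an RxLR core followed by E/D residues enriched within roughly 25 amino acids downstream. A minimal sketch of that description is to locate each RxLR match and count acidic residues in the following window. The function name `rxlr_hits`, the fixed 25-residue window, and the toy sequence are assumptions for illustration only; the abstract does not specify how enrichment was scored, and no signal-peptide check is made here.

```python
import re


def rxlr_hits(sequence, window=25):
    """Find RxLR cores and count E/D residues in the downstream window.

    Returns a list of (0-based start, matched core, acidic residue count).
    The 25-residue window mirrors the "~25 amino acids" in the abstract;
    treating it as a hard cutoff is an assumption of this sketch.
    """
    hits = []
    for m in re.finditer(r"(?=(R.LR))", sequence):
        start = m.start()
        downstream = sequence[start + 4:start + 4 + window]
        acidic = sum(residue in "ED" for residue in downstream)
        hits.append((start, m.group(1), acidic))
    return hits


if __name__ == "__main__":
    toy_leader = "AAARSLRTTEEDKK"  # hypothetical leader fragment
    print(rxlr_hits(toy_leader))
```

In practice a candidate would likely also be required to carry a predicted signal peptide upstream of the core, which this sketch leaves out.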
DMINDA: an integrated web server for DNA motif identification and analyses.

PubMed

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-07-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.

PubMed

Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin

2013-03-01

Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

PubMed

Pan, Xiaoyong; Shen, Hong-Bin

2017-02-28

RNAs play key roles in cells through interactions with proteins known as RNA-binding proteins (RBPs), and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence and structure, their domain-specific features and formats have posed significant computational challenges. One of the current difficulties is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation. In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture the interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach in real-world applications. The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

PubMed Central

Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

2007-01-01

WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
Specific interaction of mutant p53 with regions of matrix attachment region DNA elements (MARs) with a high potential for base-unpairing

PubMed Central

Will, Katrin; Warnecke, Gabriele; Wiesmüller, Lisa; Deppert, Wolfgang

1998-01-01

Mutant, but not wild-type p53 binds with high affinity to a variety of MAR-DNA elements (MARs), suggesting that MAR-binding of mutant p53 relates to the dominant-oncogenic activities proposed for mutant p53. MARs recognized by mutant p53 share AT richness and contain variations of an AATATATTT “DNA-unwinding motif,” which enhances the structural dynamics of chromatin and promotes regional DNA base-unpairing. Mutant p53 specifically interacted with MAR-derived oligonucleotides carrying such unwinding motifs, catalyzing DNA strand separation when this motif was located within a structurally labile sequence environment. Addition of GC-clamps to the respective MAR-oligonucleotides or introducing mutations into the unwinding motif strongly reduced DNA strand separation, but supported the formation of tight complexes between mutant p53 and such oligonucleotides. We conclude that the specific interaction of mutant p53 with regions of MAR-DNA with a high potential for base-unpairing provides the basis for the high-affinity binding of mutant p53 to MAR-DNA. PMID:9811860
Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.

PubMed

Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song

2012-11-16

Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. 
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms.
Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria

PubMed Central

2012-01-01

Background: Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Results: Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. Conclusions: The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms. PMID:23157370
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through the graphical user interface (http://compbio.math.hr/). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
TCOF1 gene encodes a putative nucleolar phosphoprotein that exhibits mutations in Treacher Collins Syndrome throughout its coding region.

PubMed

Wise, C A; Chiang, L C; Paznekas, W A; Sharma, M; Musy, M M; Ashley, J A; Lovett, M; Jabs, E W

1997-04-01

Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.
TCOF1 gene encodes a putative nucleolar phosphoprotein that exhibits mutations in Treacher Collins Syndrome throughout its coding region

PubMed Central

Wise, Carol A.; Chiang, Lydia C.; Paznekas, William A.; Sharma, Mridula; Musy, Maurice M.; Ashley, Jennifer A.; Lovett, Michael; Jabs, Ethylin W.

1997-01-01

Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development. PMID:9096354
Automatic annotation of protein motif function with Gene Ontology terms.

PubMed

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G

2004-09-02

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
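The record above reports that the mutual information between a motif and a GO term is a useful feature. As a hedged illustration, the sketch below computes mutual information (in bits) from a 2x2 table of sequence counts: sequences matching the motif or not, crossed with sequences annotated with the GO term or not. The function name and the count layout are assumptions for this example; the abstract does not give the exact estimator the authors used.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between two binary events from counts.

    n11: sequences that match the motif and carry the GO term
    n10: match the motif, lack the GO term
    n01: lack the motif, carry the GO term
    n00: lack both
    (This count layout is an assumption of the sketch.)
    """
    n = n11 + n10 + n01 + n00
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    cells = [
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    ]
    mi = 0.0
    for joint, row_total, col_total in cells:
        if joint > 0:  # empty cells contribute nothing to the sum
            mi += (joint / n) * log2(n * joint / (row_total * col_total))
    return mi


if __name__ == "__main__":
    print(mutual_information(50, 0, 0, 50))    # perfectly associated
    print(mutual_information(25, 25, 25, 25))  # independent
```

A value like this could then serve as one input feature, alongside frequency-based features, to a classifier such as the logistic regression model mentioned in the abstract.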
A common structural motif in immunopotentiating peptides with sequences present in human autoantigens. Elicitation of a response mediated by monocytes and Th1 cells.

PubMed

López-Moratalla, N; Ruíz, E; López-Zabalza, M J; Santiago, E

1996-12-16

We have found a common structural motif in human autoantigens, heat shock proteins and viral proteins. Peptides modelled after sequences present in those molecules were synthesized and their immunomodulating properties tested. They share a core of 15 amino acid residues and a common pattern ('2-6-11' motif) characterized by requirements at fixed positions with respect to a Pro (position 6); an apolar residue or a Lys at position 2; and a Glu, Asp or Lys at position 11. Any of these peptides, when added to cultures of lymphomononuclear cells, caused the activation of monocytes manifested by a release of IL-1 alpha, IL-1 beta and TNF alpha. A release of IFN gamma and IL-2 also took place; this release was abolished by anti-DR antibodies. Neither IL-4 nor IL-5 could be detected. This suggests a presentation by APCs and the appearance of cells with a Th1 phenotype. Monocytes and Th1 cells freshly obtained from 12 patients with Graves' disease, 8 with Hashimoto's disease and 8 with primary biliary cirrhosis exhibited activation features similar to those found in cells from healthy subjects incubated in the presence of peptides with a '2-6-11' motif and representing fragments of autoantigens. Their immunopotentiating properties suggest their involvement in the initiation or progression of the autoimmune response mediated by activated monocytes and Th1 cells.
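The '2-6-11' pattern described above fixes three positions in a 15-residue core, which makes it easy to express as a regular expression. The sketch below is one possible reading of that description, not the authors' search procedure. In particular, the set of residues counted as apolar (A, V, L, I, M, F, W, G) is an assumption of this example, since the abstract does not list them, and the function name and test peptide are hypothetical.

```python
import re

# Assumed apolar residue set; the abstract does not enumerate it.
APOLAR = "AVLIMFWG"

# 15-residue core: position 2 apolar or Lys, position 6 Pro,
# position 11 Glu, Asp or Lys; all other positions unconstrained.
CORE_2_6_11 = re.compile(
    "(?=(.[" + APOLAR + "K]...P....[EDK]....))"
)


def find_2_6_11(sequence):
    """Return (0-based start, 15-residue core) for each window that fits."""
    return [(m.start(), m.group(1)) for m in CORE_2_6_11.finditer(sequence)]


if __name__ == "__main__":
    toy_peptide = "ALAAAPAAAAEAAAA"  # hypothetical 15-mer, not from the paper
    print(find_2_6_11(toy_peptide))
```

Because only three of fifteen positions are constrained, a pattern this loose will match many unrelated sequences, so hits are best treated as candidates for further inspection.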
Subgroup 4 R2R3-MYBs in conifer trees: gene family expansion and contribution to the isoprenoid- and flavonoid-oriented responses

PubMed Central

Bedon, Frank; Bomal, Claude; Caron, Sébastien; Levasseur, Caroline; Boyle, Brian; Mansfield, Shawn D.; Schmidt, Axel; Gershenzon, Jonathan; Grima-Pettenati, Jacqueline; Séguin, Armand; MacKay, John

2010-01-01

Transcription factors play a fundamental role in plants by orchestrating temporal and spatial gene expression in response to environmental stimuli. Several R2R3-MYB genes of the Arabidopsis subgroup 4 (Sg4) share a C-terminal EAR motif signature recently linked to stress response in angiosperm plants. It is reported here that nearly all Sg4 MYB genes in the conifer trees Picea glauca (white spruce) and Pinus taeda (loblolly pine) form a monophyletic clade (Sg4C) that expanded following the split of gymnosperm and angiosperm lineages. Deeper sequencing in P. glauca identified 10 distinct Sg4C sequences, indicating over-representation of Sg4 sequences compared with angiosperms such as Arabidopsis, Oryza, Vitis, and Populus. The Sg4C MYBs share the EAR motif core. Many of them had stress-responsive transcript profiles after wounding, jasmonic acid (JA) treatment, or exposure to cold in P. glauca and P. taeda, with MYB14 transcripts accumulating most strongly and rapidly. Functional characterization was initiated by expressing the P. taeda MYB14 (PtMYB14) gene in transgenic P. glauca plantlets with a tissue-preferential promoter (cinnamyl alcohol dehydrogenase) and a ubiquitous gene promoter (ubiquitin). Histological, metabolite, and transcript (microarray and targeted quantitative real-time PCR) analyses of PtMYB14 transgenics, coupled with mechanical wounding and JA application experiments on wild-type plantlets, allowed identification of PtMYB14 as a putative regulator of an isoprenoid-oriented response that leads to the accumulation of sesquiterpene in conifers. Data further suggested that PtMYB14 may contribute to a broad defence response implicating flavonoids. This study also addresses the potential involvement of closely related Sg4C sequences in stress responses and plant evolution. PMID:20732878
Occurrence probability of structured motifs in random sequences.

PubMed

Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

2002-01-01

The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.
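To make the definition of a structured motif concrete, here is a minimal Python sketch that enumerates occurrences of two ordered boxes separated by a variable gap, with substitutions allowed in each box. It only illustrates what is being counted; it does not implement the probability approximations proposed in the paper. The function names, the per-box substitution limit, and the example boxes (the textbook E. coli -35 and -10 consensus hexamers) are illustrative assumptions, not taken from the abstract.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, box1, box2, min_gap, max_gap, max_subs):
    """Return (start1, start2) pairs where box1 and box2 occur in this order,
    separated by min_gap..max_gap letters, each box with at most max_subs
    substitutions (hypothetical helper, for illustration only)."""
    hits = []
    for i in range(len(seq) - len(box1) + 1):
        if hamming(seq[i:i + len(box1)], box1) > max_subs:
            continue
        end1 = i + len(box1)
        for gap in range(min_gap, max_gap + 1):
            j = end1 + gap
            if j + len(box2) > len(seq):
                break  # larger gaps would also run past the end
            if hamming(seq[j:j + len(box2)], box2) <= max_subs:
                hits.append((i, j))
    return hits


# Toy sequence: a -35-like box, a 17-letter spacer, then a -10-like box.
toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "CC"
print(find_structured_motif(toy, "TTGACA", "TATAAT", 15, 19, 1))
```

Counting such occurrences in observed sequences is the quantity whose probability under a random-sequence model the approximations are meant to assess.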

SSMART: Sequence-structure motif identification for RNA-binding proteins.

PubMed

Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

2018-06-11

RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
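As a rough illustration of what a consensus string over a degenerate alphabet means, the Python sketch below matches a consensus written in the standard IUPAC nucleotide codes against an RNA sequence. It covers only the sequence half of the idea: the extended alphabet that SSMART uses to encode secondary-structure preferences is not described in the abstract and is not reproduced here. The function names and the example consensus are hypothetical.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def matches(consensus: str, site: str) -> bool:
    """True if every base of `site` is allowed by the code at that position."""
    return len(consensus) == len(site) and all(
        base in IUPAC[code] for code, base in zip(consensus, site)
    )


def find_sites(consensus: str, rna: str) -> list:
    """Start positions (0-based) of all windows matching the consensus."""
    k = len(consensus)
    return [i for i in range(len(rna) - k + 1) if matches(consensus, rna[i:i + k])]


# 'Y' stands for a pyrimidine (C or U), so UGYA matches UGCA and UGUA.
print(find_sites("UGYA", "AAUGCAUGUAGG"))
```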
Molecular population dynamics of DNA structures in a bcl-2 promoter sequence is regulated by small molecules and the transcription factor hnRNP LL.

PubMed

Cui, Yunxi; Koirala, Deepak; Kang, HyunJin; Dhakal, Soma; Yangyuoru, Philip; Hurley, Laurence H; Mao, Hanbin

2014-05-01

Minute differences in the free energy change of unfolding among structures in an oligonucleotide sequence can lead to a complex population equilibrium, which is rather challenging for ensemble techniques to decipher. Herein, we introduce a new method, molecular population dynamics (MPD), to describe the intricate equilibrium among non-B deoxyribonucleic acid (DNA) structures. Using mechanical unfolding in laser tweezers, we identified six DNA species in a cytosine (C)-rich bcl-2 promoter sequence. Population patterns of these species with and without a small molecule (IMC-76 or IMC-48) or the transcription factor hnRNP LL are compared to reveal the MPD of different species. With a pattern recognition algorithm, we found that IMC-48 and hnRNP LL share 80% similarity in stabilizing i-motifs with 60 s incubation. In contrast, IMC-76 demonstrates an opposite behavior, preferring flexible DNA hairpins. With 120-180 s incubation, IMC-48 and hnRNP LL destabilize i-motifs, which has been previously proposed to activate bcl-2 transcription. These results provide strong support, from the population equilibrium perspective, that small molecules and hnRNP LL can modulate bcl-2 transcription through interaction with i-motifs. The excellent agreement with biochemical results firmly validates the MPD analyses, which, we expect, can be widely applicable to investigate complex equilibria of biomacromolecules. © 2014 The Author(s). Published by Oxford University Press [on behalf of Nucleic Acids Research].
Memetic algorithms for de novo motif-finding in biomedical sequences.

PubMed

Bi, Chengpeng

2012-09-01

The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. The simulations show that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms.
Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance while compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.
BEAM web server: a tool for structural RNA motif discovery.

PubMed

Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2018-03-15

RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of RNAs as input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
Discovering Sequence Motifs with Arbitrary Insertions and Deletions

PubMed Central

Frith, Martin C.; Saunders, Neil F. W.; Kobe, Bostjan; Bailey, Timothy L.

2008-01-01

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. glam2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2. PMID:18437229
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
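The definition of a signal given above (a window of length l that differs from the motif in at most d positions) can be sketched in a few lines of Python. This is only the counting step for a motif that is already known; it is not the cWINNOWER algorithm itself, which has to discover the unknown motif through clique analysis and a consensus constraint. The function name and toy input are assumptions made for illustration.

```python
def signal_positions(seq: str, motif: str, d: int) -> list:
    """Start positions (0-based) of windows of length l = len(motif) that
    differ from `motif` in at most d positions, i.e. (l, d) signals."""
    l = len(motif)
    positions = []
    for i in range(len(seq) - l + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + l], motif))
        if mismatches <= d:
            positions.append(i)
    return positions


# Toy case with (l, d) = (4, 1): one exact copy and one single-mutation copy.
print(signal_positions("ACGTTTACCTGG", "ACGT", 1))
```

The abstract's quantity qc is then the smallest number of such signals for which the motif can still be detected in a random sequence of length N.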
Structural and Functional Basis of the Fidelity of Nucleotide Selection by Flavivirus RNA-Dependent RNA Polymerases

PubMed Central

Canard, Bruno

2018-01-01

Viral RNA-dependent RNA polymerases (RdRps) play a central role not only in viral replication, but also in the genetic evolution of viral RNAs. After binding to an RNA template and selecting 5′-triphosphate ribonucleosides, viral RdRps synthesize an RNA copy according to Watson-Crick base-pairing rules. The copy process sometimes deviates from both the base-pairing rules specified by the template and the natural ribose selectivity and, thus, the process is error-prone due to the intrinsic (in)fidelity of viral RdRps. These enzymes share a number of conserved amino-acid sequence strings, called motifs A–G, which can be defined from a structural and functional point-of-view. A co-relation is gradually emerging between mutations in these motifs and viral genome evolution or observed mutation rates. Here, we review our current knowledge on these motifs and their role on the structural and mechanistic basis of the fidelity of nucleotide selection and RNA synthesis by Flavivirus RdRps. PMID:29385764
Identification of sequence motifs significantly associated with antisense activity.

PubMed

McQuisten, Kyle A; Peek, Andrew S

2007-06-07

Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs.
The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced Silencing Complex (RISC) in RNAi. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics alone is likely not the only factor in determining antisense efficiency.
kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences

PubMed Central

2017-01-01

Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/. PMID:28460012
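The idea of a position-specific k-mer can be illustrated with a short Python sketch that tallies, for every alignment column, which k-mer starts there in each sequence. This is only the bookkeeping that a positional analysis starts from; the probability-based scoring and the logo rendering that kpLogo provides are not reproduced, and the function name and toy alignment are assumptions.

```python
from collections import Counter


def positional_kmer_counts(aligned_seqs, k):
    """For each start position in a set of equal-length aligned sequences,
    count how often each k-mer begins at that position."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Counter(s[i:i + k] for s in aligned_seqs)
        for i in range(length - k + 1)
    ]


counts = positional_kmer_counts(["ACGT", "ACGA", "TCGT"], k=2)
for position, counter in enumerate(counts):
    print(position, counter.most_common())
```

A k-mer that dominates one column while being unremarkable elsewhere is the kind of position-specific signal that whole-sequence motif finders tend to miss.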
A motif detection and classification method for peptide sequences using genetic programming.

PubMed

Tomita, Yasuyuki; Kato, Ryuji; Okochi, Mina; Honda, Hiroyuki

2008-08-01

An exploration of common rules (property motifs) in amino acid sequences has been required for the design of novel sequences and elucidation of the interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search for property motifs that are common in peptide sequence data. Our method has the following two characteristics: (i) the automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) the quick and effective exploration of motif candidates that discriminate the positives from the negatives by the introduction of genetic programming (GP). Our method was evaluated on two types of model data sets. First, property motifs intentionally buried in artificially derived peptide data were searched for. As a result, the expected property motifs were correctly extracted by our algorithm. Second, the peptide data that interact with MHC class II molecules were analyzed as one of the models of biologically active peptides with buried motifs of various lengths. Using the rule derived by our method, twice as many MHC class II binding peptides were identified as with the existing scoring matrix method. In conclusion, our GP-based motif searching approach made it possible to obtain knowledge of functional aspects of the peptides without any prior knowledge.
Identification of a novel mitotic phosphorylation motif associated with protein localization to the mitotic apparatus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Feng; Camp, David G.; Gritsenko, Marina A.

2007-11-16

The chromosomal passenger complex (CPC) is a critical regulator of chromosome, cytoskeleton and membrane dynamics during mitosis. Here, we identified phosphopeptides and phosphoprotein complexes recognized by a phosphorylation specific antibody that labels the CPC using liquid chromatography coupled to mass spectrometry. A mitotic phosphorylation motif (PX{G/T/S}{L/M}[pS]P or WGL[pS]P) was identified in 11 proteins including Fzr/Cdh1 and RIC-8, two proteins with potential links to the CPC. Phosphoprotein complexes contained known CPC components INCENP, Aurora-B and TD-60, as well as SMAD2, 14-3-3 proteins, PP2A, and Cdk1, a likely kinase for this motif. Protein sequence analysis identified phosphorylation motifs in additional proteins including SMAD2, Plk3 and INCENP. Mitotic SMAD2 and Plk3 phosphorylation was confirmed using phosphorylation specific antibodies, and in the case of Plk3, phosphorylation correlates with its localization to the mitotic apparatus. A mutagenesis approach was used to show INCENP phosphorylation is required for midbody localization. These results provide evidence for a shared phosphorylation event that regulates localization of critical proteins during mitosis.
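A protein sequence scan of the kind mentioned above could be sketched with a regular expression, as below. This is a guess at how the motif notation translates, not the authors' procedure: it assumes that X means any residue, that the braces list alternatives at a position (as the slashes suggest), and that the phosphoserine [pS] appears as a plain S in an unmodified sequence. The function name and the toy sequence are hypothetical.

```python
import re

# PX{G/T/S}{L/M}[pS]P  ->  P . [GTS] [LM] S P   (assumed reading of the notation)
# WGL[pS]P             ->  W G L S P
# The lookahead lets overlapping candidates be reported as well.
MITOTIC_MOTIF = re.compile(r"(?=(P.[GTS][LM]SP|WGLSP))")


def candidate_sites(protein: str) -> list:
    """Return (start, matched_text) for every candidate motif occurrence."""
    return [(m.start(), m.group(1)) for m in MITOTIC_MOTIF.finditer(protein)]


print(candidate_sites("MKPAGLSPQQWGLSPK"))
```

Matches found this way are only candidates; the abstract reports that phosphorylation was confirmed experimentally with phosphorylation-specific antibodies.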
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

PubMed Central

Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo

2018-01-01

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
Dimeric PROP1 binding to diverse palindromic TAAT sequences promotes its transcriptional activity.

PubMed

Nakayama, Michie; Kato, Takako; Susa, Takao; Sano, Akiko; Kitahara, Kousuke; Kato, Yukio

2009-08-13

Mutations in the Prop1 gene are responsible for murine Ames dwarfism and human combined pituitary hormone deficiency with hypogonadism. Recently, we reported that PROP1 is a possible transcription factor for gonadotropin subunit genes through plural cis-acting sites composed of AT-rich sequences containing a TAAT motif which differs from its consensus binding sequence known as PRDQ9 (TAATTGAATTA). This study aimed to verify the binding specificity and sequence of PROP1 by applying the method of SELEX (Systematic Evolution of Ligands by EXponential enrichment), EMSA (electrophoretic mobility shift assay) and transient transfection assay. SELEX, after 5, 7 and 9 generations of selection using a random sequence library, showed that nucleotides containing one or two TAAT motifs were accumulated and accounted for 98.5% at the 9th generation. Aligned sequences and EMSA demonstrated that PROP1 binds preferentially to 11 nucleotides composed of an inverted TAAT motif separated by 3 nucleotides with variation in the half site of palindromic TAAT motifs and with preferential requirement of T at the nucleotide number 5 immediately 3' to a TAAT motif. Transient transfection assay demonstrated first that dimeric binding of PROP1 to an inverted TAAT motif and its cognates resulted in transcriptional activation, whereas monomeric binding of PROP1 to a single TAAT motif and an inverted ATTA motif did not mediate activation. Thus, this study demonstrated that dimeric binding of PROP1 is able to recognize diverse palindromic TAAT sequences separated by 3 nucleotides and to exhibit its transcriptional activity.
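The 11-nucleotide site described above (a TAAT motif, three spacer nucleotides, then the inverted motif ATTA) can be written as a simple pattern, which the consensus PRDQ9 sequence TAATTGAATTA satisfies. The Python sketch below scans a DNA string for such sites and flags whether position 5, immediately 3' to the first TAAT, is a T. It is an illustrative pattern search only; it does not model binding affinity or the half-site variation the study reports, and the function name and test sequence are assumptions.

```python
import re

# TAAT + 3 arbitrary nucleotides + ATTA = 11 nt; lookahead allows overlaps.
INVERTED_TAAT = re.compile(r"(?=(TAAT[ACGT]{3}ATTA))")


def palindromic_taat_sites(dna: str) -> list:
    """Return (start, site, has_T_at_position_5) for each 11-nt site."""
    return [
        (m.start(), m.group(1), m.group(1)[4] == "T")
        for m in INVERTED_TAAT.finditer(dna.upper())
    ]


# The PRDQ9 consensus embedded in flanking sequence.
print(palindromic_taat_sites("ggTAATTGAATTAcc"))
```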
Mining for class-specific motifs in protein sequence classification

PubMed Central

2013-01-01

Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic; thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology is able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets have significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
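The harvesting step described above can be sketched in Python as follows. This is a deliberately simplified stand-in, not the published scoring function: it collects all 4- to 8-grams per class and keeps those that occur in at least a chosen fraction of one class's sequences and in no sequence of any other class. The BLOSUM62-based merging of similar n-grams and the dampening factor for unbalanced classes are omitted, and the function names, the `min_fraction` parameter, and the toy classes are assumptions.

```python
from collections import defaultdict


def ngrams(seq, n_min=4, n_max=8):
    """All distinct substrings of length n_min..n_max in one sequence."""
    return {
        seq[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(seq) - n + 1)
    }


def discriminative_ngrams(classes, min_fraction=0.5):
    """classes maps a label to a list of sequences. Returns, per label, the
    sorted n-grams present in >= min_fraction of that class's sequences and
    absent from every other class (simplified criterion, see lead-in)."""
    support = {label: defaultdict(int) for label in classes}
    for label, seqs in classes.items():
        for s in seqs:
            for g in ngrams(s):
                support[label][g] += 1  # count sequences, not occurrences
    result = {}
    for label, seqs in classes.items():
        others = [support[o] for o in classes if o != label]
        result[label] = sorted(
            g for g, c in support[label].items()
            if c / len(seqs) >= min_fraction and not any(g in o for o in others)
        )
    return result


toy = {
    "class_a": ["AAKRKRGG", "TTKRKRCC"],
    "class_b": ["AAGGTTCC", "GGAATTCC"],
}
print(discriminative_ngrams(toy, min_fraction=1.0))
```

Requiring complete absence from other classes is stricter than the "minimally present" criterion in the abstract; a frequency ratio would be the natural relaxation.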
DNA motifs associated with aberrant CpG island methylation.

PubMed

Feltus, F Alex; Lee, Eva K; Costello, Joseph F; Plass, Christoph; Vertino, Paula M

2006-05-01

Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms. The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.
Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

PubMed Central

Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

2013-01-01

The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
Comparison of the receptor FGFRL1 from sea urchins and humans illustrates evolution of a zinc binding motif in the intracellular domain

PubMed Central

2009-01-01

Background: FGFRL1, the gene for the fifth member of the fibroblast growth factor receptor (FGFR) family, is found in all vertebrates from fish to man and in the cephalochordate amphioxus. Since it does not occur in more distantly related invertebrates such as insects and nematodes, we have speculated that FGFRL1 might have evolved just before branching of the vertebrate lineage from the other invertebrates (Beyeler and Trueb, 2006). Results: We identified the gene for FGFRL1 also in the sea urchin Strongylocentrotus purpuratus and cloned its mRNA. The deduced amino acid sequence shares 62% sequence similarity with the human protein and shows conservation of all disulfides and N-linked carbohydrate attachment sites. Similar to the human protein, the S. purpuratus protein contains a histidine-rich motif at the C-terminus, but this motif is much shorter than the human counterpart. To analyze the function of the novel motif, recombinant fusion proteins were prepared in a bacterial expression system. The human fusion protein bound to nickel and zinc affinity columns, whereas the sea urchin protein barely interacted with such columns. Direct determination of metal ions by atomic absorption revealed 2.6 mole zinc/mole protein for human FGFRL1 and 1.7 mole zinc/mole protein for sea urchin FGFRL1. Conclusion: The FGFRL1 gene has evolved much earlier than previously assumed. A comparison of the intracellular domain between sea urchin and human FGFRL1 provides interesting insights into the shaping of a novel zinc binding domain. PMID:20021659
Computational study of the fibril organization of polyglutamine repeats reveals a common motif identified in beta-helices.

PubMed

Zanuy, David; Gunasekaran, Kannan; Lesk, Arthur M; Nussinov, Ruth

2006-04-21

The formation of fibril aggregates by long polyglutamine sequences is assumed to play a major role in neurodegenerative diseases such as Huntington. Here, we model peptides rich in glutamine, through a series of molecular dynamics simulations. Starting from a rigid nanotube-like conformation, we have obtained a new conformational template that shares structural features of a tubular helix and of a beta-helix conformational organization. Our new model can be described as a super-helical arrangement of flat beta-sheet segments linked by planar turns or bends. Interestingly, our comprehensive analysis of the Protein Data Bank reveals that this is a common motif in beta-helices (termed beta-bend), although it has not been identified so far. The motif is based on the alternation of beta-sheet and helical conformation as the protein sequence is followed from the N to the C termini (beta-alpha(R)-beta-polyPro-beta). We further identify this motif in the ssNMR structure of the protofibril of the amyloidogenic peptide Abeta(1-40). The recurrence of the beta-bend suggests a general mode of connecting long parallel beta-sheet segments that would allow the growth of partially ordered fibril structures. The design allows the peptide backbone to change direction with a minimal loss of main chain hydrogen bonds. The identification of a coherent organization beyond that of the beta-sheet segments in different folds rich in parallel beta-sheets suggests a higher degree of ordered structure in protein fibrils, in agreement with their low solubility and dense molecular packing.
Structure, function, and evolution of bacterial ATP-binding cassette systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davidson, A.L.; Dassa, E.; Orelle, C.

2010-07-27

The ATP-binding cassette (ABC) systems constitute one of the largest superfamilies of paralogous sequences. All ABC systems share a highly conserved ATP-hydrolyzing domain or protein (the ABC; also referred to as a nucleotide-binding domain [NBD]) that is unequivocally characterized by three short sequence motifs (Fig. 1): these are the Walker A and Walker B motifs, indicative of the presence of a nucleotide-binding site, and the signature motif, unique to ABC proteins, located upstream of the Walker B motif (426). Other motifs diagnostic of ABC proteins are also indicated in Fig. 1. The biological significance of these motifs is discussed in Structure, Function, and Dynamics of the ABC. ABC systems are widespread among living organisms and have been detected in all genera of the three kingdoms of life, with remarkable conservation in the primary sequence of the cassette and in the organization of the constitutive domains or subunits (203, 420). ABC systems couple the energy of ATP hydrolysis to an impressively large variety of essential biological phenomena, comprising not only transmembrane (TM) transport, for which they are best known, but also several non-transport-related processes, such as translation elongation (62) and DNA repair (174). Although ABC systems deserve much attention because they are involved in severe human inherited diseases (107), they were first discovered and characterized in detail in prokaryotes, as early as the 1970s (13, 148, 238, 468). The most extensively analyzed systems were the high-affinity histidine and maltose uptake systems of Salmonella enterica serovar Typhimurium and Escherichia coli.
Over 2 decades ago, after the completion of the nucleotide sequences encoding these transporters in the respective laboratories of Giovanna Ames and Maurice Hofnung, Hiroshi Nikaido and colleagues noticed that the two systems displayed a global similarity in the nature of their components and, moreover, that the primary sequences of MalK and HisP, the proteins suspected to energize these transporters, shared as much as 32% identity in amino acid residues when their sequences were aligned (171). Later, it was found that several bacterial proteins involved in uptake of nutrients, export of toxins, cell division, bacterial nodulation of plants, and DNA repair displayed the same similarity in their sequences (127, 196). This led to the notion that the conserved protein, which had been shown to bind ATP (198, 201), would probably energize the systems mentioned above by coupling the energy of ATP hydrolysis to transport. The latter was demonstrated with the maltose and histidine transporters by use of isolated membrane vesicles (105, 379) and purified transporters reconstituted into proteoliposomes (30, 98). The determination of the sequence of the first eukaryotic protein strongly similar to these bacterial transporters (the P-glycoprotein, involved in resistance of cancer cells to multiple drugs) (169, 179) demonstrated that these proteins were not restricted to prokaryotes. Two names, 'traffic ATPases' (15) and the more accepted name 'ABC transporters' (193, 218), were proposed for members of this new superfamily. ABC systems can be divided into three main functional categories, as follows. Importers mediate the uptake of nutrients in prokaryotes. The nature of the substrates that are transported is very wide, including mono- and oligosaccharides, organic and inorganic ions, amino acids, peptides, iron-siderophores, metals, polyamine cations, opines, and vitamins.
Exporters are involved in the secretion of various molecules, such as peptides, lipids, hydrophobic drugs, polysaccharides, and proteins, including toxins such as hemolysin. The third category of systems is apparently not involved in transport, with some members being involved in translation of mRNA and in DNA repair. Despite the large, diverse population of substrates handled and the difference in the polarity of transport, importers and exporters share a common organization made of two hydrophobic membrane-spanning or integral membrane (IM) domains and two hydrophilic domains carrying the ABC peripherally associated with the IM domains on the cytosolic side of the membrane (26). In importers, these four domains are almost always independent polypeptide chains that come together to form a multimeric complex. In most exporters, including the E. coli hemolysin exporter HlyB, the N-terminal IM and the C-terminal ABC domains are fused as a single polypeptide chain (IM-ABC). An inverted organization in which the IM domain is C-terminal with respect to the ABC domain (ABC-IM) exists, such as in the MacB protein, involved in macrolide resistance in E. coli. No IM domain partners have been identified for ABC proteins falling into the third category, and these proteins consist of two ABCs fused together (ABC2).
cWINNOWER algorithm for finding fuzzy dna motifs

NASA Technical Reports Server (NTRS)

Liang, S.; Samanta, M. P.; Biegel, B. A.

2004-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
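
The signal definition above has a simple consequence: if every signal lies within d mutations of the motif, then any two signals of the same motif differ from each other in at most 2d positions, so they can be linked pairwise and searched for as a clique. The following is a minimal sketch of that pairwise linking step only, assuming "mutations" means substitutions (Hamming distance). The function names and the toy sequence are hypothetical, and neither the consensus constraint nor the sub-clique counting of cWINNOWER is reproduced here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity_edges(seq: str, l: int, d: int) -> list:
    """Pairs of start positions whose l-mers differ in at most 2*d positions.

    Overlapping windows are skipped so that a window is not paired with a
    shifted copy of itself.
    """
    starts = range(len(seq) - l + 1)
    return [
        (i, j)
        for i, j in combinations(starts, 2)
        if j - i >= l and hamming(seq[i:i + l], seq[j:j + l]) <= 2 * d
    ]
```

For example, with l = 8 and d = 1, two copies of a motif that each carry one substitution at different positions differ from each other in two positions and are therefore joined by an edge.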

The Reconstruction of Condition-Specific Transcriptional Modules Provides New Insights in the Evolution of Yeast AP-1 Proteins

PubMed Central

Goudot, Christel; Etchebest, Catherine

2011-01-01

AP-1 proteins are transcription factors (TFs) that belong to the basic leucine zipper family, one of the largest families of TFs in eukaryotic cells. Despite high homology between their DNA binding domains, these proteins are able to recognize diverse DNA motifs. In yeasts, these motifs are referred as YRE (Yap Response Element) and are either seven (YRE-Overlap) or eight (YRE-Adjacent) base pair long. It has been proposed that the AP-1 DNA binding motif preference relies on a single change in the amino acid sequence of the yeast AP-1 TFs (an arginine in the YRE-O binding factors being replaced by a lysine in the YRE-A binding Yaps). We developed a computational approach to infer condition-specific transcriptional modules associated to the orthologous AP-1 protein Yap1p, Cgap1p and Cap1p, in three yeast species: the model yeast Saccharomyces cerevisiae and two pathogenic species Candida glabrata and Candida albicans. Exploitation of these modules in terms of predictions of the protein/DNA regulatory interactions changed our vision of AP-1 protein evolution. Cis-regulatory motif analyses revealed the presence of a conserved adenine in 5′ position of the canonical YRE sites. While Yap1p, Cgap1p and Cap1p shared a remarkably low number of target genes, an impressive conservation was observed in the YRE sequences identified by Yap1p and Cap1p. In Candida glabrata, we found that Cgap1p, unlike Yap1p and Cap1p, recognizes YRE-O and YRE-A motifs. These findings were supported by structural data available for the transcription factor Pap1p (Schizosaccharomyces pombe). Thus, whereas arginine and lysine substitutions in Cgap1p and Yap1p proteins were reported as responsible for a specific YRE-O or YRE-A preference, our analyses rather suggest that the ancestral yeast AP-1 protein could recognize both YRE-O and YRE-A motifs and that the arginine/lysine exchange is not the only determinant of the specialization of modern Yaps for one motif or another. PMID:21695268
A Gibbs sampler for motif detection in phylogenetically close sequences

NASA Astrophysics Data System (ADS)

Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

2004-03-01

Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved ``motifs'' in a random intergenic ``background''. Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by ``weight matrices'' to the probability of their arising from the background ``null model'', and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal ``prior'' assumptions on the number and types of motifs, assessing the significance of detected motifs by ``tracking'' clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
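
The comparison described above, between the probability of a short sequence under a weight matrix and under the background null model, is commonly expressed as a log-likelihood ratio. The sketch below assumes independent matrix columns and a position-independent background; the matrix values are invented for illustration, and the phylogenetic correction that is the point of the new sampler is not modeled.

```python
import math


def log_odds(site: str, pwm: list, background: dict) -> float:
    """Natural-log likelihood ratio of a site: weight matrix versus background."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of matrix columns")
    return sum(
        math.log(column[base] / background[base])
        for base, column in zip(site, pwm)
    )


# Hypothetical 3-column weight matrix; each column is a distribution over A, C, G, T.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

A positive score means the site is better explained by the weight matrix than by the background; a negative score means the opposite.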
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

PubMed

Roy, Indranil; Aluru, Srinivas

2016-01-01

Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
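
To make the (l, d) problem statement above concrete, the following brute-force sketch enumerates every candidate of length l and keeps those that occur, with at most d substitutions, in at least q of the input sequences. It only illustrates the definition: the cost grows as 4^l for DNA, so it is usable for toy sizes only, and it is unrelated to the NFA-based method of the paper. Function names and example sequences are hypothetical.

```python
from itertools import product


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq differs from motif in at most d substitutions."""
    l = len(motif)
    return any(
        sum(a != b for a, b in zip(motif, seq[i:i + l])) <= d
        for i in range(len(seq) - l + 1)
    )


def ld_motifs(seqs: list, l: int, d: int, q: int, alphabet: str = "ACGT") -> list:
    """Enumerate all candidates of length l; keep those found in at least q sequences."""
    hits = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if sum(occurs_within(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits
```

With the three toy sequences TTACGTTT, GGACCTGG and CCAGGTCC, the candidate ACGT is a (4, 1) motif with q = 3, although no 4-mer is shared exactly by all three.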
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

NASA Astrophysics Data System (ADS)

Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
Inheritance patterns of ATCCT repeat interruptions in spinocerebellar ataxia type 10 (SCA10) expansions.

PubMed

Landrian, Ivette; McFarland, Karen N; Liu, Jilin; Mulligan, Connie J; Rasmussen, Astrid; Ashizawa, Tetsuo

2017-01-01

Spinocerebellar ataxia type 10 (SCA10), an autosomal dominant cerebellar ataxia disorder, is caused by a non-coding ATTCT microsatellite repeat expansion in the ataxin 10 gene. In a subset of SCA10 families, the 5'-end of the repeat expansion contains a complex sequence of penta- and heptanucleotide interruption motifs which is followed by a pure tract of tandem ATCCT repeats of unknown length at its 3'-end. Intriguingly, expansions that carry these interruption motifs correlate with an epileptic seizure phenotype and are unstable despite the theory that interruptions are expected to stabilize expanded repeats. To examine the apparent contradiction of unstable, interruption-positive SCA10 expansion alleles and to determine whether the instability originates outside of the interrupted region, we sequenced approximately 1 kb of the 5'-end of SCA10 expansions using the ATCCT-PCR product in individuals across multiple generations from four SCA10 families. We found that the greatest instability within this region occurred in paternal transmissions of the allele in stretches of pure ATTCT motifs while the intervening interrupted sequences were stable. Overall, the ATCCT interruption changes by only one to three repeat units and therefore cannot account for the instability across the length of the disease allele. We conclude that the AT-rich interruptions locally stabilize the SCA10 expansion at the 5'-end but do not completely abolish instability across the entire span of the expansion. In addition, analysis of the interruption alleles across these families support a parsimonious single origin of the mutation with a shared distant ancestor.
Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

PubMed

Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

2005-09-01

We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
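
The idea of identifying oligo sequences that are overrepresented in promoters of coregulated genes can be sketched as a comparison of occurrence frequencies between a coregulated set and a background set. The abstract does not give the statistic used by MotifFinder, so the ratio below, the pseudocount, and all names and sequences are assumptions made purely for illustration.

```python
from collections import Counter


def kmer_presence(promoters: list, k: int) -> Counter:
    """For each k-mer, count the promoters in which it occurs at least once."""
    counts = Counter()
    for seq in promoters:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enrichment(coregulated: list, background: list, k: int, pseudo: float = 1.0) -> dict:
    """Ratio of per-promoter occurrence frequencies, coregulated over background.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that never occur in the background set.
    """
    fg = kmer_presence(coregulated, k)
    bg = kmer_presence(background, k)
    return {
        kmer: ((fg[kmer] + pseudo) / (len(coregulated) + pseudo))
        / ((bg[kmer] + pseudo) / (len(background) + pseudo))
        for kmer in fg
    }
```

In a toy set where every coregulated promoter contains the ACGT core and no background promoter does, ACGT receives the highest ratio. A real analysis would also need a significance test and a correction for the number of oligos examined.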
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

PubMed Central

Chica, Claudia; Diella, Francesca; Gibson, Toby J.

2009-01-01

Background: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results: The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co-evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion: The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. PMID:19584925
Cryo-EM near-atomic structure of a dsRNA fungal virus shows ancient structural motifs preserved in the dsRNA viral lineage

PubMed Central

Luque, Daniel; Gómez-Blanco, Josué; Garriga, Damiá; Brilot, Axel F.; González, José M.; Havens, Wendy M.; Carrascosa, José L.; Trus, Benes L.; Verdaguer, Nuria; Ghabrial, Said A.; Castón, José R.

2014-01-01

Viruses evolve so rapidly that sequence-based comparison is not suitable for detecting relatedness among distant viruses. Structure-based comparisons suggest that evolution led to a small number of viral classes or lineages that can be grouped by capsid protein (CP) folds. Here, we report that the CP structure of the fungal dsRNA Penicillium chrysogenum virus (PcV) shows the progenitor fold of the dsRNA virus lineage and suggests a relationship between lineages. Cryo-EM structure at near-atomic resolution showed that the 982-aa PcV CP is formed by a repeated α-helical core, indicative of gene duplication despite lack of sequence similarity between the two halves. Superimposition of secondary structure elements identified a single “hotspot” at which variation is introduced by insertion of peptide segments. Structural comparison of PcV and other distantly related dsRNA viruses detected preferential insertion sites at which the complexity of the conserved α-helical core, made up of ancestral structural motifs that have acted as a skeleton, might have increased, leading to evolution of the highly varied current structures. Analyses of structural motifs only apparent after systematic structural comparisons indicated that the hallmark fold preserved in the dsRNA virus lineage shares a long (spinal) α-helix tangential to the capsid surface with the head-tailed phage and herpesvirus viral lineage. PMID:24821769
[Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

PubMed

Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

2016-07-25

RNA-binding proteins exert important biological functions by specifically recognizing RNA motifs. SELEX (systematic evolution of ligands by exponential enrichment), an in vitro selection method, can yield consensus motifs with high affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study protein-RNA interactions in vitro. A pool of RNAs with 20 bp random sequences was transcribed from a T7 promoter, and the target protein was inserted into a plasmid containing an SBP-tag, which can be captured by streptavidin beads. The specific RNA motif could be obtained after only one cycle, which dramatically improved the selection efficiency. Using this method, we found that the human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. An EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this approach provides a rapid and effective method to study the RNA binding specificity of proteins.
Discriminative motif optimization based on perceptron training

PubMed Central

Patel, Ronak Y.; Stormo, Gary D.

2014-01-01

Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using the next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amount of data. To overcome this limitation, tools have been developed that compromise accuracy with speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu, ronakypatel@gmail.com PMID:24369152
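
The AUC used above as the measure of discriminating power can be read as the probability that a randomly chosen positive sequence scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is not DiMO's implementation, the perceptron update is not shown, and the function name is hypothetical.

```python
def auc(pos_scores: list, neg_scores: list) -> float:
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

A motif that scores every positive sequence above every negative one gives an AUC of 1.0, while a motif with no discriminating power gives about 0.5. The pairwise loop is quadratic in the number of sequences, so a rank-based computation would be preferable for data sets of ChIP-seq size.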
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
Characteristic motifs for families of allergenic proteins

PubMed Central

Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

2008-01-01

The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633
Toxic and nontoxic components of botulinum neurotoxin complex are evolved from a common ancestral zinc protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Inui, Ken; Japan Society for the Promotion of Science, 1-8 Chiyoda-ku, Tokyo 102-8472; Sagane, Yoshimasa

2012-03-16

Highlights: BoNT and NTNHA proteins share a similar protein architecture. NTNHA and BoNT were both identified as zinc-binding proteins. NTNHA does not have a classical HEXXH zinc-coordinating motif similar to that found in all serotypes of BoNT. Homology modeling implied probable key residues involved in zinc coordination. Abstract: Zinc atoms play an essential role in a number of enzymes. Botulinum neurotoxin (BoNT), the most potent toxin known in nature, is a zinc-dependent endopeptidase. Here we identify the nontoxic nonhemagglutinin (NTNHA), one of the BoNT-complex constituents, as a zinc-binding protein, along with BoNT. A protein structure classification database search indicated that BoNT and NTNHA share a similar domain architecture, comprising a zinc-dependent metalloproteinase-like, BoNT coiled-coil motif and concanavalin A-like domains. Inductively coupled plasma-mass spectrometry analysis demonstrated that every single NTNHA molecule contains a single zinc atom. This is the first demonstration of a zinc atom in this protein, as far as we know. However, the NTNHA molecule does not possess any known zinc-coordinating motif, whereas all BoNT serotypes possess the classical HEXXH motif. Homology modeling of the NTNHA structure implied that a consensus K-C-L-I-K-X35-D sequence common among all NTNHA serotype molecules appears to coordinate a single zinc atom. These findings lead us to propose that NTNHA and BoNT may have evolved distinct functional specializations following their branching out from a common ancestral zinc protein.
Characteristics common to a cytokine family spanning five orders of insects.

PubMed

Matsumoto, Hitoshi; Tsuzuki, Seiji; Date-Ito, Atsuko; Ohnishi, Atsushi; Hayakawa, Yoichi

2012-06-01

Growth-blocking peptide (GBP) is a member of an insect cytokine family with diverse functions including growth and immunity controls. Members of this cytokine family have been reported in 15 species of Lepidoptera, and we have recently identified GBP-like peptides in Diptera such as Lucilia cuprina and Drosophila melanogaster, indicating that this peptide family is not specific to Lepidoptera. In order to extend our knowledge of this peptide family, we purified the same family peptide from one of the tenebrionids, Zophobas atratus, isolated its cDNA, and sequenced it. The Z. atratus GBP sequence together with reported sequence data of peptides from the same family enabled us to perform BLAST searches against EST and genome databases of several insect species including Coleoptera, Diptera, Hymenoptera, and Hemiptera and identify homologous peptide genes. Here we report conserved structural features in these sequence data. They consist of 19-30 amino acid residues encoded at the C terminus of a 73-152 amino acid precursor and contain the motif C-x(2)-G-x(4,6)-G-x(1,2)-C-[KR], which shares a certain similarity with the motif in the mammalian EGF peptide family. These data indicate that these small cytokines belonging to one family are present in at least five insect orders. Copyright © 2012 Elsevier Ltd. All rights reserved.
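The motif C-x(2)-G-x(4,6)-G-x(1,2)-C-[KR] quoted above is written in PROSITE-style notation, which maps directly onto a regular expression. The following is a minimal Python sketch of that translation, not the authors' search procedure (the abstract describes BLAST searches). The function names and the test peptide are hypothetical, and only the notation elements that appear in this motif are supported.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supports only literal residues, the 'x' wildcard, residue classes such
    as [KR], and repeat counts such as x(2) or x(4,6).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        regex = "." if symbol == "x" else symbol  # 'x' means any residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Motif string taken from the abstract above.
GBP_MOTIF = "C-x(2)-G-x(4,6)-G-x(1,2)-C-[KR]"
GBP_REGEX = re.compile(prosite_to_regex(GBP_MOTIF))


def find_motif(sequence: str):
    """Return (start index, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in GBP_REGEX.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up peptide for illustration only; not a real GBP sequence.
    print(prosite_to_regex(GBP_MOTIF))
    print(find_motif("AACAAGAAAAGACKAA"))
```

A regex scan like this only reports exact pattern matches; it does not replace a homology search and says nothing about statistical significance.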
Isolation, Cloning, and Expression of an Acid Phosphatase Containing Phosphotyrosyl Phosphatase Activity from Prevotella intermedia

PubMed Central

Chen, Xiaochi; Ansai, Toshihiro; Awano, Shuji; Iida, Toshiya; Barik, Sailen; Takehara, Tadamichi

1999-01-01

A novel acid phosphatase containing phosphotyrosyl phosphatase (PTPase) activity, designated PiACP, from Prevotella intermedia ATCC 25611, an anaerobe implicated in progressive periodontal disease, has been purified and characterized. PiACP, a monomer with an apparent molecular mass of 30 kDa, did not require divalent metal cations for activity and was sensitive to orthovanadate but highly resistant to okadaic acid. The enzyme exhibited substantial activity against tyrosine phosphate-containing peptides derived from the epidermal growth factor receptor. On the basis of N-terminal and internal amino acid sequences of purified PiACP, the gene coding for PiACP was isolated and sequenced. The PiACP gene consisted of 792 bp and coded for a basic protein with an Mr of 29,164. The deduced amino acid sequence exhibited striking similarity (25 to 64%) to those of members of class A bacterial acid phosphatases, including PhoC of Morganella morganii, and involved a conserved phosphatase sequence motif that is shared among several lipid phosphatases and the mammalian glucose-6-phosphatases. The highly conservative motif HCXAGXXR in the active domain of PTPase was not found in PiACP. Mutagenesis of recombinant PiACP showed that His-170 and His-209 were essential for activity. Thus, the class A bacterial acid phosphatases including PiACP may function as atypical PTPases, the biological functions of which remain to be determined. PMID:10559178
Physical-chemical property based sequence motifs and methods regarding same

DOEpatents

Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

2008-09-09

A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.
Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

PubMed Central

2014-01-01

Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519
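The positional statements above (for example, AAUAAA roughly 16 bases upstream of the poly(A) site in animals) can be checked on individual sequences with a simple scan. The sketch below is illustrative only and is not the pipeline used in the study. The function name, the 40-base window, and the convention of measuring distance from the first base of the signal to the site are all assumptions, and the example sequence is synthetic.

```python
def upstream_signal_offsets(sequence, polya_site, signal="AAUAAA", window=40):
    """Return distances from each signal occurrence to the poly(A) site.

    sequence   : RNA or DNA string (T is treated as U)
    polya_site : 0-based index of the poly(A) site in `sequence`
    window     : number of bases upstream of the site to scan (assumed value)

    The distance is counted from the first base of the signal to the site;
    this convention is an assumption, since the abstract does not define it.
    Only occurrences lying entirely upstream of the site are reported.
    """
    rna = sequence.upper().replace("T", "U")
    start = max(0, polya_site - window)
    region = rna[start:polya_site]
    offsets = []
    pos = region.find(signal)
    while pos != -1:
        offsets.append(polya_site - (start + pos))
        pos = region.find(signal, pos + 1)  # allow overlapping occurrences
    return offsets


if __name__ == "__main__":
    # Synthetic sequence: signal starts at index 10, poly(A) site at index 26.
    seq = "G" * 10 + "AAUAAA" + "C" * 10 + "A"
    print(upstream_signal_offsets(seq, polya_site=26))
```

Collecting these offsets over many annotated sites and plotting their histogram is one way to see the sharply localized signal positions the abstract describes.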
Primary structure and cellular localization of chicken brain myosin-V (p190), an unconventional myosin with calmodulin light chains

PubMed Central

1992-01-01

Recent biochemical studies of p190, a calmodulin (CM)-binding protein purified from vertebrate brain, have demonstrated that this protein, purified as a complex with bound CM, shares a number of properties with myosins (Espindola, F. S., E. M. Espreafico, M. V. Coelho, A. R. Martins, F. R. C. Costa, M. S. Mooseker, and R. E. Larson. 1992. J. Cell Biol. 118:359-368). To determine whether or not p190 was a member of the myosin family of proteins, a set of overlapping cDNAs encoding the full-length protein sequence of chicken brain p190 was isolated and sequenced. Verification that the deduced primary structure was that of p190 was demonstrated through microsequence analysis of a cyanogen bromide peptide generated from chick brain p190. The deduced primary structure of chicken brain p190 revealed that this 1,830-amino acid (aa) (212,509-D) protein is a member of a novel structural class of unconventional myosins that includes the gene products encoded by the dilute locus of mouse and the MYO2 gene of Saccharomyces cerevisiae. We have named the p190-CM complex "myosin-V" based on the results of a detailed sequence comparison of the head domains of 29 myosin heavy chains (hc), which has revealed that this myosin, based on head structure, is the fifth of six distinct structural classes of myosin to be described thus far. Like the presumed products of the mouse dilute and yeast MYO2 genes, the head domain of chicken myosin-V hc (aa 1-764) is linked to a "neck" domain (aa 765-909) consisting of six tandem repeats of an approximately 23-aa "IQ-motif." All known myosins contain at least one such motif at their head-tail junctions; these IQ-motifs may function as calmodulin or light chain binding sites. The tail domain of chicken myosin-V consists of an initial 511 aa predicted to form several segments of coiled-coil alpha helix followed by a terminal 410-aa globular domain (aa 1,421-1,830). Interestingly, a portion of the tail domain (aa 1,094-1,830) shares 58% amino acid sequence identity with a 723-aa protein from mouse brain reported to be a glutamic acid decarboxylase. The neck region of chicken myosin-V, which contains the IQ-motifs, was demonstrated to contain the binding sites for CM by analyzing CM binding to bacterially expressed fusion proteins containing the head, neck, and tail domains. Immunolocalization of myosin-V in brain and in cultured cells revealed an unusual distribution for this myosin in both neurons and nonneuronal cells. (ABSTRACT TRUNCATED AT 400 WORDS) PMID:1469047
SCOPE: a web server for practical de novo motif discovery.

PubMed

Carlson, Jonathan M; Chakravarty, Arijit; DeZiel, Charles E; Gross, Robert H

2007-07-01

SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on experimentally determined datasets, SCOPE identified motifs with a significantly higher level of accuracy than a number of other web-based motif finders run with their default parameters. Because SCOPE has no adjustable parameters, the web server has an intuitive interface, requiring only a set of gene names or FASTA sequences and a choice of species. The most significant motifs found by SCOPE are displayed graphically on the main results page with a table containing summary statistics for each motif. Detailed motif information, including the sequence logo, PWM, consensus sequence and specific matching sites can be viewed through a single click on a motif. SCOPE's efficient, parameter-free search strategy has enabled the development of a web server that is readily accessible to the practising biologist while providing results that compare favorably with those of other motif finders. The SCOPE web server is at .

Identification of GATC- and CCGG- recognizing Type II REases and their putative specificity-determining positions using Scan2S—a novel motif scan algorithm with optional secondary structure constraints

PubMed Central

Niv, Masha Y.; Skrabanek, Lucy; Roberts, Richard J.; Scheraga, Harold A.; Weinstein, Harel

2008-01-01

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering. PMID:17972284
Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints.

PubMed

Niv, Masha Y; Skrabanek, Lucy; Roberts, Richard J; Scheraga, Harold A; Weinstein, Harel

2008-05-01

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
Methods and statistics for combining motif match scores.

PubMed

Bailey, T L; Gribskov, M

1998-01-01

Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
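Method 3, the product of score p-values, has a closed-form significance when the individual p-values are independent and uniformly distributed under the null hypothesis: for n p-values with product p, the probability of a product this small or smaller is p · Σ_{i=0}^{n-1} (−ln p)^i / i!. The Python sketch below implements that textbook formula. It is an illustration under the stated independence assumption, not the MAST source code, and the function name is hypothetical.

```python
import math


def product_pvalue(pvalues):
    """Significance of the product of n independent uniform p-values.

    Returns P(product of n uniform(0,1) variables <= observed product),
    computed as p * sum_{i=0}^{n-1} (-ln p)^i / i!, where p is the
    observed product. Works in log space to limit underflow.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    log_product = sum(math.log(p) for p in pvalues)
    x = -log_product            # x = -ln(p), always >= 0
    term = 1.0                  # running value of x^i / i!
    total = 1.0                 # i = 0 term
    for i in range(1, n):
        term *= x / i
        total += term
    return math.exp(log_product) * total


if __name__ == "__main__":
    # Three hypothetical motif-match p-values for one sequence.
    print(product_pvalue([0.01, 0.2, 0.05]))
```

With a single motif the function returns the p-value unchanged, and the correction factor grows with the number of motifs combined, which is why the raw product alone would overstate significance.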
Fast social-like learning of complex behaviors based on motor motifs.

PubMed

Calvo Tapia, Carlos; Tyukin, Ivan Y; Makarov, Valeri A

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
Fast social-like learning of complex behaviors based on motor motifs

NASA Astrophysics Data System (ADS)

Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

PubMed

Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

1998-08-01

The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.
Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

PubMed

Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

2017-02-01

An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.
RNA motif search with data-driven element ordering.

PubMed

Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

2016-05-18

In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .
CircularLogo: A lightweight web application to visualize intra-motif dependencies.

PubMed

Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo

2017-05-22

The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.
Two alternative ways of start site selection in human norovirus reinitiation of translation.

PubMed

Luttermann, Christine; Meyers, Gregor

2014-04-25

The calicivirus minor capsid protein VP2 is expressed via termination/reinitiation. This process depends on an upstream sequence element denoted termination upstream ribosomal binding site (TURBS). We have shown for feline calicivirus and rabbit hemorrhagic disease virus that the TURBS contains three sequence motifs essential for reinitiation. Motif 1 is conserved among caliciviruses and is complementary to a sequence in the 18 S rRNA leading to the model that hybridization between motif 1 and 18 S rRNA tethers the post-termination ribosome to the mRNA. Motif 2 and motif 2* are proposed to establish a secondary structure positioning the ribosome relative to the start site of the terminal ORF. Here, we analyzed human norovirus (huNV) sequences for the presence and importance of these motifs. The three motifs were identified by sequence analyses in the region upstream of the VP2 start site, and we showed that these motifs are essential for reinitiation of huNV VP2 translation. More detailed analyses revealed that the site of reinitiation is not fixed to a single codon and does not need to be an AUG, even though this codon is clearly preferred. Interestingly, we were able to show that reinitiation can occur at AUG codons downstream of the canonical start/stop site in huNV and feline calicivirus but not in rabbit hemorrhagic disease virus. Although reinitiation at the original start site is independent of the Kozak context, downstream initiation exhibits requirements for start site sequence context known for linear scanning. These analyses on start codon recognition give a more detailed insight into this fascinating mechanism of gene expression.
Identification of sequence motifs in oligonucleotides whose presence is correlated with antisense activity

PubMed Central

Matveeva, O. V.; Tsodikov, A. D.; Giddings, M.; Freier, S. M.; Wyatt, J. R.; Spiridonov, A. N.; Shabalina, S. A.; Gesteland, R. F.; Atkins, J. F.

2000-01-01

Design of antisense oligonucleotides targeting any mRNA can be much more efficient when several activity-enhancing motifs are included and activity-decreasing motifs are avoided. This conclusion was made after statistical analysis of data collected from >1000 experiments with phosphorothioate-modified oligonucleotides. Highly significant positive correlation between the presence of motifs CCAC, TCCC, ACTC, GCCA and CTCT in the oligonucleotide and its antisense efficiency was demonstrated. In addition, negative correlation was revealed for the motifs GGGG, ACTG, AAA and TAA. It was found that the likelihood of activity of an oligonucleotide against a desired mRNA target is sequence motif content dependent. PMID:10908347
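The motif lists in this abstract lend themselves to a quick screen of candidate oligonucleotides. The sketch below counts occurrences of each reported motif in a candidate sequence. The motif lists come from the abstract, but the "net count" summary is an assumption made here for illustration; the abstract reports correlations, not a scoring formula, and the function names and example oligonucleotide are hypothetical.

```python
# Motifs reported in the abstract above.
ENHANCING = ("CCAC", "TCCC", "ACTC", "GCCA", "CTCT")
DECREASING = ("GGGG", "ACTG", "AAA", "TAA")


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_profile(oligo):
    """Return per-motif counts for the activity-enhancing and -decreasing sets."""
    seq = oligo.upper()
    return {
        "enhancing": {m: count_overlapping(seq, m) for m in ENHANCING},
        "decreasing": {m: count_overlapping(seq, m) for m in DECREASING},
    }


def net_motif_count(oligo):
    """Enhancing minus decreasing motif occurrences.

    An unweighted difference chosen for illustration; it is not the
    statistic used in the study.
    """
    profile = motif_profile(oligo)
    return sum(profile["enhancing"].values()) - sum(profile["decreasing"].values())


if __name__ == "__main__":
    # Made-up 10-mer for illustration only.
    print(motif_profile("CCACTCTAAA"))
    print(net_motif_count("CCACTCTAAA"))
```

A real design tool would weight each motif by its measured effect size rather than treating all motifs equally.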
Characterization of Satellite DNA Sequences from the Commercially Important Marine Rotifers Brachionus rotundiformis and Brachionus plicatilis.

PubMed

Boehm; Gibson; Lubzens

2000-01-01

This study was initiated to search for species-specific and strain-specific satellite DNA sequences for which oligonucleotide primers could be designed to differentiate between various commercially important strains of the marine monogonont rotifers Brachionus rotundiformis and Brachionus plicatilis. Two unrelated, highly reiterated satellite sequences were cloned and characterized. The eight sequenced monomers from B. rotundiformis and six from B. plicatilis had low intrarepeat variability and were similar in their overall lengths, A + T compositions, and high degrees of repeated motif substructure. However, hybridizations to 19 representative strains, sequence characterizations, and GenBank searches indicated that these two satellites are morphotype-specific and population-specific, respectively, and share little homology to each other or to other characterized sequences in the database. Primer pairs designed for the B. rotundiformis satellite confirmed hybridization specificities on polymerase chain reaction and could serve as a useful molecular diagnostic tool to identify strains belonging to the SS morphotype, which are gaining widespread usage as first feeds for marine fish in commercial production.
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

PubMed

Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

2007-06-01

The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding 239 amino acids, corresponding to a protein of approximately 27 kDa. Analysis of the amino acid sequence revealed conserved residues of cysteine, histidine, and asparagine, an occluding loop pattern, a hemoglobinase motif, and the glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
Solution structure of CEH-37 homeodomain of the nematode Caenorhabditis elegans

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moon, Sunjin; Lee, Yong Woo; Kim, Woo Taek

Highlights: • We have determined solution structures of CEH-37 homeodomain. • CEH-37 HD has a compact α-helical structure with HTH DNA binding motif. • Solution structure of CEH-37 HD shares its molecular topology with that of the homeodomain proteins. • Residues in the N-terminal region and HTH motif are important in binding to Caenorhabditis elegans telomeric DNA. • CEH-37 could play an important role in telomere function via DNA binding. -- Abstract: The nematode Caenorhabditis elegans protein CEH-37 belongs to the paired OTD/OTX family of homeobox-containing homeodomain proteins. CEH-37 shares sequence similarity with homeodomain proteins, although it specifically binds to double-stranded C. elegans telomeric DNA, which is unusual for homeodomain proteins. Here, we report the solution structure of CEH-37 homeodomain and molecular interaction with double-stranded C. elegans telomeric DNA using nuclear magnetic resonance (NMR) spectroscopy. NMR structure shows that CEH-37 homeodomain is composed of a flexible N-terminal region and three α-helices with a helix-turn-helix (HTH) DNA binding motif. Data from size-exclusion chromatography and fluorescence spectroscopy reveal that CEH-37 homeodomain interacts strongly with double-stranded C. elegans telomeric DNA. NMR titration experiments identified residues responsible for specific binding to nematode double-stranded telomeric DNA. These results suggest that C. elegans homeodomain protein, CEH-37 could play an important role in telomere function via DNA binding.
Molecular Basis of the Binding of YAP Transcriptional Regulator to the ErbB4 Receptor Tyrosine Kinase

PubMed Central

Schuchardt, Brett J.; Bhat, Vikas; Mikles, David C.; McDonald, Caleb B.; Sudol, Marius; Farooq, Amjad

2014-01-01

The newly discovered transactivation function of ErbB4 receptor tyrosine kinase is believed to be mediated by virtue of the ability of its proteolytically-cleaved intracellular domain (ICD) to physically associate with YAP2 transcriptional regulator. In an effort to unearth the molecular basis of YAP2-ErbB4 interaction, we have conducted a detailed biophysical analysis of the binding of WW domains of YAP2 to PPXY motifs located within the ICD of ErbB4. Our data show that the WW1 domain of YAP2 binds to PPXY motifs within the ICD in a differential manner and that this behavior is by and large replicated by the WW2 domain. Remarkably, while both WW domains absolutely require the integrity of the PPXY consensus sequence, non-consensus residues within and flanking this motif do not appear to be critical for binding. In spite of this shared mode of binding, the WW domains of YAP2 display distinct conformational dynamics in complex with PPXY motifs derived from ErbB4. Collectively, our study lends new insights into the molecular basis of a key protein-protein interaction involved in a diverse array of cellular processes. PMID:24472438
Molecular basis of the binding of YAP transcriptional regulator to the ErbB4 receptor tyrosine kinase.

PubMed

Schuchardt, Brett J; Bhat, Vikas; Mikles, David C; McDonald, Caleb B; Sudol, Marius; Farooq, Amjad

2014-06-01

The newly discovered transactivation function of ErbB4 receptor tyrosine kinase is believed to be mediated by virtue of the ability of its proteolytically-cleaved intracellular domain (ICD) to physically associate with YAP2 transcriptional regulator. In an effort to unearth the molecular basis of YAP2-ErbB4 interaction, we have conducted a detailed biophysical analysis of the binding of WW domains of YAP2 to PPXY motifs located within the ICD of ErbB4. Our data show that the WW1 domain of YAP2 binds to PPXY motifs within the ICD in a differential manner and that this behavior is by and large replicated by the WW2 domain. Remarkably, while both WW domains absolutely require the integrity of the PPXY consensus sequence, non-consensus residues within and flanking this motif do not appear to be critical for binding. In spite of this shared mode of binding, the WW domains of YAP2 display distinct conformational dynamics in complex with PPXY motifs derived from ErbB4. Collectively, our study lends new insights into the molecular basis of a key protein-protein interaction involved in a diverse array of cellular processes. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

PubMed Central

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-01-01

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although motif pattern databases and tools for genome-level prediction exist, no tool is available for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts different types of weight matrix based on a user-defined set of aligned motif sequences and a motif width. To retrieve known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the resulting matrix can be converted into different file formats according to user requirements. It makes it possible to identify conserved motifs in co-regulated genes or in a whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. Availability: http://203.190.147.116/dmatrix/ PMID:19759861
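The following is a minimal sketch of how a weight matrix can be built from aligned motif sequences of a fixed width. It is not the D-MATRIX implementation (which the abstract describes as C-Sharp); the function name, the pseudocount, the uniform background of 0.25, and the example sites are all assumptions made for illustration.

```python
import math

def weight_matrices(aligned_sites, pseudocount=1.0, background=0.25):
    """Build count, frequency and log-odds matrices from equal-length DNA sites.

    Assumes every site contains only A, C, G, T (any other letter raises KeyError).
    """
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all aligned sites must have the same width")
    bases = "ACGT"
    # One dictionary of base counts per alignment column.
    counts = [{b: 0 for b in bases} for _ in range(width)]
    for site in aligned_sites:
        for pos, base in enumerate(site.upper()):
            counts[pos][base] += 1
    n = len(aligned_sites)
    # Pseudocounts avoid zero frequencies (and log of zero below).
    freqs = [{b: (col[b] + pseudocount) / (n + 4 * pseudocount) for b in bases}
             for col in counts]
    # Log-odds against an assumed uniform background.
    log_odds = [{b: math.log2(col[b] / background) for b in bases} for col in freqs]
    return counts, freqs, log_odds

# Invented example sites, not LexA binding sites.
sites = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
counts, freqs, log_odds = weight_matrices(sites)
print(counts[0])
```

The three returned matrices correspond to the common "different types of weight matrix" (counts, frequencies, log-odds); which types the tool itself produces is not stated in the abstract.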
D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

PubMed

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-07-27

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although motif pattern databases and tools for genome-level prediction exist, no tool is available for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts different types of weight matrix based on a user-defined set of aligned motif sequences and a motif width. To retrieve known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the resulting matrix can be converted into different file formats according to user requirements. It makes it possible to identify conserved motifs in co-regulated genes or in a whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/
MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

PubMed

Ozaki, Haruka; Iwasaki, Wataru

2016-08-01

As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
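As a rough illustration of what "analyzing every k-mer" in a set of bound sequences involves, the sketch below enumerates k-mers across sequence windows. It covers only the enumeration step; the scoring that MOCCS applies to each k-mer is not described in the abstract and is not reproduced here. The function name and the example windows are hypothetical.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every k-mer (overlapping occurrences included) across sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

# Invented windows standing in for sequences around ChIP-Seq peaks.
windows = ["ACGTACGT", "TTACGTAA"]
counts = kmer_counts(windows, 4)
print(counts.most_common(3))
```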
Discovery of phosphorylation motif mixtures in phosphoproteomics data

PubMed Central

Ritz, Anna; Shakhnarovich, Gregory; Salomon, Arthur R.; Raphael, Benjamin J.

2009-01-01

Motivation: Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part through particular patterns, or motifs, present in the amino acid sequence. Results: We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments. Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether or not the phosphoproteins containing an instance of a motif have a significant number of known interactions. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases or phosphatases. Availability: Software is available at http://cs.brown.edu/people/braphael/software.html. Contact: aritz@cs.brown.edu; braphael@cs.brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18996944

A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching.

PubMed

Romero, José R; Carballido, Jessica A; Garbus, Ingrid; Echenique, Viviana C; Ponzoni, Ignacio

2016-01-01

The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.
Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

PubMed

Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

2017-06-07

For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by nextgen sequencing without losing cancer-specific sequences, we generated the profiles using a phage library enriched by panning on a pool of cancer sera. To further decrease the complexity of profiles, we used computational methods to transform the list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs since they contain significantly more peptides than motifs among peptides where amino-acids are randomly permuted. Also, the single-sample motifs significantly differ from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.
Comparative qualitative phosphoproteomics analysis identifies shared phosphorylation motifs and associated biological processes in evolutionary divergent plants.

PubMed

Al-Momani, Shireen; Qi, Da; Ren, Zhe; Jones, Andrew R

2018-06-15

Phosphorylation is one of the most prevalent post-translational modifications and plays a key role in regulating cellular processes. We carried out a bioinformatics analysis of pre-existing phosphoproteomics data, to profile two model species representing the largest subclasses in flowering plants, the dicot Arabidopsis thaliana and the monocot Oryza sativa, to understand the extent to which phosphorylation signaling and function is conserved across evolutionarily divergent plants. We identified 6537 phosphopeptides from 3189 phosphoproteins in Arabidopsis and 2307 phosphopeptides from 1613 phosphoproteins in rice. We identified phosphorylation motifs, finding nineteen pS motifs and two pT motifs shared in rice and Arabidopsis. The majority of shared motif-containing proteins were mapped to the same biological processes with similar patterns of fold enrichment, indicating high functional conservation. We also identified shared patterns of crosstalk between phosphoserines with enrichment for motifs pSXpS, pSXXpS and pSXXXpS, where X is any amino acid. Lastly, our results identified several pairs of motifs that are significantly enriched to co-occur in Arabidopsis proteins, indicating cross-talk between different sites, but this was not observed in rice. Our results demonstrate that there are evolutionarily conserved mechanisms of phosphorylation-mediated signaling in plants, via analysis of high-throughput phosphorylation proteomics data from key monocot and dicot species: rice and Arabidopsis thaliana. The results also suggest that there is increased crosstalk between phosphorylation sites in A. thaliana compared with rice. The results are important for our general understanding of cell signaling in plants, and the ability to use A. thaliana as a general model for plant biology. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
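The pSXpS, pSXXpS and pSXXXpS patterns amount to pairs of phosphoserines separated by one, two or three residues. A minimal sketch of counting such pairs is shown below; the input representation (a protein sequence plus 1-based phosphosite positions), the function name, and the example data are assumptions, and no enrichment statistics are computed.

```python
def phosphoserine_spacing(sequence, phospho_positions):
    """Count pSXpS, pSXXpS and pSXXXpS pairs.

    sequence: protein sequence (one-letter codes).
    phospho_positions: 1-based positions reported as phosphorylated.
    Only positions that are serines in the sequence are considered.
    """
    ps = {p for p in phospho_positions if sequence[p - 1] == "S"}
    # Offset between the two phosphoserines for each pattern.
    names = {2: "pSXpS", 3: "pSXXpS", 4: "pSXXXpS"}
    counts = {name: 0 for name in names.values()}
    for p in ps:
        for offset, name in names.items():
            if p + offset in ps:
                counts[name] += 1
    return counts

# Invented sequence and phosphosites, for illustration only.
seq = "MASPSRKSAASLT"
sites = [3, 5, 8, 11]
result = phosphoserine_spacing(seq, sites)
print(result)
```

Since X stands for any amino acid, the intervening residues are not inspected, even if one of them is itself phosphorylated.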
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show how, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector has a better performance and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Gain in Transcriptional Activity by Primate-specific Coevolution of Melanoma Antigen-A11 and Its Interaction Site in Androgen Receptor*

PubMed Central

Liu, Qiang; Su, Shifeng; Blackwelder, Amanda J.; Minges, John T.; Wilson, Elizabeth M.

2011-01-01

Male sex development and growth occur in response to high affinity androgen binding to the androgen receptor (AR). In contrast to complete amino acid sequence conservation in the AR DNA and ligand binding domains among mammals, a primate-specific difference in the AR NH2-terminal region that regulates the NH2- and carboxyl-terminal (N/C) interaction enables direct binding to melanoma antigen-A11 (MAGE-11), an AR coregulator that is also primate-specific. Human, mouse, and rat AR share the same NH2-terminal 23FQNLF27 sequence that mediates the androgen-dependent N/C interaction. However, the mouse and rat AR FXXLF motif is flanked by Ala33 that evolved to Val33 in primates. Human AR Val33 was required to interact directly with MAGE-11 and for the inhibitory effect of the AR N/C interaction on activation function 2 that was relieved by MAGE-11. The functional importance of MAGE-11 was indicated by decreased human AR regulation of an androgen-dependent endogenous gene using lentivirus short hairpin RNAs and by the greater transcriptional strength of human compared with mouse AR. MAGE-11 increased progesterone and glucocorticoid receptor activity independently of binding an FXXLF motif by interacting with p300 and p160 coactivators. We conclude that the coevolution of the AR NH2-terminal sequence and MAGE-11 expression among primates provides increased regulatory control over activation domain dominance. Primate-specific expression of MAGE-11 results in greater steroid receptor transcriptional activity through direct interactions with the human AR FXXLF motif region and indirectly through steroid receptor-associated p300 and p160 coactivators. PMID:21730049
Cis-acting elements in the promoter region of the human aldolase C gene.

PubMed

Buono, P; de Conciliis, L; Olivetta, E; Izzo, P; Salvatore, F

1993-08-16

We investigated the cis-acting sequences involved in the expression of the human aldolase C gene by transient transfections into human neuroblastoma cells (SKNBE). We demonstrate that 420 bp of the 5'-flanking DNA direct at high efficiency the transcription of the CAT reporter gene. A deletion between -420 bp and -164 bp causes a 60% decrease of CAT activity. Gel shift and DNase I footprinting analyses revealed four protected elements: A, B, C and D. Competition analyses indicate that Sp1 or factors sharing a similar sequence specificity bind to elements A and B, but not to elements C and D. Sequence analysis shows a half palindromic ERE motif (GGTCA), in elements B and D. Region D binds a transactivating factor which appears also essential to stabilize the initiation complex.
Identification of a DNA sequence motif required for expression of iron-regulated genes in pseudomonads.

PubMed

Rombel, I T; McMorran, B J; Lamont, I L

1995-02-20

Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start sites were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

PubMed

Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

2014-02-17

As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots.
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms

PubMed Central

2014-01-01

Background: As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Results: Recently, an algorithm called “LDsplit” has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. Conclusions: LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots. PMID:24533858
Searching RNA motifs and their intermolecular contacts with constraint networks.

PubMed

Thébault, P; de Givry, S; Schiex, T; Gaspin, C

2006-09-01

Searching for RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Even though several programs exist for RNA motif search, none can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and which are related together by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows efficient searching for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii. http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
Euglena gracilis and Trypanosomatids possess common patterns in predicted mitochondrial targeting presequences.

PubMed

Krnáčová, Katarína; Vesteg, Matej; Hampl, Vladimír; Vlček, Čestmír; Horváth, Anton

2012-10-01

Euglena gracilis possessing chloroplasts of secondary green algal origin and parasitic trypanosomatids Trypanosoma brucei, Trypanosoma cruzi and Leishmania major belong to the protist phylum Euglenozoa. Euglenozoa might be among the earliest eukaryotic branches bearing ancestral traits reminiscent of the last eukaryotic common ancestor (LECA) or missing features present in other eukaryotes. LECA most likely possessed mitochondria of endosymbiotic α-proteobacterial origin. In this study, we searched for the presence of homologs of mitochondria-targeted proteins from other organisms in the currently available EST dataset of E. gracilis. The common motifs in predicted N-terminal presequences and corresponding homologs from T. brucei, T. cruzi and L. major (if found) were analyzed. Other trypanosomatid mitochondrial protein precursors (e.g., those involved in RNA editing) were also included in the analysis. Mitochondrial presequences of E. gracilis and these trypanosomatids seem to be highly variable in sequence length (5-118 aa), but apparently share statistically significant similarities. In most cases, the common (M/L)RR motif is present at the N-terminus and it is probably responsible for recognition via import apparatus of mitochondrial outer membrane. Interestingly, this motif is present inside the predicted presequence region in some cases. In most presequences, this motif is followed by a hydrophobic region rich in alanine, leucine, and valine. In conclusion, either RR motif or arginine-rich region within hydrophobic aa-s present at the N-terminus of a preprotein can be sufficient signals for mitochondrial import irrespective of presequence length in Euglenozoa.
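The distinction drawn above between an (M/L)RR motif at the N-terminus and one lying inside the presequence can be sketched as a simple classification. The function name and the example presequences are invented for illustration and are not taken from the study's dataset.

```python
import re

# (M/L)RR: methionine or leucine followed by two arginines.
MRR = re.compile(r"[ML]RR")

def classify_mrr(presequence):
    """Report where the first (M/L)RR match lies in a predicted presequence."""
    match = MRR.search(presequence.upper())
    if match is None:
        return "absent"
    return "N-terminal" if match.start() == 0 else "internal"

# Hypothetical presequences.
for pre in ["MRRLAALVSA", "MSALRRAVLA", "MKTAYIAVLA"]:
    print(pre, classify_mrr(pre))
```

Only the first match is classified; a presequence with both an N-terminal and an internal copy would be reported as N-terminal.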
Human β-glucuronidase: structure, function, and application in enzyme replacement therapy.

PubMed

Naz, Huma; Islam, Asimul; Waheed, Abdul; Sly, William S; Ahmad, Faizan; Hassan, Imtaiyaz

2013-10-01

Lysosomal storage diseases occur due to incomplete metabolic degradation of macromolecules by various hydrolytic enzymes in the lysosome. Despite structural differences, most of the lysosomal enzymes share many common features including a lysosomal targeting motif and phosphotransferase recognition sites. β-Glucuronidase (GUSB) is an important lysosomal enzyme involved in the degradation of glucuronate-containing glycosaminoglycan. The deficiency of GUSB causes mucopolysaccharidosis type VII (MPSVII), leading to lysosomal storage in the brain. GUSB is a well-studied protein for its expression, sequence, structure, and function. The purpose of this review is to summarize our current understanding of sequence, structure, function, and evolution of GUSB and its lysosomal enzyme targeting. Enzyme replacement therapy reported for this protein is also discussed.
PISMA: A Visual Representation of Motif Distribution in DNA Sequences.

PubMed

Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

2017-01-01

Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like, as a gene-map-like, and as a transcript scheme. We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.
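The core operations described here, locating a short motif along a DNA sequence, counting it, and drawing a bar-code-like track, can be sketched in a few lines. This is not PISMA's code (which is described as Java); the function names and the example sequence are assumptions, and only the 2 to 10 base motif-length limit is taken from the abstract.

```python
def motif_positions(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) occurrence."""
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        # Advance by one base so overlapping occurrences are found too.
        start = sequence.find(motif, start + 1)
    return positions

def barcode(sequence, positions):
    """Draw a text track: '|' where a motif starts, '-' elsewhere."""
    marks = ["-"] * len(sequence)
    for p in positions:
        marks[p - 1] = "|"
    return "".join(marks)

# Invented sequence; CpG sites are the 'CG' dinucleotides.
dna = "ACGCGTTCGA"
cpg = motif_positions(dna, "CG")
track = barcode(dna, cpg)
print(len(cpg), cpg)
print(track)
```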
PISMA: A Visual Representation of Motif Distribution in DNA Sequences

PubMed Central

Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

2017-01-01

Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
CoSMoS: Conserved Sequence Motif Search in the proteome

PubMed Central

Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

2006-01-01

Background: With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results: We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion: CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915
A cis-regulatory module activating transcription in the suspensor contains five cis-regulatory elements

DOE PAGES

Henry, Kelli F.; Kawashima, Tomokazu; Goldberg, Robert B.

2015-03-22

Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10 bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. Lastly, a homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
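As a small illustration of how the five motif sequences quoted above could be located in an upstream region, the sketch below does an exact, forward-strand search. The motif strings come from the abstract (Region 2 is given there only as a partial sequence); the function name and the test sequence are hypothetical, and the actual G564 upstream sequence is not reproduced.

```python
# Motif sequences as quoted in the abstract (5' to 3').
MOTIFS = {
    "10-bp": "GAAAAGCGAA",
    "10-bp-like": "GAAAAACGAA",
    "10-bp-related": "GAAAACCACA",
    "Region 2 (partial)": "TTGGT",
    "Fifth": "GAGTTA",
}

def locate_motifs(sequence):
    """Return the 1-based start of the first exact match per motif (0 if absent).

    Only the given strand is searched; reverse complements are ignored.
    """
    sequence = sequence.upper()
    # str.find returns -1 when absent, so adding 1 yields 0 for "not found".
    return {name: sequence.find(motif) + 1 for name, motif in MOTIFS.items()}

# Hypothetical sequence assembled for illustration, not the G564 module.
toy = "CC" + "GAAAAGCGAA" + "TT" + "GAGTTA" + "AC"
found = locate_motifs(toy)
print(found)
```

Exact matching is the simplest choice here; the consensus sequences mentioned in the abstract would call for a degenerate pattern instead, which is not attempted in this sketch.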
A cis-regulatory module activating transcription in the suspensor contains five cis-regulatory elements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Henry, Kelli F.; Kawashima, Tomokazu; Goldberg, Robert B.

Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10-bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. Lastly, a homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
A cis-regulatory module activating transcription in the suspensor contains five cis-regulatory elements.

PubMed

Henry, Kelli F; Kawashima, Tomokazu; Goldberg, Robert B

2015-06-01

Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10 bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. A homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
Molecular cloning and expression of two heat-shock protein genes (HSC70/HSP70) from Prenant's schizothoracin (Schizothorax prenanti).

PubMed

Li, Jiuxuan; Zhang, Haibin; Zhang, Xiuyue; Yang, Shiyong; Yan, Taiming; Song, Zhaobin

2015-04-01

Using RT-PCR and rapid amplification of cDNA ends, two complementary deoxyribonucleic acid (cDNA) clones encoding heat-shock cognate 70 (HSC70, designated Sp-HSC70) and inducible heat-shock protein 70 (HSP70, designated Sp-HSP70) were isolated from the liver of Prenant's schizothoracin (Schizothorax prenanti). The cDNAs were 2344 and 2292 bp in length and contained 1950- and 1932-bp open reading frames, encoding proteins of 649 and 643 amino acids, respectively. Amino acid sequence analysis indicated that both Sp-HSC70 and Sp-HSP70 contained three signature sequences of the HSP70 family, two partially overlapping bipartite nuclear localization signal sequences (an ATP-binding site motif, a bipartite nuclear targeting signal), and a cytoplasmic characteristic motif EEVD. Homology analysis revealed that Sp-HSC70 and Sp-HSP70 shared 77.5% identity and Sp-HSC70 shared more than 81.1% identity with the known HSC70s of other vertebrates, while Sp-HSP70 shared more than 77.5% identity with the known HSP70s of other vertebrates. Fluorescent real-time quantitative RT-PCR showed that Sp-HSC70 and Sp-HSP70 mRNAs were found in all tested tissues, including blood, brain, heart, liver, spleen, head kidney, white muscle, skin, gonad, hypophysis, red muscle, and gill. The Sp-HSC70 and Sp-HSP70 mRNA expression levels in blood and head kidney displayed a significant increase in the group challenged with the bacterium Aeromonas hydrophila at 24 h post-infection compared to a control group. Temporally, there was a clear time-dependent expression pattern of the Sp-HSC70 and Sp-HSP70 genes after bacterial challenge, and the expression of Sp-HSC70 and Sp-HSP70 mRNAs reached a maximum level at 12 and 6 h post-challenge, respectively. Both returned to control level after 7 × 24 h. The results suggest that Sp-HSC70 and Sp-HSP70 genes may play important roles in mediating the immune responses to A. hydrophila-related diseases in the Prenant's schizothoracin.
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

ScienceCinema

Campbell, Catherine

2018-01-22

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campbell, Catherine

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Molecular characterization of the full-length L and M RNAs of Tomato yellow ring virus, a member of the genus Tospovirus.

PubMed

Chen, Tsung-Chi; Li, Ju-Ting; Fan, Ya-Shu; Yeh, Yi-Chun; Yeh, Shyi-Dong; Kormelink, Richard

2013-06-01

Tomato yellow ring virus (TYRV), first isolated from tomato in Iran, was classified as a non-approved species of the genus Tospovirus based on the characterization of its genomic S RNA. In the current study, the complete sequences of the genomic L and M RNAs of TYRV were determined and analyzed. The L RNA has 8,877 nucleotides (nt) and codes in the viral complementary (vc) strand for the putative RNA-dependent RNA polymerase (RdRp) of 2,873 amino acids (aa) (331 kDa). The RdRp of TYRV shares the highest aa sequence identity (88.7%) with that of Iris yellow spot virus (IYSV), and contains conserved motifs shared with those of the animal-infecting bunyaviruses. The M RNA contains 4,786 nt and codes in ambisense arrangement for the NSm protein of 308 aa (34.5 kDa) in viral sense, and the Gn/Gc glycoprotein precursor (GP) of 1,310 aa (128 kDa) in vc-sense. Phylogenetic analyses indicated that TYRV is closely clustered with IYSV and Polygonum ringspot virus (PolRSV). The NSm and GP of TYRV share the highest aa sequence identity with those of IYSV and PolRSV (89.9 and 80.2-86.5%, respectively). Moreover, the GPs of TYRV, IYSV, and PolRSV share highly similar characteristics, including an identical deduced N-terminal protease cleavage site that is distinct from those of all tospoviral GPs analyzed thus far. Taken together, the elucidation of the complete genome sequence and the biological features of TYRV support a close ancestral relationship with IYSV and PolRSV.
Characterization of the breakpoints of a polymorphic inversion complex detects strict and broad breakpoint reuse at the molecular level.

PubMed

Puerma, Eva; Orengo, Dorcas J; Salguero, David; Papaceit, Montserrat; Segarra, Carmen; Aguadé, Montserrat

2014-09-01

Inversions are an integral part of structural variation within species, and they play a leading role in genome reorganization across species. Work at both the cytological and genome sequence levels has revealed heterogeneity in the distribution of inversion breakpoints, with some regions being recurrently used. Breakpoint reuse at the molecular level has mostly been assessed for fixed inversions through genome sequence comparison, and therefore rather broadly. Here, we have identified and sequenced the breakpoints of two polymorphic inversions (E1 and E2, which share a breakpoint) in the extant Est and E1+2 chromosomal arrangements of Drosophila subobscura. The breakpoints are two medium-sized repeated motifs that mediated the inversions by two different mechanisms: E1 via staggered breaks and subsequent repair and E2 via repeat-mediated ectopic recombination. The fine delimitation of the shared breakpoint revealed its strict reuse at the molecular level regardless of which was the intermediate arrangement. The occurrence of other rearrangements in the most proximal and distal extended breakpoint regions reveals the broad reuse of these regions. This differential degree of fragility might be related to their sharing the presence outside the inverted region of snoRNA-encoding genes.
Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhance the stability of DNA (dC5) i-motif structure.

PubMed

Gade, Chandrasekhar Reddy; Sharma, Nagendra K

2017-12-15

This report describes the synthesis of a C-rich sequence, the cytosine pentamer, of aep-PNA and its biophysical studies for the formation of a hybrid DNA:aep-PNA i-motif structure with the DNA cytosine pentamer (dC5) under acidic pH conditions. Herein, the CD/UV/NMR/ESI-Mass studies strongly support the formation of a stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence the aep-PNA C-rich cytosine sequence could be considered as a potential DNA i-motif stabilizing agent under in vivo conditions.
GibbsCluster: unsupervised clustering and alignment of peptide sequences.

PubMed

Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

2017-07-03

Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0.
A generic motif discovery algorithm for sequential data.

PubMed

Jensen, Kyle L; Styczynski, Mark P; Rigoutsos, Isidore; Stephanopoulos, Gregory N

2006-01-01

Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. Gemoda is freely available at http://web.mit.edu/bamel/gemoda
Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

PubMed

Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

2015-06-01

Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment.
Degradation signals for ubiquitin system proteolysis in Saccharomyces cerevisiae.

PubMed Central

Gilon, T; Chomsky, O; Kulka, R G

1998-01-01

Combinations of different ubiquitin-conjugating (Ubc) enzymes and other factors constitute subsidiary pathways of the ubiquitin system, each of which ubiquitinates a specific subset of proteins. There is evidence that certain sequence elements or structural motifs of target proteins are degradation signals which mark them for ubiquitination by a particular branch of the ubiquitin system and for subsequent degradation. Our aim was to devise a way of searching systematically for degradation signals and to determine to which ubiquitin system subpathways they direct the proteins. We have constructed two reporter gene libraries based on the lacZ or URA3 genes which, in Saccharomyces cerevisiae, express fusion proteins with a wide variety of C-terminal extensions. From these, we have isolated clones producing unstable fusion proteins which are stabilized in various ubc mutants. Among these are 10 clones whose products are stabilized in ubc6, ubc7 or ubc6ubc7 double mutants. The C-terminal extensions of these clones, which vary in length from 16 to 50 amino acid residues, are presumed to contain degradation signals channeling proteins for degradation via the UBC6 and/or UBC7 subpathways of the ubiquitin system. Some of these C-terminal tails share similar sequence motifs, and a feature common to almost all of these sequences is a highly hydrophobic region such as is usually located inside globular proteins or inserted into membranes. PMID:9582269
DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

PubMed Central

Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

2009-01-01

Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue on the function of the motifs and genes. PMID:19534755
Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

PubMed

Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

2018-02-01

The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the size of the query set increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis has to be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs that correspond to known transcription factor binding sites.
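To make the phrase "degenerate oligonucleotide motifs written in 15-letter IUPAC code" concrete, here is a minimal CPU-only sketch of matching one such motif against a DNA sequence. It is not Argo_CUDA's implementation (which is GPU-based and searches for motifs rather than matching a given one); the function names `matches` and `find_sites` and the example motif are illustrative assumptions.

```python
# The 15-letter IUPAC nucleotide code: each symbol denotes a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, window):
    """True if every base of `window` is allowed by the IUPAC symbol at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def find_sites(motif, sequence):
    """Return 0-based start positions where the degenerate motif matches (hypothetical helper)."""
    k = len(motif)
    return [
        i for i in range(len(sequence) - k + 1)
        if matches(motif, sequence[i:i + k])
    ]


# Example: 'S' stands for C or G, so both TGACTCA and TGAGTCA are hits.
sites = find_sites("TGASTCA", "AATGACTCAGGTGAGTCATT")
```

An exhaustive discovery tool has to evaluate a very large number of such candidate motifs against every input sequence, which is the kind of workload that benefits from GPU parallelism.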
Recoding method that removes inhibitory sequences and improves HIV gene expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rabadan, Raul; Krasnitz, Michael; Robins, Harlan

The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions for affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.
The CGTCA sequence motif is essential for biological activity of the vasoactive intestinal peptide gene cAMP-regulated enhancer.

PubMed Central

Fink, J S; Verhave, M; Kasper, S; Tsukada, T; Mandel, G; Goodman, R H

1988-01-01

cAMP-regulated transcription of the human vasoactive intestinal peptide gene is dependent upon a 17-base-pair DNA element located 70 base pairs upstream from the transcriptional initiation site. This element is similar to sequences in other genes known to be regulated by cAMP and to sequences in several viral enhancers. We have demonstrated that the vasoactive intestinal peptide regulatory element is an enhancer that depends upon the integrity of two CGTCA sequence motifs for biological activity. Mutations in either of the CGTCA motifs diminish the ability of the element to respond to cAMP. Enhancers containing the CGTCA motif from the somatostatin and adenovirus genes compete for binding of nuclear proteins from C6 glioma and PC12 cells to the vasoactive intestinal peptide enhancer, suggesting that CGTCA-containing enhancers interact with similar transacting factors. PMID:2842787
Classification and assessment tools for structural motif discovery algorithms.

PubMed

Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan

2013-01-01

Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
Limitations and potentials of current motif discovery algorithms

PubMed Central

Hu, Jianjun; Li, Bin; Kihara, Daisuke

2005-01-01

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

NASA Astrophysics Data System (ADS)

Nicolae, Marius; Rajasekaran, Sanguthevar

2015-01-01

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
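The (l, d) motif definition above can be illustrated with a deliberately naive brute-force search. This is a sketch of the problem statement only, not of qPMS9, whose pruning and parallelism are not reproduced here; the function names and the representation of the quorum as a fraction `q` between 0 and 1 are assumptions made for the example. The search enumerates all |alphabet|^l candidates, so it is practical only for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, s, d):
    """True if some length-l window of s is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def planted_motifs(strings, l, d, q=1.0, alphabet="ACGT"):
    """Brute-force (l, d) motif search (hypothetical helper, exponential in l).

    q=1.0 is plain PMS (motif must occur in every string);
    q<1.0 is the quorum variant (motif must occur in at least q*n strings).
    """
    needed = q * len(strings)
    found = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if sum(occurs(motif, s, d) for s in strings) >= needed:
            found.append(motif)
    return found


exact = planted_motifs(["AACGT", "ACGTT", "GACGT"], l=4, d=0)
quorum = planted_motifs(["AACGT", "ACGTT", "TTTTT"], l=4, d=0, q=0.5)
```

In the first call the only 4-mer present in all three strings is ACGT. In the second call no 4-mer occurs in all three strings, but ACGT occurs in two of them and therefore satisfies a 50% quorum.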
Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

PubMed Central

Karnik, Rahul; Beer, Michael A.

2015-01-01

The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
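For readers unfamiliar with the PWM representation mentioned in this abstract, the following is a minimal sketch of building a log-odds position weight matrix from aligned sites and scanning a sequence against a score threshold. It is not the MotifSpec algorithm: the threshold here is hand-picked rather than learned, no discriminative objective is used, and the function names, pseudocount, uniform background and example sites are all assumptions for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned sites (uniform background assumed)."""
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for a window the same length as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(sequence) - k + 1):
        s = score(pwm, sequence[i:i + k])
        if s >= threshold:
            hits.append((i, s))
    return hits


# Illustrative aligned sites; position 4 varies between C and G.
pwm = build_pwm(["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"])
hits = scan(pwm, "AATGACTCAGG", threshold=5.0)
```

Unlike a k-mer word or a regular expression, the PWM assigns a graded score, so a window with one unfavorable base is penalized rather than rejected outright; where to place the threshold is the part that a method like the one described above learns from data.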
Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

PubMed

Karnik, Rahul; Beer, Michael A

2015-01-01

The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
Human MHC-II with Shared Epitope Motifs Are Optimal Epstein-Barr Virus Glycoprotein 42 Ligands-Relation to Rheumatoid Arthritis.

PubMed

Trier, Nicole; Izarzugaza, Jose; Chailyan, Anna; Marcatili, Paolo; Houen, Gunnar

2018-01-21

Rheumatoid arthritis (RA) is a chronic systemic autoimmune disorder of unknown etiology, which is characterized by inflammation in the synovium and joint damage. Although the pathogenesis of RA remains to be determined, a combination of environmental (e.g., viral infections) and genetic factors influences disease onset. Genetic factors in particular play a vital role in the onset of disease, as the heritability of RA is 50-60%, with the human leukocyte antigen (HLA) alleles accounting for at least 30% of the overall genetic risk. Some HLA-DR alleles encode a conserved sequence of amino acids, referred to as the shared epitope (SE) structure. By analyzing the structure of a HLA-DR molecule in complex with Epstein-Barr virus (EBV), the SE motif is suggested to play a vital role in the interaction of MHC II with the viral glycoprotein (gp) 42, an essential entry factor for EBV. EBV has been repeatedly linked to RA by several lines of evidence and, based on several findings, we suggest that EBV is able to induce the onset of RA in predisposed SE-positive individuals, by promoting entry of B-cells through direct contact between SE and gp42 in the entry complex.
Human MHC-II with Shared Epitope Motifs Are Optimal Epstein-Barr Virus Glycoprotein 42 Ligands—Relation to Rheumatoid Arthritis

PubMed Central

Trier, Nicole; Izarzugaza, Jose; Chailyan, Anna; Marcatili, Paolo; Houen, Gunnar

2018-01-01

Rheumatoid arthritis (RA) is a chronic systemic autoimmune disorder of unknown etiology, which is characterized by inflammation in the synovium and joint damage. Although the pathogenesis of RA remains to be determined, a combination of environmental (e.g., viral infections) and genetic factors influences disease onset. Genetic factors in particular play a vital role in the onset of disease, as the heritability of RA is 50–60%, with the human leukocyte antigen (HLA) alleles accounting for at least 30% of the overall genetic risk. Some HLA-DR alleles encode a conserved sequence of amino acids, referred to as the shared epitope (SE) structure. Analysis of the structure of an HLA-DR molecule in complex with Epstein-Barr virus (EBV) suggests that the SE motif plays a vital role in the interaction of MHC II with the viral glycoprotein (gp) 42, an essential entry factor for EBV. EBV has been repeatedly linked to RA by several lines of evidence and, based on several findings, we suggest that EBV is able to induce the onset of RA in predisposed SE-positive individuals, by promoting entry into B-cells through direct contact between SE and gp42 in the entry complex. PMID:29361739
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

PubMed

Li, Sanshu; Breaker, Ronald R

2017-10-13

With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.

SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

PubMed

Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

2018-05-25

Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Onco-Regulon: an integrated database and software suite for site specific targeting of transcription factors of cancer genes

PubMed Central

Tomar, Navneet; Mishra, Akhilesh; Mrinal, Nirotpal; Jayaram, B.

2016-01-01

Transcription factors (TFs) bind at multiple sites in the genome and regulate expression of many genes. Regulating TF binding in a gene-specific manner remains a formidable challenge in drug discovery because the same binding motif may be present at multiple locations in the genome. Here, we present Onco-Regulon (http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm), an integrated database of regulatory motifs of cancer genes combined with Unique Sequence-Predictor (USP), a software suite that identifies unique sequences for each of these regulatory DNA motifs at the specified position in the genome. USP works by extending a given DNA motif in the 5′→3′ direction, the 3′→5′ direction, or both, adding one nucleotide at each step, and calculates the frequency of each extended motif in the genome using the Frequency Counter program. This step is iterated until the frequency of the extended motif in the genome becomes unity. Thus, for each given motif, we get three possible unique sequences. The Closest Sequence Finder program predicts off-target drug binding in the genome. Inclusion of DNA-protein structural information further makes Onco-Regulon a highly informative repository for gene-specific drug development. We believe that Onco-Regulon will help researchers to design drugs that will, in theory, bind to an exclusive site in the genome with no off-target effects. Database URL: http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm PMID:27515825
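The extension loop described in this abstract can be sketched as follows. This is a hedged illustration rather than the USP code: the function names are hypothetical, the genome is treated as a single forward-strand string, and overlapping occurrences are counted, none of which is confirmed by the abstract.

```python
def count_occurrences(genome, word):
    """Count overlapping occurrences of word on the forward strand only."""
    count = 0
    start = genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def extend_until_unique(genome, start, end, direction):
    """Grow genome[start:end] one base per step until it occurs exactly once.

    direction is "right" (5'->3'), "left" (3'->5') or "both".
    Returns the (start, end) slice bounds of the unique sequence, or None
    if the sequence cannot be extended any further and is still repeated.
    """
    while count_occurrences(genome, genome[start:end]) > 1:
        new_start, new_end = start, end
        if direction in ("left", "both") and start > 0:
            new_start = start - 1
        if direction in ("right", "both") and end < len(genome):
            new_end = end + 1
        if (new_start, new_end) == (start, end):
            return None  # reached the end of the genome without uniqueness
        start, end = new_start, new_end
    return start, end
```

For the toy genome `"ACGTACGATTACGC"`, the motif `ACG` at positions 4 to 7 becomes unique as `ACGA` when extended to the right, as `GTACG` when extended to the left, and as `TACGA` when extended in both directions, which mirrors the three possible unique sequences per motif mentioned above.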
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

PubMed Central

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-01-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

PubMed

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-07-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
Mining protein loops using a structural alphabet and statistical exceptionality

PubMed Central

Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude

2010-01-01

Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g., binding sites and catalytic pockets. However, the description of protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops.
Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might make it possible to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/. PMID:20132552
Mining protein loops using a structural alphabet and statistical exceptionality.

PubMed

Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude

2010-02-04

Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g., binding sites and catalytic pockets. However, the description of protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops.
Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
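The word-counting step described in this abstract can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: loops are assumed to be already encoded as strings of structural letters, and a seven-residue fragment is assumed to correspond to four consecutive letters (four-residue letters overlapping by three residues). The toy letters and the recurrence threshold in the usage note are invented.

```python
from collections import Counter

WORD_LEN = 4  # assumed: four overlapping four-residue letters span 7 residues


def count_words(loops, word_len=WORD_LEN):
    """Count every structural word (letter k-mer) across all encoded loops."""
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - word_len + 1):
            counts[loop[i:i + word_len]] += 1
    return counts


def recurrent_words(counts, min_count):
    """Keep the words observed more than min_count times."""
    return {word for word, count in counts.items() if count > min_count}


def coverage(loops, words, word_len=WORD_LEN):
    """Fraction of loop positions covered by at least one selected word."""
    covered = total = 0
    for loop in loops:
        mask = [False] * len(loop)
        for i in range(len(loop) - word_len + 1):
            if loop[i:i + word_len] in words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(loop)
    return covered / total if total else 0.0
```

On the toy input `["ABCDE", "ABCDX", "QQABCD"]` with `min_count=2`, the only recurrent word is `ABCD`, and it covers 12 of the 16 letter positions. The abstract uses a threshold of more than 30 observations on a bank of 93000 loops.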
Hairpin structures with conserved sequence motifs determine the 3' ends of non-polyadenylated invertebrate iridovirus transcripts.

PubMed

İnce, İkbal Agah; Pijlman, Gorben P; Vlak, Just M; van Oers, Monique M

2017-11-01

Previously, we observed that the transcripts of Invertebrate iridescent virus 6 (IIV6) are not polyadenylated, in line with the absence of canonical poly(A) motifs (AATAAA) downstream of the open reading frames (ORFs) in the genome. Here, we determined the 3' ends of the transcripts of fifty-four IIV6 virion protein genes in infected Drosophila Schneider 2 (S2) cells. Using ligation-based amplification of cDNA ends (LACE), we showed that the IIV6 mRNAs often ended with a CAUUA motif. In silico analysis showed that the 3'-untranslated regions of IIV6 genes have the ability to form hairpin structures (22-56 nt in length) and that for about half of all IIV6 genes these 3' sequences contained complementary TAATG and CATTA motifs. We also show that a hairpin in the 3' flanking region with conserved sequence motifs is a conserved feature in invertebrate-infecting iridoviruses (genera Iridovirus and Chloriridovirus). Copyright © 2017 Elsevier Inc. All rights reserved.
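The two motifs named above, TAATG and CATTA, are reverse complements of each other, which is what allows them to pair in a hairpin stem. The short sketch below checks this and scans a DNA string for such pairs. It is only an illustration: the function names, the minimum loop length, and the example sequence are assumptions, and no folding energy is computed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_motif_pairs(seq, left="TAATG", min_loop=3):
    """Find (left_start, right_start) pairs that could form a hairpin stem.

    A pair is reported when the left motif is followed, after at least
    min_loop unpaired bases, by its reverse complement.
    """
    right = reverse_complement(left)
    pairs = []
    i = seq.find(left)
    while i != -1:
        j = seq.find(right, i + len(left) + min_loop)
        while j != -1:
            pairs.append((i, j))
            j = seq.find(right, j + 1)
        i = seq.find(left, i + 1)
    return pairs
```

For example, `reverse_complement("TAATG")` gives `"CATTA"`, and scanning the made-up sequence `"GGTAATGCCCCCATTAGG"` reports one candidate pair at positions 2 and 11.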
Sequence, Structure, and Context Preferences of Human RNA Binding Proteins.

PubMed

Dominguez, Daniel; Freese, Peter; Alexis, Maria S; Su, Amanda; Hochman, Myles; Palden, Tsultrim; Bazile, Cassandra; Lambert, Nicole J; Van Nostrand, Eric L; Pratt, Gabriel A; Yeo, Gene W; Graveley, Brenton R; Burge, Christopher B

2018-06-07

RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

PubMed Central

Coin, Lachlan J. M.; Steinfeld, Israel; Yakhini, Zohar; Sladek, Rob; Froguel, Philippe; Blakemore, Alexandra I. F.

2008-01-01

Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH, using Agilent microarrays, in 50 healthy French Caucasian subjects. By sequencing PCR products amplified using primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which has revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs, and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. We found no enrichment of LINE elements or segmental duplications, in contrast to other reports. Sequence analysis revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions has yet to be determined. Considered together with existing information on more complex inherited variant regions, and reports of de novo variants associated with autism, these data support the presence of different subgroups of CNV in the genome which may have originated through different mechanisms. PMID:18769679
PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

PubMed

Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

2013-02-01

Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
Evolutionary relationships in the ilarviruses: nucleotide sequence of prunus necrotic ringspot virus RNA 3.

PubMed

Sánchez-Navarro, J A; Pallás, V

1997-01-01

The complete nucleotide sequence of an isolate of prunus necrotic ringspot virus (PNRSV) RNA 3 has been determined. Elucidation of the amino acid sequence of the proteins encoded by the two large open reading frames (ORFs) allowed us to carry out comparative and phylogenetic studies on the movement (MP) and coat (CP) proteins in the ilarvirus group. Amino acid sequence comparison of the MP revealed a highly conserved basic sequence motif with an amphipathic alpha-helical structure preceding the conserved motif of the '30K superfamily' proposed by Mushegian and Koonin [26] for MPs. Within this '30K' motif a strictly conserved transmembrane domain is present in all ilarviruses sequenced so far. At the amino-terminal end, prune dwarf virus (PDV) has an extension not present in other ilarviruses but which is observed in all bromo- and cucumoviruses, suggesting a common ancestor or a recombinational event in the Bromoviridae family. Examination of the N-terminus of the CPs of all ilarviruses revealed a highly basic region, part of which resembles the Arg-rich motif that has been characterized in the RNA-binding protein family. This motif has also been found in the other members of the Bromoviridae family, suggesting its involvement in a structural function. Furthermore, this region is required for infectivity in ilarviruses. The similarities found in this Arg-rich motif are discussed in terms of the process known as genome activation. Finally, phylogenetic analysis of both the MP and CP proteins revealed a closer relationship of AlMV to PNRSV, apple mosaic virus (ApMV) and PDV than to any other member of the ilarvirus group. In that sense, AlMV should be considered a true ilarvirus instead of forming a distinct group of viruses.
Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

PubMed

Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

2003-07-04

The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
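One simple way to picture a phylogenetic footprint is as a run of alignment columns that are identical in every species. The sketch below finds such runs in a pre-computed multiple alignment. It is a deliberately simplified illustration, not the procedure used in the study: the requirement of perfect identity, the minimum block length, and the toy alignment in the usage note are all assumptions.

```python
def conserved_blocks(alignment, min_len=6):
    """Return (start, end, sequence) for runs of identical, gap-free columns.

    alignment is a list of equal-length aligned strings, with "-" for gaps.
    end is exclusive, so alignment[0][start:end] is the conserved block.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    run_start = None
    # Iterate one column past the end so a run touching the edge is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and alignment[0][col] != "-"
            and all(row[col] == alignment[0][col] for row in alignment)
        )
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                blocks.append((run_start, col, alignment[0][run_start:col]))
            run_start = None
    return blocks
```

For the toy alignment `["AATGACTCATT", "CCTGACTCAGG", "G-TGACTCA-C"]`, the function reports the single block `TGACTCA` spanning columns 2 to 9. Real analyses would typically tolerate some mismatches and weight species by evolutionary distance.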
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

PubMed

Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

2016-08-09

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high-quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes.
Its application may enhance progress in elucidating transcription regulation mechanisms, and thus benefit the genomic research community and prokaryotic genome researchers in particular.
RSAT 2018: regulatory sequence analysis tools 20th anniversary.

PubMed

Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

2018-05-02

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
A novel approach to identifying regulatory motifs in distantly related genomes

PubMed Central

Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; De Moor, Bart; Van de Peer, Yves; Marchal, Kathleen

2005-01-01

Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size. PMID:16420672
Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

PubMed

Roux-Rouquie, M; Marilley, M

2000-09-15

We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.
Authentic interdomain communication in an RNA helicase reconstituted by expressed protein ligation of two helicase domains.

PubMed

Karow, Anne R; Theissen, Bettina; Klostermeier, Dagmar

2007-01-01

RNA helicases mediate structural rearrangements of RNA or RNA-protein complexes at the expense of ATP hydrolysis. Members of the DEAD box helicase family consist of two flexibly connected helicase domains. They share nine conserved sequence motifs that are involved in nucleotide binding and hydrolysis, RNA binding, and helicase activity. Most of these motifs line the cleft between the two helicase domains, and extensive communication between them is required for RNA unwinding. The two helicase domains of the Bacillus subtilis RNA helicase YxiN were produced separately as intein fusions, and a functional RNA helicase was generated by expressed protein ligation. The ligated helicase binds adenine nucleotides with very similar affinities to the wild-type protein. Importantly, its intrinsically low ATPase activity is stimulated by RNA, and the Michaelis-Menten parameters are similar to those of the wild-type. Finally, ligated YxiN unwinds a minimal RNA substrate to an extent comparable to that of the wild-type helicase, confirming authentic interdomain communication.
Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space.

PubMed

Stegemann, Björn; Klebe, Gerhard

2012-02-01

Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process. Copyright © 2011 Wiley Periodicals, Inc.
CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

PubMed

Gilbert, N; Labuda, D

1999-03-16

A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.
CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs

PubMed Central

Gilbert, Nicolas; Labuda, Damian

1999-01-01

A 65-bp “core” sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3′ ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome. PMID:10077603

Transmembrane insertion of twin-arginine signal peptides is driven by TatC and regulated by TatB

PubMed Central

Fröbel, Julia; Rose, Patrick; Lausberg, Frank; Blümmel, Anne-Sophie; Freudl, Roland; Müller, Matthias

2012-01-01

The twin-arginine translocation (Tat) pathway of bacteria and plant chloroplasts mediates the transmembrane transport of folded proteins, which harbour signal sequences with a conserved twin-arginine motif. Many Tat translocases comprise the three membrane proteins TatA, TatB and TatC. TatC was previously shown to be involved in recognizing twin-arginine signal peptides. Here we show that beyond recognition, TatC mediates the transmembrane insertion of a twin-arginine signal sequence, thereby translocating the signal sequence cleavage site across the bilayer. In the absence of TatB, this can lead to the removal of the signal sequence even from a translocation-incompetent substrate. Hence interaction of twin-arginine signal peptides with TatB counteracts their premature cleavage uncoupled from translocation. This capacity of TatB is not shared by the homologous TatA protein. Collectively our results suggest that TatC is an insertase for twin-arginine signal peptides and that translocation-proficient signal sequence recognition requires the concerted action of TatC and TatB. PMID:23250441
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

PubMed Central

Fauteux, François; Strömvik, Martina V

2009-01-01

Background: Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results: We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion: Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
Nuclear Retention Elements of U3 Small Nucleolar RNA

PubMed Central

Speckmann, Wayne; Narayanan, Aarthi; Terns, Rebecca; Terns, Michael P.

1999-01-01

The processing and methylation of precursor rRNA is mediated by the box C/D small nucleolar RNAs (snoRNAs). These snoRNAs differ from most cellular RNAs in that they are not exported to the cytoplasm. Instead, these RNAs are actively retained in the nucleus where they assemble with proteins into mature small nucleolar ribonucleoprotein particles and are targeted to their intranuclear site of action, the nucleolus. In this study, we have identified the cis-acting sequences responsible for the nuclear retention of U3 box C/D snoRNA by analyzing the nucleocytoplasmic distributions of an extensive panel of U3 RNA variants after injection of the RNAs into Xenopus oocyte nuclei. Our data indicate the importance of two conserved sequence motifs in retaining U3 RNA in the nucleus. The first motif is comprised of the conserved box C′ and box D sequences that characterize the box C/D family. The second motif contains conserved box sequences B and C. Either motif is sufficient for nuclear retention, but disruption of both motifs leads to mislocalization of the RNAs to the cytoplasm. Variant RNAs that are not retained also lack 5′ cap hypermethylation and fail to associate with fibrillarin. Furthermore, our results indicate that nuclear retention of U3 RNA does not simply reflect its nucleolar localization. A fragment of U3 containing the box B/C motif is not localized to nucleoli but retained in coiled bodies. Thus, nuclear retention and nucleolar localization are distinct processes with differing sequence requirements. PMID:10567566
Interactions of HIPPI, a molecular partner of Huntingtin interacting protein HIP1, with the specific motif present at the putative promoter sequence of the caspase-1, caspase-8 and caspase-10 genes.

PubMed

Majumder, P; Choudhury, A; Banerjee, M; Lahiri, A; Bhattacharyya, N P

2007-08-01

To investigate the mechanism of increased expression of caspase-1 caused by exogenous Hippi, observed earlier in HeLa and Neuro2A cells, in this work we identified a specific motif AAAGACATG (-101 to -93) at the caspase-1 gene upstream sequence where HIPPI could bind. Various mutations in this specific sequence compromised the interaction, showing the specificity of the interactions. In the luciferase reporter assay, when the reporter gene was driven by caspase-1 gene upstream sequences (-151 to -92) with the mutation G to T at position -98, luciferase activity was decreased significantly in green fluorescent protein-Hippi-expressing HeLa cells in comparison to that obtained with the wild-type caspase-1 gene 60 bp upstream sequence, indicating the biological significance of such binding. It was observed that the C-terminal 'pseudo' death effector domain of HIPPI interacted with the 60 bp (-151 to -92) upstream sequence of the caspase-1 gene containing the motif. We further observed that expression of caspase-8 and caspase-10 was increased in green fluorescent protein-Hippi-expressing HeLa cells. In addition, HIPPI interacted in vitro with putative promoter sequences of these genes, containing a similar motif. In summary, we identified a novel function of HIPPI; it binds to specific upstream sequences of the caspase-1, caspase-8 and caspase-10 genes and alters the expression of the genes. This result showed the motif-specific interaction of HIPPI with DNA, and indicates that it could act as a transcription regulator.
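The coordinates quoted in this abstract can be checked with a few lines of Python. The sketch below uses only the motif AAAGACATG, its span (-101 to -93) and the G to T substitution at -98 from the text; the function names are hypothetical helpers introduced for illustration, not anything from the study.

```python
# Motif and coordinates as quoted in the abstract; helper names are illustrative.
MOTIF = "AAAGACATG"
START, END = -101, -93  # upstream coordinates, inclusive

def position_map(motif, start):
    """Map each upstream coordinate to the base found at that position."""
    return {start + i: base for i, base in enumerate(motif)}

def mutate(motif, start, position, new_base):
    """Return the motif with one base substituted at an upstream coordinate."""
    index = position - start
    if not 0 <= index < len(motif):
        raise ValueError("position lies outside the motif")
    return motif[:index] + new_base + motif[index + 1:]

bases = position_map(MOTIF, START)
mutant = mutate(MOTIF, START, -98, "T")
print(bases[-98], mutant)
```

Under this numbering the nine-base motif exactly fills -101 to -93, and position -98 is the single G in the first half of the motif, which is consistent with the reported G to T mutation.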
Mixotrophy and intraguild predation - dynamic consequences of shifts between food web motifs

NASA Astrophysics Data System (ADS)

Karnatak, Rajat; Wollrab, Sabine

2017-06-01

Mixotrophy is ubiquitous in microbial communities of aquatic systems with many flagellates being able to use autotroph as well as heterotroph pathways for energy acquisition. The usage of one over the other pathway is associated with resource availability and the coupling of alternative pathways has strong implications for system stability. We investigated the impact of dominance of different energy pathways related to relative resource availability on system dynamics in the setting of a tritrophic food web motif. This motif consists of a mixotroph feeding on a purely autotroph species while competing for a shared resource. In addition, the autotroph can use an additional exclusive food source. By changing the relative abundance of shared vs. exclusive food source, we shift the food web motif from an intraguild predation motif to a food chain motif. We analyzed the dependence of system dynamics on absolute and relative resource availability. In general, the system exhibits a transition from stable to oscillatory dynamics with increasing nutrient availability. However, this transition occurs at a much lower nutrient level for the food chain in comparison to the intraguild predation motif. A similar transition is also observed with variations in the relative abundance of food sources for a range of nutrient levels. We expect this shift in food web motifs to occur frequently in microbial communities and therefore the results from our study are highly relevant for natural systems.
Exploitation of peptide motif sequences and their use in nanobiotechnology.

PubMed

Shiba, Kiyotaka

2010-08-01

Short amino acid sequences extracted from natural proteins or created using in vitro evolution systems are sometimes associated with particular biological functions. These peptides, called peptide motifs, can serve as functional units for the creation of various tools for nanobiotechnology. In particular, peptide motifs that have the ability to specifically recognize the surfaces of solid materials and to mineralize certain inorganic materials have been linking biological science to material science. Here, I review how these peptide motifs have been isolated from natural proteins or created using in vitro evolution systems, and how they have been used in the nanobiotechnology field. Copyright © 2010 Elsevier Ltd. All rights reserved.
Insights into Structural and Mechanistic Features of Viral IRES Elements

PubMed Central

Martinez-Salas, Encarnacion; Francisco-Velilla, Rosario; Fernandez-Chamorro, Javier; Embarek, Azman M.

2018-01-01

Internal ribosome entry site (IRES) elements are cis-acting RNA regions that promote internal initiation of protein synthesis using cap-independent mechanisms. However, distinct types of IRES elements present in the genome of various RNA viruses perform the same function despite lacking conservation of sequence and secondary RNA structure. Likewise, IRES elements differ in host factor requirement to recruit the ribosomal subunits. In spite of this diversity, evolutionarily conserved motifs in each family of RNA viruses preserve sequences impacting on RNA structure and RNA–protein interactions important for IRES activity. Indeed, IRES elements adopting remarkably different structural organizations contain RNA structural motifs that play an essential role in recruiting ribosomes, initiation factors and/or RNA-binding proteins using different mechanisms. Therefore, given that a universal IRES motif remains elusive, it is critical to understand how diverse structural motifs deliver functions relevant for IRES activity. This will be useful for understanding the molecular mechanisms behind cap-independent translation, as well as the evolutionary history of these regulatory elements. Moreover, it could improve the accuracy of predicting IRES-like motifs hidden in genome sequences. This review summarizes recent advances on the diversity and biological relevance of RNA structural motifs for viral IRES elements. PMID:29354113
NLSdb-major update for database of nuclear localization signals and nuclear export signals.

PubMed

Bernhofer, Michael; Goldberg, Tatyana; Wolf, Silvana; Ahmed, Mohamed; Zaugg, Julian; Boden, Mikael; Rost, Burkhard

2018-01-04

NLSdb is a database collecting nuclear export signals (NES) and nuclear localization signals (NLS) along with experimentally annotated nuclear and non-nuclear proteins. NES and NLS are short sequence motifs related to protein transport out of and into the nucleus. The updated NLSdb now contains 2253 NLS and introduces 398 NES. The potential sets of novel NES and NLS have been generated by a simple 'in silico mutagenesis' protocol. We started with motifs annotated by experiments. In step 1, we increased specificity such that no known non-nuclear protein matched the refined motif. In step 2, we increased the sensitivity trying to match several different families with a motif. We then iterated over steps 1 and 2. The final set of 2253 NLS motifs matched 35% of 8421 experimentally verified nuclear proteins (up from 21% for the previous version) and none of 18 278 non-nuclear proteins. We updated the web interface providing multiple options to search protein sequences for NES and NLS motifs, and to evaluate your own signal sequences. NLSdb can be accessed via Rostlab services at: https://rostlab.org/services/nlsdb/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
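The specificity step described above (no known non-nuclear protein may match a motif) and the coverage figure (share of nuclear proteins matched) can be illustrated with regular expressions. This is a minimal sketch, not the NLSdb protocol: the function names, the candidate patterns and all sequences are toy assumptions invented for the example.

```python
import re

def specific_motifs(motifs, non_nuclear):
    """Step-1-style filter: keep motifs that match no non-nuclear sequence."""
    return [m for m in motifs
            if not any(re.search(m, seq) for seq in non_nuclear)]

def coverage(motifs, nuclear):
    """Fraction of nuclear sequences matched by at least one motif."""
    if not nuclear:
        return 0.0
    hits = sum(any(re.search(m, seq) for m in motifs) for seq in nuclear)
    return hits / len(nuclear)

# Toy data, invented for illustration only.
nuclear = ["MAPKKKRKVEDG", "MSTKRPAATKKAGQAKKKKLDG", "MGSSLLQWEDA"]
non_nuclear = ["MKKAKLLGAVV", "MLLSAVGGKRTKLA"]
candidates = ["K[KR].[KR]", "PKKKRKV", "KR.{10,12}K{3,}"]

kept = specific_motifs(candidates, non_nuclear)
print(kept, coverage(kept, nuclear))
```

In this toy set the loosest pattern is discarded because it hits a non-nuclear sequence, which mirrors the trade-off the abstract describes: tightening a motif protects specificity, and coverage then has to be regained by adding or generalizing motifs in the sensitivity step.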
Unitary circular code motifs in genomes of eukaryotes.

PubMed

El Soufi, Karim; Michel, Christian J

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has on average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. The origin of this circular code X in genes has been an open problem since its discovery in 1996. Here, we first show that the unitary circular codes (UCC), i.e. sets of one word, make it possible to generate unitary circular code motifs (UCC motifs), i.e. a concatenation of the same motif (simple repeats) leading to low complexity DNA. Three classes of UCC motifs are studied here: repeated dinucleotides (D+ motifs), repeated trinucleotides (T+ motifs) and repeated tetranucleotides (T+ motifs). Thus, the D+, T+ and T+ motifs make it possible to retrieve, synchronize and maintain a frame modulo 2, modulo 3 and modulo 4, respectively, and their shifted frames (1 modulo 2; 1 and 2 modulo 3; 1, 2 and 3 modulo 4 according to the C2, C3 and C4 properties, respectively) in the DNA sequences. The statistical distribution of the D+, T+ and T+ motifs is analyzed in the genomes of eukaryotes. A UCC motif and its complementary UCC motif have the same distribution in the eukaryotic genomes. Furthermore, a UCC motif and its complementary UCC motif have increasing occurrences contrary to their number of hydrogen bonds, very significant with the T+ motifs. The longest D+, T+ and T+ motifs in the studied eukaryotic genomes are also given. Surprisingly, a scarcity of repeated trinucleotides (T+ motifs) in the large eukaryotic genomes is observed compared to the D+ and T+ motifs. This result has been investigated and may be explained by two outcomes. Repeated trinucleotides (T+ motifs) are identified in the X motifs of low composition (cardinality less than 10) in the genomes of eukaryotes. Furthermore, identical trinucleotide pairs of the circular code X are preferentially used in the gene sequences of eukaryotes. These two results suggest that the unitary circular codes of trinucleotides may have been involved in the formation of the trinucleotide circular code X. Indeed, repeated trinucleotides in the X motifs in the genomes of eukaryotes may represent an intermediary evolution from repeated trinucleotides of cardinality 1 (T+ motifs) in the genomes of eukaryotes up to the X motifs of cardinality 20 in the gene sequences of eukaryotes. Copyright © 2017 Elsevier B.V. All rights reserved.
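A UCC motif as described here is a run of one word repeated in tandem, so the "longest motif" statistic can be sketched as a search for the longest tandem run of a given unit. The following is an illustrative sketch only; the function name and the example sequence are assumptions, and the abstract does not state how the authors scanned genomes.

```python
import re

def longest_unit_repeat(sequence, unit):
    """Return (start, copies) of the longest tandem run of `unit`, or (-1, 0)."""
    best_start, best_copies = -1, 0
    # A lookahead is used so that runs starting at every position are examined.
    pattern = f"(?=((?:{re.escape(unit)})+))"
    for match in re.finditer(pattern, sequence):
        copies = len(match.group(1)) // len(unit)
        if copies > best_copies:
            best_start, best_copies = match.start(), copies
    return best_start, best_copies

# Invented example sequence containing a dinucleotide and a trinucleotide repeat.
seq = "GGACACACACTTCAGCAGCAGT"
for unit in ("AC", "CAG"):
    start, copies = longest_unit_repeat(seq, unit)
    print(unit, start, copies, start % len(unit))
```

The last printed value, the start position modulo the unit length, is the frame in which the repeat sits, which is the quantity a repeated dinucleotide, trinucleotide or tetranucleotide fixes modulo 2, 3 or 4.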
Genome Analysis Reveals Interplay between 5′UTR Introns and Nuclear mRNA Export for Secretory and Mitochondrial Genes

PubMed Central

Cenik, Can; Chua, Hon Nian; Zhang, Hui; Tarnawsky, Stefan P.; Akef, Abdalla; Derti, Adnan; Tasan, Murat; Moore, Melissa J.; Palazzo, Alexander F.; Roth, Frederick P.

2011-01-01

In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5′ end of the transcript during splicing. The signal sequence coding region (SSCR) can support an alternative mRNA export (ALREX) pathway that does not require splicing. However, most SSCR–containing genes also have introns, so the interplay between these export mechanisms remains unclear. Here we support a model in which the furthest upstream element in a given transcript, be it an intron or an ALREX–promoting SSCR, dictates the mRNA export pathway used. We also experimentally demonstrate that nuclear-encoded mitochondrial genes can use the ALREX pathway. Thus, ALREX can also be supported by nucleotide signals within mitochondrial-targeting sequence coding regions (MSCRs). Finally, we identified and experimentally verified novel motifs associated with the ALREX pathway that are shared by both SSCRs and MSCRs. Our results show strong correlation between 5′ untranslated region (5′UTR) intron presence/absence and sequence features at the beginning of the coding region. They also suggest that genes encoding secretory and mitochondrial proteins share a common regulatory mechanism at the level of mRNA export. PMID:21533221
A private DNA motif finding algorithm.

PubMed

Chen, Rui; Peng, Yun; Choi, Byron; Xu, Jianliang; Hu, Haibo

2014-08-01

With the increasing availability of genomic sequence data, numerous methods have been proposed for finding DNA motifs. The discovery of DNA motifs serves as a critical step in many biological applications. However, the privacy implication of DNA analysis is normally neglected in the existing methods. In this work, we propose a private DNA motif finding algorithm in which a DNA owner's privacy is protected by a rigorous privacy model, known as ε-differential privacy. It provides provable privacy guarantees that are independent of adversaries' background knowledge. Our algorithm makes use of the n-gram model and is optimized for processing large-scale DNA sequences. We evaluate the performance of our algorithm over real-life genomic data and demonstrate the promise of integrating privacy into DNA motif finding. Copyright © 2014 Elsevier Inc. All rights reserved.
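The abstract does not spell out the algorithm, so the following is not the authors' method. It is a minimal sketch of the textbook Laplace mechanism applied to n-gram counts, under the assumption that each individual contributes one sequence and that neighbouring datasets differ by one sequence. All function names are hypothetical, and the floating-point noise sampling is not hardened for real deployments.

```python
import itertools
import random

def ngram_presence_counts(sequences, n):
    """For each possible DNA n-gram, count the sequences that contain it."""
    counts = {"".join(p): 0 for p in itertools.product("ACGT", repeat=n)}
    for seq in sequences:
        seen = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        for gram in seen:
            if gram in counts:  # ignore n-grams with ambiguous bases such as N
                counts[gram] += 1
    return counts

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_counts(sequences, n, epsilon, rng=None):
    """Release noisy counts for every possible n-gram, including zero counts."""
    rng = rng or random.Random()
    counts = ngram_presence_counts(sequences, n)
    # Assumed worst case: one sequence changes each of the 4**n counts by 1.
    sensitivity = len(counts)
    scale = sensitivity / epsilon
    return {g: c + laplace_noise(scale, rng) for g, c in counts.items()}

noisy = private_counts(["ACGT", "ACAC"], n=2, epsilon=1.0, rng=random.Random(0))
```

Counting presence per sequence rather than raw occurrences is a design choice that bounds how much one individual can change the output. The price is a noise scale that grows as 4^n, which is one reason a practical scheme would need something more refined than this sketch.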
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

PubMed Central

Kinjo, Akira R.; Nakamura, Haruki

2012-01-01

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
Prediction of virus-host protein-protein interactions mediated by short linear motifs.

PubMed

Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A

2017-03-09

Short linear motifs in host organism proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Given that viral linear motif instances of host motif regular expressions can be found by chance, it is necessary to develop methods for filtering functional linear motifs. We conduct a systematic comparison of linear motif filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between human and human immunodeficiency virus 1 (HIV-1) proteins. We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C, D and R are combined by union and intersection. The resulting sets are compared by the number of protein-protein interactions correctly inferred with them, using experimental validation. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions makes it possible to rank the interactions by the sets used to deduce them: D∪R and C. The ordering of the sets is descending on the probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R, C∪D∪R infer all known interactions between HIV-1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method uses only protein sequences as inputs. We can extend the software developed to any other eukaryotic virus and host in order to find and rank candidate interactions. In future work we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
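The pipeline in this abstract has two mechanical parts: finding instances of motif regular expressions in viral sequences, then combining the filtered sets C, D and R by union and intersection. The sketch below illustrates both with Python sets. The motif pattern, the protein name and sequence, and the membership of each filter set are invented for the example, and the three filters themselves (conservation, disorder, rarity) are assumed to be computed elsewhere.

```python
import re

def motif_instances(patterns, proteins):
    """Return {(motif, protein, start)} for every regex hit, overlaps included."""
    hits = set()
    for motif, pattern in patterns.items():
        regex = re.compile(f"(?={pattern})")  # lookahead keeps overlapping hits
        for protein, seq in proteins.items():
            hits.update((motif, protein, m.start()) for m in regex.finditer(seq))
    return hits

def combine(conserved, disordered, rare):
    """Build some of the unions and intersections of the sets C, D and R."""
    return {
        "C": conserved,
        "D∪R": disordered | rare,
        "C∪R": conserved | rare,
        "C∩D": conserved & disordered,
        "C∪D∪R": conserved | disordered | rare,
    }

# Toy input: one proline-rich pattern and one invented viral sequence.
hits = motif_instances({"PxxP": "P..P"}, {"virA": "MPPLPAAPSAP"})

# Hypothetical outcome of the three filters, keyed by hit start position.
C = {h for h in hits if h[2] in (1, 4)}
D = {h for h in hits if h[2] in (4, 7)}
R = {h for h in hits if h[2] == 7}
combined = combine(C, D, R)
```

Each combined set can then be scored by how many experimentally supported interactions it recovers, which is the ranking criterion the abstract describes.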
Purification and sequence analysis of 4-methyl-5-nitrocatechol oxygenase from Burkholderia sp. strain DNT.

PubMed Central

Haigler, B E; Suen, W C; Spain, J C

1996-01-01

4-Methyl-5-nitrocatechol (MNC) is an intermediate in the degradation of 2,4-dinitrotoluene by Burkholderia sp. strain DNT. In the presence of NADPH and oxygen, MNC monooxygenase catalyzes the removal of the nitro group from MNC to form 2-hydroxy-5-methylquinone. The gene (dntB) encoding MNC monooxygenase has been previously cloned and characterized. In order to examine the properties of MNC monooxygenase and to compare it with other enzymes, we sequenced the gene encoding the MNC monooxygenase and purified the enzyme from strain DNT. dntB was localized within a 2.2-kb ApaI DNA fragment. Sequence analysis of this fragment revealed an open reading frame of 1,644 bp with an N-terminal amino acid sequence identical to that of purified MNC monooxygenase from strain DNT. Comparison of the derived amino acid sequences with those of other genes showed that DntB contains the highly conserved ADP and flavin adenine dinucleotide (FAD) binding motifs characteristic of flavoprotein hydroxylases. MNC monooxygenase was purified to homogeneity from strain DNT by anion exchange and gel filtration chromatography. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis revealed a single protein with a molecular weight of 60,200, which is consistent with the size determined from the gene sequence. The native molecular weight determined by gel filtration was 65,000, which indicates that the native enzyme is a monomer. It used either NADH or NADPH as electron donors, and NADPH was the preferred cofactor. The purified enzyme contained 1 mol of FAD per mol of protein, which is also consistent with the detection of an FAD binding motif in the amino acid sequence of DntB. MNC monooxygenase has a narrow substrate specificity. MNC and 4-nitrocatechol are good substrates whereas 3-methyl-4-nitrophenol, 3-methyl-4-nitrocatechol, 4-nitrophenol, 3-nitrophenol, and 4-chlorocatechol are not. These studies suggest that MNC monooxygenase is a flavoprotein that shares some properties with previously studied nitrophenol oxygenases. PMID:8830701
Common fold in helix–hairpin–helix proteins

PubMed Central

Shao, Xuguang; Grishin, Nick V.

2000-01-01

Helix–hairpin–helix (HhH) is a widespread motif involved in non-sequence-specific DNA binding. The majority of HhH motifs function as DNA-binding modules, however, some of them are used to mediate protein–protein interactions or have acquired enzymatic activity by incorporating catalytic residues (DNA glycosylases). From sequence and structural analysis of HhH-containing proteins we conclude that most HhH motifs are integrated as a part of a five-helical domain, termed (HhH)2 domain here. It typically consists of two consecutive HhH motifs that are linked by a connector helix and displays pseudo-2-fold symmetry. (HhH)2 domains show clear structural integrity and a conserved hydrophobic core composed of seven residues, one residue from each α-helix and each hairpin, and deserves recognition as a distinct protein fold. In addition to known HhH in the structures of RuvA, RadA, MutY and DNA-polymerases, we have detected new HhH motifs in sterile alpha motif and barrier-to-autointegration factor domains, the α-subunit of Escherichia coli RNA-polymerase, DNA-helicase PcrA and DNA glycosylases. Statistically significant sequence similarity of HhH motifs and pronounced structural conservation argue for homology between (HhH)2 domains in different protein families. Our analysis helps to clarify how non-symmetric protein motifs bind to the double helix of DNA through the formation of a pseudo-2-fold symmetric (HhH)2 functional unit. PMID:10908318
Identification of the divergent calmodulin binding motif in yeast Ssb1/Hsp75 protein and in other HSP70 family members.

PubMed

Heinen, R C; Diniz-Mendes, L; Silva, J T; Paschoalin, V M F

2006-11-01

Yeast soluble proteins were fractionated by calmodulin-agarose affinity chromatography and the Ca2+/calmodulin-binding proteins were analyzed by SDS-PAGE. One prominent protein of 66 kDa was excised from the gel, digested with trypsin and the masses of the resultant fragments were determined by MALDI/MS. Twenty-one of 38 monoisotopic peptide masses obtained after tryptic digestion were matched to the heat shock protein Ssb1/Hsp75, covering 37% of its sequence. Computational analysis of the primary structure of Ssb1/Hsp75 identified a unique potential amphipathic alpha-helix in its N-terminal ATPase domain with features of target regions for Ca2+/calmodulin binding. This region, which shares 89% similarity to the experimentally determined calmodulin-binding domain from mouse Hsc70, is conserved in nearly half of the 113 members of the HSP70 family investigated, from yeast to plants and animals. Based on the sequence of this region, phylogenetic analysis grouped the HSP70s in three distinct branches. Two of them comprise the non-calmodulin binding Hsp70s BIP/GR78, a subfamily of eukaryotic HSP70 localized in the endoplasmic reticulum, and DnaK, a subfamily of prokaryotic HSP70. A third heterogeneous group is formed by eukaryotic cytosolic HSP70s containing the new calmodulin-binding motif and other cytosolic HSP70s whose sequences do not conform to that conserved motif, indicating that not all eukaryotic cytosolic Hsp70s are targets for calmodulin regulation. Furthermore, the calmodulin-binding domain found in eukaryotic HSP70s is also the target for binding of Bag-1 - an enhancer of ADP/ATP exchange activity of Hsp70s. A model in which calmodulin displaces Bag-1 and modulates Ssb1/Hsp75 chaperone activity is discussed.
Topological characteristics of helical repeat proteins.

PubMed

Groves, M R; Barford, D

1999-06-01

The recent elucidation of protein structures based upon repeating amino acid motifs, including the armadillo motif, the HEAT motif and tetratricopeptide repeats, reveals that they belong to the class of helical repeat proteins. These proteins share the common property of being assembled from tandem repeats of an alpha-helical structural unit, creating extended superhelical structures that are ideally suited to create a protein recognition interface.
Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

PubMed Central

Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

2013-01-01

The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545
SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

PubMed

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

2018-06-18

Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
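The problem definition above can be made concrete with a brute-force sketch. This is not SamSelect and not one of the qPMS algorithms the abstract refers to; it simply enumerates every l-length string over the DNA alphabet and keeps those that occur, with at most d mismatches, in at least qt of the input sequences. The function names and the toy sequences are hypothetical, and the approach is only practical for very small l.

```python
import math
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q):
    """Exhaustive quorum planted motif search (exponential in l)."""
    need = math.ceil(q * len(seqs))  # quorum: at least q*t sequences
    hits = []
    for cand in product("ACGT", repeat=l):
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= need:
            hits.append(motif)
    return hits


# Toy input (made up for illustration): t = 4, l = 4, d = 1, q = 0.5
toy = ["TTACGTTT", "ACGAGGGG", "CCCCCCCC", "GGGGGGGG"]
motifs = qpms_bruteforce(toy, l=4, d=1, q=0.5)
print("ACGT" in motifs, "CCCC" in motifs)
```

Every candidate is checked against every sequence, so the cost grows linearly with t. That is the dependence the sampling strategy described in the abstract aims to reduce by running the search on a smaller subset D'.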

Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

PubMed Central

2010-01-01

Background: Effector secretion is a common strategy of pathogens in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of the EPIYA motif in broad biological species, to predict potential effectors with the EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results: A hidden Markov model (HMM) of five amino acids was built for the EPIYA motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. Most of the proteins containing at least four copies of the EPIYA motif are from intracellular bacteria, extracellular bacteria with T3SS or T4SS, or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions: Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the T3SS/T4SS in bacteria. Our predicted effectors provide useful hypotheses for further studies. PMID:21143776
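The bookkeeping behind figures such as "sequences with at least one occurrence" and "sequences with multiple repeats" can be sketched as follows. Note the substitution: the study scores candidates with a hidden Markov model, whereas this sketch only counts exact matches of the literal string EPIYA, which is a much cruder filter. The protein names and sequences are invented for illustration.

```python
import re

# Exact-match stand-in for the HMM described in the abstract (assumption).
EPIYA = re.compile(r"EPIYA")


def count_epiya(protein):
    """Count non-overlapping exact EPIYA occurrences in a protein string."""
    return len(EPIYA.findall(protein))


def summarize(proteins):
    """Return per-protein counts plus the two groups reported in the abstract."""
    counts = {name: count_epiya(seq) for name, seq in proteins.items()}
    at_least_one = [n for n, c in counts.items() if c >= 1]
    multiple = [n for n, c in counts.items() if c >= 2]
    return counts, at_least_one, multiple


# Hypothetical toy proteins, not real database entries
toy_proteins = {
    "toy_A": "MKEPIYATTEPIYAQQ",
    "toy_B": "MSEPIYAKL",
    "toy_C": "MSTNPKL",
}
counts, at_least_one, multiple = summarize(toy_proteins)
print(counts, at_least_one, multiple)
```

A profile HMM would additionally tolerate substitutions within the five positions, so its hit counts on a real database would differ from an exact-match scan.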
Triazine-based sequence-defined polymers with side-chain diversity and backbone-backbone interaction motifs

DOE PAGES

Grate, Jay W.; Mo, Kai -For; Daily, Michael D.

2016-02-10

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone–backbone interactions, including H-bonding motifs and pi–pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. In conclusion, the synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone–backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.
Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

PubMed

Grate, Jay W; Mo, Kai-For; Daily, Michael D

2016-03-14

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Triazine-based sequence-defined polymers with side-chain diversity and backbone-backbone interaction motifs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grate, Jay W.; Mo, Kai -For; Daily, Michael D.

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone–backbone interactions, including H-bonding motifs and pi–pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. In conclusion, the synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone–backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences that have different amino acid compositions were retained. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for these tannases revealed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches with maximum homology from amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one has only bacteria and the other has both fungi and bacteria, showing some relationship between these different genera. In the second cluster, however, nearly all fungal species were found together in a corner, which indicates the sequence-level similarity among fungal genera. The distribution analysis of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3 % of the studied tannase sequences, suggesting its participation in structure and enzymatic function.
Interaction of Cu(+) with cytosine and formation of i-motif-like C-M(+)-C complexes: alkali versus coinage metals.

PubMed

Gao, Juehan; Berden, Giel; Rodgers, M T; Oomens, Jos

2016-03-14

The Watson-Crick structure of DNA is among the most well-known molecular structures of our time. However, alternative base-pairing motifs are also known to occur, often depending on base sequence, pH, or the presence of cations. Pairing of cytosine (C) bases induced by the sharing of a single proton (C-H(+)-C) may give rise to the so-called i-motif, which occurs primarily in expanded trinucleotide repeats and the telomeric region of DNA, particularly at low pH. At physiological pH, silver cations were recently found to stabilize C dimers in a C-Ag(+)-C structure analogous to the hemiprotonated C-dimer. Here we use infrared ion spectroscopy in combination with density functional theory calculations at the B3LYP/6-311G+(2df,2p) level to show that copper in the 1+ oxidation state induces an analogous formation of C-Cu(+)-C structures. In contrast to protons and these transition metal ions, alkali metal ions induce a different dimer structure, where each ligand coordinates the alkali metal ion in a bidentate fashion in which the N3 and O2 atoms of both cytosine ligands coordinate to the metal ion, sacrificing hydrogen-bonding interactions between the ligands for improved chelation of the metal cation.
Finding specific RNA motifs: Function in a zeptomole world?

PubMed Central

KNIGHT, ROB; YARUS, MICHAEL

2003-01-01

We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
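The conversion quoted in the abstract (1 zmole = 602 sequences) follows directly from Avogadro's number. The short sketch below reproduces it and shows the nine orders of magnitude separating zeptomole-scale and picomole-scale pools; the function and variable names are ours.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

ZEPTOMOLE = 1e-21  # moles
PICOMOLE = 1e-12   # moles


def molecules(amount, unit_in_moles):
    """Number of molecules in `amount` of the given unit."""
    return amount * unit_in_moles * AVOGADRO


per_zmol = molecules(1, ZEPTOMOLE)
per_pmol = molecules(1, PICOMOLE)
print(f"1 zmol ~ {per_zmol:.0f} molecules")
print(f"1 pmol ~ {per_pmol:.3e} molecules")
```

A 1-1000 zmole pool therefore holds roughly six hundred to six hundred thousand unique sequences, which is the scale against which the abstract judges how many specified nucleotides a motif can have and still be expected to occur.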
SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

PubMed

Ramu, Chenna

2003-07-01

SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
Conservation of the glycoprotein B homologs of the Kaposi’s sarcoma-associated herpesvirus (KSHV/HHV8) and Old World primate rhadinoviruses of chimpanzees and macaques

PubMed Central

Bruce, A. Gregory; Horst, Jeremy A.; Rose, Timothy M.

2016-01-01

The envelope-associated glycoprotein B (gB) is highly conserved within the Herpesviridae and plays a critical role in viral entry. We analyzed the evolutionary conservation of sequence and structural motifs within the Kaposi’s sarcoma-associated herpesvirus (KSHV) gB and homologs of Old World primate rhadinoviruses belonging to the distinct RV1 and RV2 rhadinovirus lineages. In addition to gB homologs of rhadinoviruses infecting the pig-tailed and rhesus macaques, we cloned and sequenced gB homologs of RV1 and RV2 rhadinoviruses infecting chimpanzees. A structural model of the KSHV gB was determined, and functional motifs and sequence variants were mapped to the model structure. Conserved domains and motifs were identified, including an “RGD” motif that plays a critical role in KSHV binding and entry through the cellular integrin αVβ3. The RGD motif was only detected in RV1 rhadinoviruses suggesting an important difference in cell tropism between the two rhadinovirus lineages. PMID:27070755
Transterm—extended search facilities and improved integration with other databases

PubMed Central

Jacobs, Grant H.; Stockwell, Peter A.; Tate, Warren P.; Brown, Chris M.

2006-01-01

Transterm has now been publicly available for >10 years. Major changes have been made since its last description in this database issue in 2002. The current database provides data for key regions of mRNA sequences, a curated database of mRNA motifs and tools to allow users to investigate their own motifs or mRNA sequences. The key mRNA regions database is derived computationally from Genbank. It contains 3′ and 5′ flanking regions, the initiation and termination signal context and coding sequence for annotated CDS features from Genbank and RefSeq. The database is non-redundant, enabling summary files and statistics to be prepared for each species. Advances include providing extended search facilities, the database may now be searched by BLAST in addition to regular expressions (patterns) allowing users to search for motifs such as known miRNA sequences, and the inclusion of RefSeq data. The database contains >40 motifs or structural patterns important for translational control. In this release, patterns from UTRsite and Rfam are also incorporated with cross-referencing. Users may search their sequence data with Transterm or user-defined patterns. The system is accessible at . PMID:16381889
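Searching mRNA sequences with regular expressions, as mentioned above, can be illustrated with a generic sketch. This is not Transterm's own pattern syntax or interface; it uses Python's standard `re` module, and the pattern chosen for illustration is the well-known polyadenylation signal hexamer AAUAAA together with its common variant AUUAAA. The toy sequence is made up.

```python
import re

# Polyadenylation signal hexamer and its common variant, as one pattern.
POLYA_SIGNAL = re.compile(r"A[AU]UAAA")


def find_motif(pattern, rna):
    """Return (start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]


# Hypothetical 3' UTR fragment for illustration only
toy_utr = "GGCAAUAAAGCUUAUUAAAGG"
hits = find_motif(POLYA_SIGNAL, toy_utr)
print(hits)
```

Patterns of this kind are exact up to the alternatives written into them, whereas a BLAST search scores approximate similarity; the two query modes mentioned in the abstract differ in that respect.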
A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface.

PubMed

Warfield, Linda; Tuttle, Lisa M; Pacheco, Derek; Klevit, Rachel E; Hahn, Steven

2014-08-26

Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a "fuzzy" protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

PubMed

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
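The "nucleotide preferences at each position" that conventional sequence motifs encode are usually stored as a position frequency matrix. The sketch below builds one from a handful of aligned sites; it does not compute any of the DNA shape features TFBSshape provides, and the toy sites (E-box-like hexamers of the kind bound by basic helix-loop-helix factors) are invented for illustration.

```python
from collections import Counter


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Per-position base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        column = Counter(s[i] for s in sites)
        total = sum(column[b] for b in alphabet)
        pfm.append({b: column[b] / total for b in alphabet})
    return pfm


# Hypothetical aligned binding sites
toy_sites = ["CACGTG", "CACGTG", "CATGTG", "CACGTT"]
pfm = position_frequency_matrix(toy_sites)
for i, col in enumerate(pfm):
    print(i, col)
```

Each column is treated independently, so a matrix like this cannot capture the three-dimensional context of a site, which is the gap the shape features described in the abstract are meant to fill.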
TFBSshape: a motif database for DNA shape features of transcription factor binding sites

PubMed Central

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
Exploring the limits of sequence and structure in a variant βγ-crystallin domain of the protein absent in melanoma-1 (AIM1)

PubMed Central

Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan

2008-01-01

βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473
Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.

PubMed

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E; Przytycka, Teresa M

2012-06-15

Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. To close this gap, we developed Aptamotif, a computational method for the identification of sequence-structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus demonstrating the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process.
Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter

PubMed Central

Roux-Rouquie, Magali; Marilley, Monique

2000-01-01

We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X.laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860
Characterization of cDNAs and genomic DNAs for human threonyl- and cysteinyl-tRNA synthetases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cruzen, M.E.

1993-01-01

Techniques of molecular biology were used to clone, sequence and map two human aminoacyl-tRNA synthetase (aaRS) cDNAs: threonyl-tRNA synthetase (ThrRS), a class II enzyme, and cysteinyl-tRNA synthetase (CysRS), a class I enzyme. The predicted protein sequence of human ThrRS is highly homologous to that of lower eukaryotic and prokaryotic ThrRSs, particularly in the regions containing the three structural motifs common to all class II synthetases. Signature regions 1 and 2, which characterize the class IIa subgroup (SerRS, ThrRS and HisRS), are highly conserved from bacteria to human. Structural predictions for human ThrRS based on the known structure of the closely related SerRS from E. coli implicate strongly conserved residues in the signature sequences as important in substrate binding. The amino terminal 100 residues of the deduced amino acid sequence of ThrRS share structural similarity with SerRS, consistent with forming an antiparallel helix implicated in tRNA binding. The 5' untranslated sequence of the human ThrRS gene shares short stretches of common sequence with the gene for hamster HisRS, including a binding site for the promoter-specific transcription factor sp-1. The deduced amino acid sequence of human CysRS has a high degree of sequence identity to E. coli CysRS. Human CysRS possesses the classic characteristics of a class I synthetase and is most closely related to the MetRS subgroup. The amino terminal half of human CysRS can be modeled as a nucleotide binding fold and shares significant sequence and structural similarity with the other enzymes in this subgroup. The CysRS structural gene (CARS) was mapped to human chromosome 11p15.5 by fluorescent in situ hybridization. CARS is the first aaRS gene to be mapped to chromosome 11. The steady-state levels of both CysRS and ThrRS mRNA were quantitated in several human tissues. Message levels for these enzymes appear to be subject to differential regulation in different cell types.
NNAlign: a platform to construct and evaluate artificial neural network models of receptor-ligand interactions.

PubMed

Nielsen, Morten; Andreatta, Massimo

2017-07-03

Peptides are extensively used to characterize functional or (linear) structural aspects of receptor-ligand interactions in biological systems, e.g. SH2, SH3, PDZ peptide-recognition domains, the MHC membrane receptors and enzymes such as kinases and phosphatases. NNAlign is a method for the identification of such linear motifs in biological sequences. The algorithm aligns the amino acid or nucleotide sequences provided as training set, and generates a model of the sequence motif detected in the data. The webserver allows setting up cross-validation experiments to estimate the performance of the model, as well as evaluations on independent data. Many features of the training sequences can be encoded as input, and the network architecture is highly customizable. The results returned by the server include a graphical representation of the motif identified by the method, performance values and a downloadable model that can be applied to scan protein sequences for occurrence of the motif. While its performance for the characterization of peptide-MHC interactions is widely documented, we extended NNAlign to be applicable to other receptor-ligand systems as well. Version 2.0 supports alignments with insertions and deletions, encoding of receptor pseudo-sequences, and custom alphabets for the training sequences. The server is available at http://www.cbs.dtu.dk/services/NNAlign-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
THGS: a web-based database of Transmembrane Helices in Genome Sequences

PubMed Central

Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

2004-01-01

Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

The animal sialyltransferases and sialyltransferase-related genes: a phylogenetic approach.

PubMed

Harduin-Lepers, Anne; Mollicone, Rosella; Delannoy, Philippe; Oriol, Rafael

2005-08-01

The animal sialyltransferases are Golgi type II transmembrane glycosyltransferases. Twenty distinct sialyltransferases have been identified in both human and murine genomes. These enzymes catalyze transfer of sialic acid from CMP-Neu5Ac to the glycan moiety of glycoconjugates. Despite low overall identities, they share four conserved peptide motifs [L (large), S (small), motif III, and motif VS (very small)] that are hallmarks for sialyltransferase identification. We have identified 155 new putative genes in 25 animal species, and we have exploited two lines of evidence: (1) sequence comparisons and (2) exon-intron organization of the genes. An ortholog to the ancestor present before the split of ST6Gal I and II subfamilies was detected in arthropods. An ortholog to the ancestor present before the split of ST6GalNAc III, IV, V, and VI subfamilies was detected in sea urchin. An ortholog to the ancestor present before the split of ST3Gal I and II subfamilies was detected in ciona, and an ortholog to the ancestor of all the ST8Sia was detected in amphioxus. Therefore, single examples of the four families (ST3Gal, ST6Gal, ST6GalNAc, and ST8Sia) have appeared in invertebrates, earlier than previously thought, whereas the four families were all detected in bony fishes, amphibians, birds, and mammals. As previously hypothesized, sequence similarities among sialyltransferases suggest a common genetic origin, by successive duplications of an ancestral gene, followed by divergent evolution. Finally, we propose predictions on these invertebrates sialyltransferase-related activities that have not previously been demonstrated and that will ultimately need to be substantiated by protein expression and enzymatic activity assays.
Molecular and functional characterization of clathrin- and AP-2-binding determinants within a disordered domain of auxilin.

PubMed

Scheele, Urte; Alves, Jurgen; Frank, Ronald; Duwel, Michael; Kalthoff, Christoph; Ungewickell, Ernst

2003-07-11

Uncoating of clathrin-coated vesicles requires the J-domain protein auxilin for targeting hsc70 to the clathrin coats and for stimulating the hsc70 ATPase activity. This results in the release of hsc70-complexed clathrin triskelia and concomitant dissociation of the coat. To understand the complex role of auxilin in uncoating and clathrin assembly in more detail, we analyzed the molecular organization of its clathrin-binding domain (amino acids 547-813). CD spectroscopy of auxilin fragments revealed that the clathrin-binding domain is almost completely disordered in solution. By systematic mapping using synthetic peptides and by site-directed mutagenesis, we identified short peptide sequences involved in clathrin heavy chain and AP-2 binding and evaluated their significance for the function of auxilin. Some of the binding determinants, including those containing sequences 674DPF and 636WDW, showed dual specificity for both clathrin and AP-2. In contrast, the two DLL motifs within the clathrin-binding domain were exclusively involved in clathrin binding. Surprisingly, they interacted not only with the N-terminal domain of the heavy chain, but also with the distal domain. Moreover, both DLL peptides proved to be essential for clathrin assembly and uncoating. In addition, we found that the motif 726NWQ is required for efficient clathrin assembly activity. Auxilin shares a number of protein-protein interaction motifs with other endocytic proteins, including AP180. We demonstrate that AP180 and auxilin compete for binding to the alpha-ear domain of AP-2. Like AP180, auxilin also directly interacts with the ear domain of beta-adaptin. On the basis of our data, we propose a refined model for the uncoating mechanism of clathrin-coated vesicles.
The C-terminal portion of the cleaved HT motif is necessary and sufficient to mediate export of proteins from the malaria parasite into its host cell

PubMed Central

Tarr, Sarah J; Cryar, Adam; Thalassinos, Konstantinos; Haldar, Kasturi; Osborne, Andrew R

2013-01-01

The malaria parasite exports proteins across its plasma membrane and a surrounding parasitophorous vacuole membrane, into its host erythrocyte. Most exported proteins contain a Host Targeting motif (HT motif) that targets them for export. In the parasite secretory pathway, the HT motif is cleaved by the protease plasmepsin V, but the role of the newly generated N-terminal sequence in protein export is unclear. Using a model protein that is cleaved by an exogenous viral protease, we show that the new N-terminal sequence, normally generated by plasmepsin V cleavage, is sufficient to target a protein for export, and that cleavage by plasmepsin V is not coupled directly to the transfer of a protein to the next component in the export pathway. Mutation of the fourth and fifth positions of the HT motif, as well as amino acids further downstream, block or affect the efficiency of protein export indicating that this region is necessary for efficient export. We also show that the fifth position of the HT motif is important for plasmepsin V cleavage. Our results indicate that plasmepsin V cleavage is required to generate a new N-terminal sequence that is necessary and sufficient to mediate protein export by the malaria parasite. PMID:23279267
Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations

PubMed Central

Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.

2017-01-01

Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within ±2 of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498
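The idea of asking whether a neighboring base influences a point mutation can be sketched by comparing base frequencies at one flanking position between mutated and unmutated (control) sequence contexts. This is a simplified stand-in for the log-linear models described above, not the API of the authors' library; the function names, the pseudocount and the toy 5-mer contexts (mutating base at index 2, flanks within ±2) are assumptions.

```python
import math
from collections import Counter


def base_freqs(contexts, index, pseudocount=0.5):
    """Base frequencies at `index` across contexts (assumes only A, C, G, T)."""
    counts = Counter(seq[index] for seq in contexts)
    total = len(contexts) + 4 * pseudocount
    return {b: (counts.get(b, 0) + pseudocount) / total for b in "ACGT"}


def relative_entropy(mutated, control, index):
    """Relative entropy (bits) of mutated vs control base usage at one position."""
    p = base_freqs(mutated, index)
    q = base_freqs(control, index)
    return sum(p[b] * math.log2(p[b] / q[b]) for b in "ACGT")


# Toy contexts (assumption): every mutated C at index 2 is followed by G, a CpG-like pattern.
mutated = ["AACGT", "TTCGA", "GACGC", "CTCGG"]
control = ["AACAT", "TTCCA", "GACGC", "CTCTG"]
for idx in (0, 1, 3, 4):
    print(idx - 2, round(relative_entropy(mutated, control, idx), 3))
```

In this toy data the +1 position stands out, which is the kind of signal a CpG effect would produce; a real analysis would also need a significance test, which the sketch omits.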
Evolutionary history of the alpha2,8-sialyltransferase (ST8Sia) gene family: Tandem duplications in early deuterostomes explain most of the diversity found in the vertebrate ST8Sia genes

PubMed Central

2008-01-01

Background: The animal sialyltransferases, which catalyze the transfer of sialic acid to the glycan moiety of glycoconjugates, are subdivided into four families: ST3Gal, ST6Gal, ST6GalNAc and ST8Sia, based on acceptor sugar specificity and glycosidic linkage formed. Despite low overall sequence identity between each sialyltransferase family, all sialyltransferases share four conserved peptide motifs (L, S, III and VS) that serve as hallmarks for the identification of the sialyltransferases. Currently, twenty subfamilies have been described in mammals and birds. Examples of the four sialyltransferase families have also been found in invertebrates. Focusing on the ST8Sia family, we investigated the origin of the three groups of α2,8-sialyltransferases demonstrated in vertebrates to carry out poly-, oligo- and mono-α2,8-sialylation. Results: We identified in the genome of invertebrate deuterostomes, orthologs to the common ancestor for each of the three vertebrate ST8Sia groups and a set of novel genes named ST8Sia EX, not found in vertebrates. All these ST8Sia sequences share a new conserved family-motif, named "C-term" that is involved in protein folding, via an intramolecular disulfide bridge. Interestingly, sequences from Branchiostoma floridae orthologous to the common ancestor of polysialyltransferases possess a polysialyltransferase domain (PSTD) and those orthologous to the common ancestor of oligosialyltransferases possess a new ST8Sia III-specific motif similar to the PSTD. In osteichthyans, we have identified two new subfamilies. In addition, we describe the expression profile of ST8Sia genes in Danio rerio. Conclusion: Polysialylation appeared early in the deuterostome lineage.
The recent release of several deuterostome genome databases and paralogons combined with synteny analysis allowed us to obtain insight into events at the gene level that led to the diversification of the ST8Sia genes, with their corresponding enzymatic activities, in both invertebrates and vertebrates. The initial expansion and subsequent divergence of the ST8Sia genes resulted as a consequence of a series of ancient duplications and translocations in the invertebrate genome long before the emergence of vertebrates. A second subset of ST8sia genes in the vertebrate genome arose from whole genome duplication (WGD) R1 and R2. Subsequent selective ST8Sia gene loss is responsible for the characteristic ST8Sia gene expression pattern observed today in individual species. PMID:18811928
Evolutionary history of the alpha2,8-sialyltransferase (ST8Sia) gene family: tandem duplications in early deuterostomes explain most of the diversity found in the vertebrate ST8Sia genes.

PubMed

Harduin-Lepers, Anne; Petit, Daniel; Mollicone, Rosella; Delannoy, Philippe; Petit, Jean-Michel; Oriol, Rafael

2008-09-23

The animal sialyltransferases, which catalyze the transfer of sialic acid to the glycan moiety of glycoconjugates, are subdivided into four families: ST3Gal, ST6Gal, ST6GalNAc and ST8Sia, based on acceptor sugar specificity and glycosidic linkage formed. Despite low overall sequence identity between each sialyltransferase family, all sialyltransferases share four conserved peptide motifs (L, S, III and VS) that serve as hallmarks for the identification of the sialyltransferases. Currently, twenty subfamilies have been described in mammals and birds. Examples of the four sialyltransferase families have also been found in invertebrates. Focusing on the ST8Sia family, we investigated the origin of the three groups of alpha2,8-sialyltransferases demonstrated in vertebrates to carry out poly-, oligo- and mono-alpha2,8-sialylation. We identified in the genome of invertebrate deuterostomes, orthologs to the common ancestor for each of the three vertebrate ST8Sia groups and a set of novel genes named ST8Sia EX, not found in vertebrates. All these ST8Sia sequences share a new conserved family-motif, named "C-term" that is involved in protein folding, via an intramolecular disulfide bridge. Interestingly, sequences from Branchiostoma floridae orthologous to the common ancestor of polysialyltransferases possess a polysialyltransferase domain (PSTD) and those orthologous to the common ancestor of oligosialyltransferases possess a new ST8Sia III-specific motif similar to the PSTD. In osteichthyans, we have identified two new subfamilies. In addition, we describe the expression profile of ST8Sia genes in Danio rerio. Polysialylation appeared early in the deuterostome lineage. The recent release of several deuterostome genome databases and paralogons combined with synteny analysis allowed us to obtain insight into events at the gene level that led to the diversification of the ST8Sia genes, with their corresponding enzymatic activities, in both invertebrates and vertebrates. 
The initial expansion and subsequent divergence of the ST8Sia genes resulted as a consequence of a series of ancient duplications and translocations in the invertebrate genome long before the emergence of vertebrates. A second subset of ST8sia genes in the vertebrate genome arose from whole genome duplication (WGD) R1 and R2. Subsequent selective ST8Sia gene loss is responsible for the characteristic ST8Sia gene expression pattern observed today in individual species.
Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

PubMed Central

Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

2012-01-01

A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985
A Structural Overview of RNA-Dependent RNA Polymerases from the Flaviviridae Family

PubMed Central

Wu, Jiqin; Liu, Weichi; Gong, Peng

2015-01-01

RNA-dependent RNA polymerases (RdRPs) from the Flaviviridae family are representatives of viral polymerases that carry out RNA synthesis through a de novo initiation mechanism. They share a ≈ 600-residue polymerase core that displays a canonical viral RdRP architecture resembling an encircled right hand with palm, fingers, and thumb domains surrounding the active site. Polymerase catalytic motifs A–E in the palm and motifs F/G in the fingers are shared by all viral RdRPs with sequence and/or structural conservations regardless of the mechanism of initiation. Different from RdRPs carrying out primer-dependent initiation, Flaviviridae and other de novo RdRPs utilize a priming element often integrated in the thumb domain to facilitate primer-independent initiation. Upon the transition to the elongation phase, this priming element needs to undergo currently unresolved conformational rearrangements to accommodate the growth of the template-product RNA duplex. In the genera of Flavivirus and Pestivirus, the polymerase module in the C-terminal part of the RdRP protein may be regulated in cis by the N-terminal region of the same polypeptide. Either being a methyltransferase in Flavivirus or a functionally unclarified module in Pestivirus, this region could play auxiliary roles for the canonical folding and/or the catalysis of the polymerase, through defined intra-molecular interactions. PMID:26062131
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

PubMed

Catania, Francesco; Lynch, Michael

2010-05-04

In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
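The screening step described above, checking a set of 3' untranslated regions for a fixed 8-mer, can be sketched as a simple presence count. The function name, the toy sequences and the example 8-mer are hypothetical and are not taken from the study.

```python
def fraction_with_kmer(utrs, kmer):
    """Fraction of sequences in `utrs` (name -> sequence) containing `kmer`."""
    if not utrs:
        return 0.0
    kmer = kmer.upper()
    hits = sum(1 for seq in utrs.values() if kmer in seq.upper())
    return hits / len(utrs)


# Toy 3' UTRs (assumption): two of three carry the example 8-mer.
ribosomal_utrs = {
    "geneA": "AATTGTAAATAGCC",
    "geneB": "GGTGTAAATAGTT",
    "geneC": "ACGTACGTACGT",
}
print(fraction_with_kmer(ribosomal_utrs, "TGTAAATA"))
```

Comparing this fraction between ribosomal and non-ribosomal gene sets would be one way to express the enrichment the abstract reports, although the study's actual statistics are not reproduced here.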
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria

PubMed Central

Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin

2011-01-01

Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched for the metazoan integrin type β-propeller domains among all available sequences from bacteria and unicellular eukaryotic organisms, which must incorporate seven repeats, corresponding to the seven blades of the β-propeller domain, and so that the newly found structure-derived motif would exist in every repeat. As the result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than those example structures from human integrins. Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found. 
PMID:22022374
Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer's Activity

PubMed Central

Erceg, Jelena; Saunders, Timothy E.; Girardot, Charles; Devos, Damien P.; Hufnagel, Lars; Furlong, Eileen E. M.

2014-01-01

Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood. PMID:24391522
DynaMIT: the dynamic motif integration toolkit

PubMed Central

Dassi, Erik; Quattrone, Alessandro

2016-01-01

De novo motif search is a frequently applied bioinformatics procedure to identify and prioritize recurrent elements in sequence sets for biological investigation, such as those derived from high-throughput differential expression experiments. Several algorithms have been developed to perform motif search, employing widely different approaches and often giving divergent results. In order to maximize the power of these investigations and ultimately be able to draft solid biological hypotheses, multiple tools need to be applied to the same sequences and their results merged. However, motif reporting formats and statistical evaluation methods currently make such an integration task difficult to perform and mostly restricted to specific scenarios. We thus introduce here the Dynamic Motif Integration Toolkit (DynaMIT), an extremely flexible platform that allows motifs to be identified with multiple algorithms, integrated by means of a user-selected strategy, and visualized in several ways; furthermore, the platform is user-extendible in all its aspects. DynaMIT is freely available at http://cibioltg.bitbucket.org. PMID:26253738
Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

PubMed

Zhao, Xiaoyan; Sze, Sing-Hoi

2011-05-01

One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms performs very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
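The string representation with ignored positions can be illustrated with a small matcher in which only the fixed nucleotides of a pattern must agree with the sequence. This sketch covers the representation only, not the PosMotif algorithm or its background Markov chains; the function name, the use of `N` as the ignore symbol and the toy sequence are assumptions.

```python
def find_occurrences(sequence, pattern, skip="N"):
    """Return start indices where the fixed positions of `pattern` match.

    Positions in `pattern` equal to `skip` are ignored during comparison.
    """
    fixed = [(i, c) for i, c in enumerate(pattern) if c != skip]
    starts = []
    for start in range(len(sequence) - len(pattern) + 1):
        if all(sequence[start + i] == c for i, c in fixed):
            starts.append(start)
    return starts


# Toy example (assumption): the fourth position of the motif is nonconserved.
print(find_occurrences("ACGTTACGAT", "ACGNT"))
```

Each of the l positions of a motif can be either fixed or ignored, so there are 2^l possible skip patterns, which is consistent with the number of background models quoted in the abstract.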
Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

PubMed Central

Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

2015-01-01

Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672
Sequence and expression analyses of porcine ISG15 and ISG43 genes.

PubMed

Huang, Jiangnan; Zhao, Shuhong; Zhu, Mengjin; Wu, Zhenfang; Yu, Mei

2009-08-01

The coding sequences of porcine interferon-stimulated gene 15 (ISG15) and interferon-stimulated gene 43 (ISG43) were cloned from swine spleen mRNA. The amino acid sequences deduced from the porcine ISG15 and ISG43 coding sequences shared 24-75% and 29-83% similarity with ISG15s and ISG43s from other vertebrates, respectively. Structural analyses revealed that porcine ISG15 comprises two ubiquitin homologue (UBQ) domains and a conserved C-terminal LRLRGG conjugating motif. Porcine ISG43 contains a ubiquitin-processing protease-like domain. Phylogenetic analyses showed that porcine ISG15 and ISG43 were most closely related to rat ISG15 and cattle ISG43, respectively. Using a quantitative real-time PCR assay, significantly increased expression levels of the porcine ISG15 and ISG43 genes were detected in porcine kidney endothelial (PK15) cells treated with poly I:C. We also observed enhanced mRNA expression of three dsRNA pattern-recognition receptors (PRRs), TLR3, DDX58 and IFIH1, which have been reported to act as critical receptors in inducing the mRNA expression of the ISG15 and ISG43 genes. However, we did not detect any induced mRNA expression of IFNalpha and IFNbeta, suggesting that transcriptional activation of ISG15 and ISG43 was mediated through an IFN-independent signaling pathway in the poly I:C-treated PK15 cells. Association analyses in a Landrace pig population revealed that the ISG15 c.347T>C (BstUI) polymorphism and the ISG43 c.953T>G (BccI) polymorphism were significantly associated with hematological parameters and immune-related traits.
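Checking a deduced protein sequence for the C-terminal LRLRGG motif mentioned above can be sketched as follows. The function name and the toy sequences are hypothetical; they are not the porcine ISG15 sequence.

```python
def cterminal_motif_offset(protein, motif="LRLRGG"):
    """Number of residues after the last occurrence of `motif`, or None if absent.

    A trailing stop symbol '*' is ignored. An offset of 0 means the motif
    sits at the very C-terminus.
    """
    protein = protein.upper().rstrip("*")
    pos = protein.rfind(motif)
    if pos == -1:
        return None
    return len(protein) - (pos + len(motif))


# Toy sequences (assumption), chosen only to exercise the three outcomes.
for seq in ("MGWDLTVKLRLRGG", "MKLRLRGGAAA", "MKV"):
    print(seq, cterminal_motif_offset(seq))
```

Reporting the offset rather than a yes/no answer keeps the check useful for sequences that carry extra residues after the motif.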
[Structure and evolution of the eukaryotic FANCJ-like proteins].

PubMed

Wuhe, Jike; Zefeng, Wu; Sanhong, Fan; Xuguang, Xi

2015-02-01

The FANCJ-like protein family is a class of ATP-dependent helicases that can catalytically unwind duplex DNA along the 5'-3' direction. It is involved in the processes of DNA damage repair, homologous recombination and G-quadruplex DNA unwinding, and plays a critical role in maintaining genome integrity. In this study, we systematically analyzed FANCJ-like proteins from 47 eukaryotic species and discussed their sequence diversity, origin and evolution, motif organization patterns and spatial structure differences. Four members of the FANCJ-like proteins, XPD, CHL1, RTEL1 and FANCJ, were found in eukaryotes, but some of them were missing in most fungi and some insects. For example, the Zygomycota fungi lost RTEL1, Basidiomycota and Ascomycota fungi lost RTEL1 and FANCJ, and Dipteran insects lost FANCJ. FANCJ-like proteins contain the canonical motor domains HD1 and HD2, and the HD1 domain further integrates three unique domains: Fe-S, Arch and Extra-D. The Fe-S and Arch domains are relatively conserved in all members of the family, but the Extra-D domain is lost in XPD and differs among the remaining members. We found 7, 10 and 2 specific motifs in the three unique domains, respectively, and 5 and 12 specific motifs in the HD1 and HD2 domains, in addition to the conserved motifs reported previously. By analyzing the arrangement pattern of these specific motifs, we found that RTEL1 and FANCJ are more closely related and share two specific motifs, Vb2 and Vc, in the HD2 domain, which are likely related to their G-quadruplex DNA unwinding activity. Evolutionary evidence indicates that FANCJ-like proteins originated from a helicase whose HD1 domain carried inserted Fe-S and Arch domains. Through three successive gene duplication events followed by specialization, eukaryotes came to possess the current four members of the FANCJ-like proteins.
In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

PubMed

Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

2015-02-06

Access to sequence and structure information for telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we present a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures and the parameters used to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) carrying the conserved TTTAGGG sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, including RMSD, RMSF, Rg, PCA and energy plots, likewise convey a similar type of motional behavior. For both proteins, the best docking complex points to the first interaction site, which is mainly the helix 3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

PubMed

Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

2015-09-01

Access to sequence and structure information for telomere-binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interactions between DNA and proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but so far there have been very few reports of in silico studies of the complete proteins. Here we present a comparative study of the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures and the parameters used to characterize the proteins. MD simulation was carried out with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) carrying the conserved TTTAGGG sequence motif using the HADDOCK Web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, including RMSD, RMSF, Rg, PCA and energy plots, likewise convey a similar type of motional behavior. For both proteins, the best docking complex points to the first interaction site, which is mainly the helix 3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Extensive T-Cell Epitope Repertoire Sharing among Human Proteome, Gastrointestinal Microbiome, and Pathogenic Bacteria: Implications for the Definition of Self

PubMed Central

Bremel, Robert D.; Homan, E. Jane

2015-01-01

T-cell receptor binding to MHC-bound peptides plays a key role in discrimination between self and non-self. Only a subset, typically a pentamer, of amino acids in a MHC-bound peptide form the motif exposed to the T-cell receptor. We categorize and compare the T-cell exposed amino acid motif repertoire of the total proteomes of two groups of bacteria, comprising pathogens and gastrointestinal microbiome organisms, with the human proteome and immunoglobulins. Given the maximum 20^5, or 3.2 million, of such motifs that bind T-cell receptors, there is considerable overlap in motif usage. We show that the human proteome, exclusive of immunoglobulins, only comprises three quarters of the possible motifs, of which 65.3% are also present in both composite bacterial proteomes. Very few motifs are unique to the human proteome. Immunoglobulin variable regions carry a broad diversity of T-cell exposed motifs (TCEMs) that provides a stratified random sample of the motifs found in pathogens, microbiome, and the human proteome. Individual bacterial genera and species vary in the content of immunoglobulin and human proteome matched motifs that they carry. Mycobacteria and Burkholderia spp carry a particularly high content of such matched motifs. Some bacteria retain a unique motif signature and motif sharing pattern with the human proteome. The implication is that distinguishing self from non-self does not depend on individual TCEMs, but on a complex and dynamic overlay of signals wherein the same TCEM may play different roles in different organisms, and the frequency with which a particular TCEM appears influences its effect. The patterns observed provide clues to bacterial immune evasion and to strategies for intervention, including vaccine design. The breadth and distinct frequency patterns of the immunoglobulin-derived peptides suggest a role of immunoglobulins in maintaining a broadly responsive T-cell repertoire. PMID:26557118
Bacterial RecA Protein Promotes Adenoviral Recombination during In Vitro Infection

PubMed Central

Lee, Jeong Yoon; Lee, Ji Sun; Materne, Emma C.; Rajala, Rahul; Ismail, Ashrafali M.; Seto, Donald; Dyer, David W.

2018-01-01

ABSTRACT Adenovirus infections in humans are common and sometimes lethal. Adenovirus-derived vectors are also commonly chosen for gene therapy in human clinical trials. We have shown in previous work that homologous recombination between adenoviral genomes of human adenovirus species D (HAdV-D), the largest and fastest growing HAdV species, is responsible for the rapid evolution of this species. Because adenovirus infection initiates in mucosal epithelia, particularly at the gastrointestinal, respiratory, genitourinary, and ocular surfaces, we sought to determine a possible role for mucosal microbiota in adenovirus genome diversity. By analysis of known recombination hot spots across 38 human adenovirus genomes in species D (HAdV-D), we identified nucleotide sequence motifs similar to bacterial Chi sequences, which facilitate homologous recombination in the presence of bacterial Rec enzymes. These motifs, referred to here as ChiAD, were identified immediately 5′ to the sequence encoding penton base hypervariable loop 2, which expresses the arginine-glycine-aspartate moiety critical to adenoviral cellular entry. Coinfection with two HAdV-Ds in the presence of an Escherichia coli lysate increased recombination; this was blocked in a RecA mutant strain, E. coli DH5α, or upon RecA depletion. Recombination increased in the presence of E. coli lysate despite a general reduction in viral replication. RecA colocalized with viral DNA in HAdV-D-infected cell nuclei and was shown to bind specifically to ChiAD sequences. These results indicate that adenoviruses may repurpose bacterial recombination machinery, a sharing of evolutionary mechanisms across a diverse microbiota, and unique example of viral commensalism. IMPORTANCE Adenoviruses are common human mucosal pathogens of the gastrointestinal, respiratory, and genitourinary tracts and ocular surface. Here, we report finding Chi-like sequences in adenovirus recombination hot spots. Adenovirus coinfection in the presence of bacterial RecA protein facilitated homologous recombination between viruses. Genetic recombination led to evolution of an important external feature on the adenoviral capsid, namely, the penton base protein hypervariable loop 2, which contains the arginine-glycine-aspartic acid motif critical to viral internalization. We speculate that free Rec proteins present in gastrointestinal secretions upon bacterial cell death facilitate the evolution of human adenoviruses through homologous recombination, an example of viral commensalism and the complexity of virus-host interactions, including regional microbiota. PMID:29925671
QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

PubMed Central

Dhapola, Parashar; Chowdhury, Shantanu

2016-01-01

DNA guanine quadruplexes or G4s are non-canonical DNA secondary structures which affect genomic processes like replication, transcription and recombination. G4s are computationally identified by specific nucleotide motifs which are also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that allows batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of the previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with the Ensembl Compara database to allow users to mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through the ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences, is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
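
To illustrate what identifying PG4 motifs by a specific nucleotide pattern can look like, the sketch below scans a sequence with one commonly used definition: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The pattern, the function name `find_pg4` and the example sequence are assumptions made for illustration; the abstract does not state which definitions QuadBase2 applies.

```python
import re

# Assumed, illustrative PG4 definition: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}.
# This is not taken from QuadBase2; it is one widely used textbook form.
PG4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_pg4(sequence):
    """Return (start, end, motif) for non-overlapping PG4 matches on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PG4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical input sequence; coordinates are 0-based, end-exclusive.
    for start, end, motif in find_pg4("ttGGGAGGGTTGGGCAGGGtt"):
        print(start, end, motif)
```

Only the given strand is scanned here; motifs on the complementary strand would need the equivalent C-rich pattern or a reverse-complemented input.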
Role of sequence encoded κB DNA geometry in gene regulation by Dorsal

PubMed Central

Mrinal, Nirotpal; Tomar, Archana; Nagaraju, Javaregowda

2011-01-01

Many proteins of the Rel family can act as both transcriptional activators and repressors. However, the mechanism that discerns the ‘activator/repressor’ functions of Rel proteins such as Dorsal (the Drosophila homologue of mammalian NFκB) is not understood. Using genomic, biophysical and biochemical approaches, we demonstrate that the underlying principle of this functional specificity lies in the ‘sequence-encoded structure’ of the κB-DNA. We show that Dorsal-binding motifs exist in distinct activator and repressor conformations. Molecular dynamics of DNA-Dorsal complexes revealed that repressor κB-motifs typically have an A-tract and a flexible conformation that facilitates interaction with co-repressors. The deformable structure of repressor motifs is due to changes in the hydrogen bonding in the A:T pair in the ‘A-tract’ core. The sixth nucleotide in the nonameric κB-motif, ‘A’ (A6) in the repressor motifs and ‘T’ (T6) in the activator motifs, is critical to confer this functional specificity, as an A6 → T6 mutation transformed the flexible repressor conformation into a rigid activator conformation. These results highlight that ‘sequence encoded κB DNA-geometry’ regulates gene expression by exerting an allosteric effect on the binding of Rel proteins, which in turn regulates interaction with co-regulators. Further, we identified and characterized putative repressor motifs in Dl-target genes, which can potentially aid in functional annotation of the Dorsal gene regulatory network. PMID:21890896
An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance.

PubMed

Casimiro, Ana C; Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L

2008-02-07

Motif finding algorithms have developed in their ability to use computationally efficient methods to detect patterns in biological sequences. However the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but also provide hints of the positions where these patterns occur preferentially. We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we have compared three different statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed better in this dataset, we proceeded to study the positional distribution of several well known cis-regulatory elements, in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several Dicotyledons plants). The results show that position conservation is relevant for the transcriptional machinery. We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes, and therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets.
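
The idea of testing whether a motif occurs uniformly along a promoter can be sketched with a one-sample Kolmogorov-Smirnov statistic, one of the three tests the abstract names. The abstract does not say which test performed best, so the choice of KS here, the function names, the promoter length and the positions are all assumptions for illustration.

```python
import math


def ks_uniform(positions, region_length):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, region_length)."""
    xs = sorted(p / region_length for p in positions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
        # checked just after and just before each observed position.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def rejects_uniformity(positions, region_length):
    """Compare D with the asymptotic 5% critical value 1.36 / sqrt(n).

    The approximation is only reasonable for larger samples (roughly n > 35).
    """
    d = ks_uniform(positions, region_length)
    return d > 1.36 / math.sqrt(len(positions))


if __name__ == "__main__":
    # Hypothetical motif start positions in a 1000 bp promoter region.
    clustered = list(range(100, 140))
    spread = [25 * i + 12 for i in range(40)]
    print(rejects_uniformity(clustered, 1000))
    print(rejects_uniformity(spread, 1000))
```

A positional test of this kind complements an over-representation test rather than replacing it: a motif can be frequent but evenly spread, or rare but tightly positioned.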
Efficient Identification of Murine M2 Macrophage Peptide Targeting Ligands by Phage Display and Next-Generation Sequencing.

PubMed

Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H

2015-08-19

Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
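
Two of the hallmarks listed above, reproducibility across biological replicates and filtering against an unselected, amplified library, can be sketched as simple set operations. This is a deliberately simplified reading of the abstract: the function name, the `min_replicates` threshold, the rule of discarding anything seen in the unselected library, and the peptide strings are all hypothetical.

```python
def candidate_binders(replicates, unselected_library, min_replicates=2):
    """Keep peptides seen in at least min_replicates selected replicates and
    absent from the unselected, amplified library (treated here as parasitic)."""
    counts = {}
    for rep in replicates:
        for peptide in set(rep):  # count each peptide once per replicate
            counts[peptide] = counts.get(peptide, 0) + 1
    parasitic = set(unselected_library)
    return sorted(
        p for p, c in counts.items() if c >= min_replicates and p not in parasitic
    )


if __name__ == "__main__":
    # Made-up peptide labels standing in for sequenced phage inserts.
    replicates = [
        ["PEPA", "PEPB", "PEPC"],
        ["PEPA", "PEPC"],
        ["PEPB", "PEPC", "PEPD"],
    ]
    unselected = ["PEPC"]
    print(candidate_binders(replicates, unselected))
```

A real analysis would more likely compare read frequencies or enrichment between the selected and unselected libraries than apply a strict presence/absence rule.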
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2013-09-01

colleagues have successfully conjugated malachite green aptamer to RNA nanoparticles characterized by a 3WJ pRNA motif. The in vitro experiment indicated...DNA/RNA sequence FIGURE 19.5 Diagram of RNA nanoparticle harboring malachite green aptamer, survivin siRNA and folate-DNA/RNA sequence for targeting...of RNA Aptamer to RNA Nanoparticles (Figure 19.5; Shu et al. 2011). The sequence for the malachite green aptamer nanoparticle was rationally designed
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2014-09-01

his colleagues have successfully conjugated malachite green aptamer to RNA nanoparticles characterized by a 3WJ pRNA motif. The in vitro experiment...Folate-DNA/RNA sequence FIGURE 19.5 Diagram of RNA nanoparticle harboring malachite green aptamer, survivin siRNA and folate-DNA/RNA sequence for...Conjugation of RNA Aptamer to RNA Nanoparticles (Figure 19.5; Shu et al. 2011). The sequence for the malachite green aptamer nanoparticle was rationally
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

PubMed

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2017-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
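
The saliency-map idea, using first-order derivatives of the prediction with respect to the input to score each nucleotide, can be shown on a toy model. The sketch below replaces the deep network with a small logistic model over a one-hot encoded sequence so that the gradient can be written by hand; the model, its weights and every function name are assumptions for illustration and are not the DeMo Dashboard implementation.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def predict(x, weights, bias):
    """Toy stand-in for a TFBS classifier: sigmoid of a position-specific linear score."""
    z = bias + sum(w * v for wrow, xrow in zip(weights, x) for w, v in zip(wrow, xrow))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq, weights, bias):
    """Per-position saliency: |gradient of the score| at the observed nucleotide."""
    x = one_hot(seq)
    s = predict(x, weights, bias)
    # For this logistic model, d s / d x[i][c] = s * (1 - s) * weights[i][c].
    grad = [[s * (1.0 - s) * w for w in wrow] for wrow in weights]
    # Multiply by the one-hot input to pick the gradient entry of the observed base.
    return [abs(sum(g * v for g, v in zip(grow, xrow))) for grow, xrow in zip(grad, x)]


if __name__ == "__main__":
    # Hypothetical weights for a 3-nucleotide input (rows: positions, columns: A, C, G, T).
    weights = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0]]
    print(saliency("ACG", weights, bias=0.0))
```

With a real network the gradient would come from automatic differentiation rather than a closed form, but the interpretation is the same: positions with larger values contribute more to the prediction.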
In-silico mining, type and frequency analysis of genic microsatellites of finger millet (Eleusine coracana (L.) Gaertn.): a comparative genomic analysis of NBS-LRR regions of finger millet with rice.

PubMed

Kalyana Babu, B; Pandey, Dinesh; Agrawal, P K; Sood, Salej; Kumar, Anil

2014-05-01

In recent years, the increased availability of DNA sequences has made it possible to develop and explore SSR markers derived from expressed sequence tags (ESTs). In the present study, a total of 1956 ESTs of finger millet were used to determine microsatellite type, distribution and frequency, and a total of 545 primer pairs were developed from them. Thirty-two EST sequences had more than two microsatellites and 1357 sequences did not have any SSRs. The most frequent repeat type was the trimeric motif, followed by dimeric motifs and then tetra-, hexa- and penta-repeat motifs. The most common dimer repeat motif was GA and, among trimeric SSRs, it was CGG. The EST sequences of the NBS-LRR region of finger millet and rice showed high synteny and were found at nearly the same positions on the rice chromosome map. Eight of the 15 EST-based SSR primers were polymorphic among the selected resistant and susceptible finger millet genotypes. The primer FMBLEST5 was able to differentiate them into resistant and susceptible genotypes. The alleles specific to the resistant and susceptible genotypes were sequenced using the ABI 3130XL genetic analyzer; they showed similarity to NBS-LRR regions of rice and finger millet and contained the characteristic kinase-2 and kinase 3a motifs of plant R-genes belonging to the NBS-LRR region. The in silico and comparative analysis showed that the genes responsible for blast resistance can be identified, mapped and further introgressed through molecular breeding approaches to enhance blast resistance in finger millet.
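
In silico microsatellite mining of the kind described above can be sketched with regular expressions that look for perfect tandem repeats of 2 to 6 nucleotide units. The minimum repeat counts, the function names and the test sequence are assumptions for illustration; the abstract does not give the thresholds or software the authors used.

```python
import re

# Assumed minimum repeat counts per unit length (di- to hexanucleotide).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (GAGA -> False)."""
    n = len(unit)
    return all(unit != unit[:k] * (n // k) for k in range(1, n) if n % k == 0)


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, reps in min_repeats.items():
        # One captured unit followed by at least reps - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group()) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical EST fragment containing a (GA)8 and a (CGG)5 repeat.
    est = "TT" + "GA" * 8 + "TT" + "CGG" * 5 + "AT"
    for start, unit, count in find_ssrs(est):
        print(start, unit, count)
```

The `is_primitive` check keeps a (GA)8 tract from also being reported as a tetranucleotide (GAGA)4 repeat and discards homopolymer runs.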
DNA motif alignment by evolving a population of Markov chains.

PubMed

Bi, Chengpeng

2009-01-30

Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
Sites of instability in the human TCF3 (E2A) gene adopt G-quadruplex DNA structures in vitro

PubMed Central

Williams, Jonathan D.; Fleetwood, Sara; Berroyer, Alexandra; Kim, Nayun; Larson, Erik D.

2015-01-01

The formation of highly stable four-stranded DNA, called G-quadruplex (G4), promotes site-specific genome instability. G4 DNA structures fold from repetitive guanine sequences, and increasing experimental evidence connects G4 sequence motifs with specific gene rearrangements. The human transcription factor 3 (TCF3) gene (also termed E2A) is subject to genetic instability associated with severe disease, most notably a common translocation event t(1;19) associated with acute lymphoblastic leukemia. The sites of instability in TCF3 are not randomly distributed, but focused to certain sequences. We asked if G4 DNA formation could explain why TCF3 is prone to recombination and mutagenesis. Here we demonstrate that sequences surrounding the major t(1;19) break site and a region associated with copy number variations both contain G4 sequence motifs. The motifs identified readily adopt G4 DNA structures that are stable enough to interfere with DNA synthesis in physiological salt conditions in vitro. When introduced into the yeast genome, TCF3 G4 motifs promoted gross chromosomal rearrangements in a transcription-dependent manner. Our results provide a molecular rationale for the site-specific instability of human TCF3, suggesting that G4 DNA structures contribute to oncogenic DNA breaks and recombination. PMID:26029241
Structural details (kinks and non-α conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

PubMed Central

Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri

2003-01-01

One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3(10)-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

PubMed

Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri

2003-08-01

One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.
A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

PubMed Central

2014-01-01

Abstract ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for finding motifs, as ChIP-Seq experiments narrow the motif search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784
Learning cellular sorting pathways using protein interactions and sequence motifs.

PubMed

Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F

2011-11-01

Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins

PubMed Central

Puntervoll, Pål; Linding, Rune; Gemünd, Christine; Chabanis-Davidson, Sophie; Mattingsdal, Morten; Cameron, Scott; Martin, David M. A.; Ausiello, Gabriele; Brannetti, Barbara; Costantini, Anna; Ferrè, Fabrizio; Maselli, Vincenza; Via, Allegra; Cesareni, Gianni; Diella, Francesca; Superti-Furga, Giulio; Wyrwicz, Lucjan; Ramu, Chenna; McGuigan, Caroline; Gudavalli, Rambabu; Letunic, Ivica; Bork, Peer; Rychlewski, Leszek; Küster, Bernhard; Helmer-Citterich, Manuela; Hunter, William N.; Aasland, Rein; Gibson, Toby J.

2003-01-01

Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at http://elm.eu.org/, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more. PMID:12824381
Identification of the WBSCR9 gene, encoding a novel transcriptional regulator, in the Williams-Beuren syndrome deletion at 7q11.23.

PubMed

Peoples, R J; Cisco, M J; Kaplan, P; Francke, U

1998-01-01

We have identified a novel gene (WBSCR9) within the common Williams-Beuren syndrome (WBS) deletion by interspecies sequence conservation. The WBSCR9 gene encodes a roughly 7-kb transcript with an open reading frame of 1483 amino acids and a predicted protein product size of 170.8 kDa. WBSCR9 is comprised of at least 20 exons extending over 60 kb. The transcript is expressed ubiquitously throughout development and is subject to alternative splicing. Functional motifs identified by sequence homology searches include a bromodomain; a PHD, or C4HC3, finger; several putative nuclear localization signals; four nuclear receptor binding motifs; a polyglutamate stretch and two PEST sequences. Bromodomains, PHD motifs and nuclear receptor binding motifs are cardinal features of proteins that are involved in chromatin remodeling and modulation of transcription. Haploinsufficiency for WBSCR9 gene products may contribute to the complex phenotype of WBS by interacting with tissue-specific regulatory factors during development.
A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

PubMed

Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

2016-11-01

Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
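
As an illustration of how candidate UUU_CGN sites like those described above might be located in a sequence, here is a minimal Python sketch. It assumes the underscore marks a codon boundary in the upstream reading frame; the function name, the `orf_start` parameter and the example sequence are hypothetical and are not taken from the paper.

```python
import re

# UUU_CGN: a UUU codon followed by CGN, where N is any nucleotide.
# The lookahead allows overlapping candidates to be reported.
FRAMESHIFT_MOTIF = re.compile(r"(?=(UUUCG[ACGU]))")

def find_uuu_cgn(rna, orf_start=0):
    """Return (position, hexamer) pairs for UUU_CGN sites whose UUU
    lies in the reading frame that begins at orf_start (0-based)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for match in FRAMESHIFT_MOTIF.finditer(rna):
        pos = match.start()
        # Keep only sites where UUU is a codon of the upstream frame
        # (an assumption of this sketch, see the lead-in).
        if (pos - orf_start) % 3 == 0:
            hits.append((pos, match.group(1)))
    return hits

# Hypothetical toy sequence, for illustration only.
print(find_uuu_cgn("AUGGCAUUUCGAGG"))
```

A match reported this way is only a candidate; the abstract's evidence for frameshifting rests on comparative analysis, not on pattern matching alone.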
Using Maximum Entropy to Find Patterns in Genomes

NASA Astrophysics Data System (ADS)

Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.
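
To make the maximum-entropy idea above concrete, the following Python sketch samples a random coding sequence for a fixed protein while matching a target expected GC content. It is a simplified illustration and not necessarily the authors' formulation: it assumes the standard genetic code, a fixed amino acid sequence, and a single constraint on mean GC content, under which the maximum-entropy distribution weights each synonymous codon by exp(lambda * GC count). All function names are hypothetical.

```python
import random
from math import exp

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SYNONYMS = {}
for codon, aa in zip(CODONS, AMINO):
    SYNONYMS.setdefault(aa, []).append(codon)

def gc(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)

def codon_weights(aa, lam):
    """Maximum-entropy codon probabilities for one amino acid:
    proportional to exp(lam * GC count) over its synonymous codons."""
    weights = [exp(lam * gc(c)) for c in SYNONYMS[aa]]
    total = sum(weights)
    return [w / total for w in weights]

def expected_gc(protein, lam):
    """Expected GC fraction of the coding sequence under parameter lam."""
    total = 0.0
    for aa in protein:
        probs = codon_weights(aa, lam)
        total += sum(p * gc(c) for p, c in zip(probs, SYNONYMS[aa]))
    return total / (3 * len(protein))

def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, iters=60):
    """Bisection on lam; expected_gc is non-decreasing in lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_coding_sequence(protein, target_gc, rng=random):
    """Draw one random coding sequence encoding `protein`."""
    lam = solve_lambda(protein, target_gc)
    return "".join(
        rng.choices(SYNONYMS[aa], weights=codon_weights(aa, lam))[0]
        for aa in protein
    )
```

If the target GC content cannot be reached for a given protein (for example, one made only of methionine and tryptophan), the bisection simply settles at the nearest attainable extreme.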
Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

PubMed

Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

2008-01-01

This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
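
The binomial test for over-represented words mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the RSAT implementation: it counts overlapping occurrences, takes the background word probabilities as a plain dictionary supplied by the caller, and applies no multiple-testing correction. The function names and the `alpha` threshold are assumptions.

```python
from collections import Counter
from math import lgamma, log, exp

def count_kmers(seqs, k):
    """Count every k-letter word (overlapping) in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if x <= 0:
        return 1.0
    total = 0.0
    for i in range(x, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_term)
    return min(total, 1.0)

def over_represented(seqs, k, background, alpha=0.01):
    """Return (word, observed, expected, p_value) for words whose upper-tail
    binomial P-value is below alpha, most significant first."""
    counts = count_kmers(seqs, k)
    n = sum(counts.values())  # total number of word positions scanned
    results = []
    for word, observed in counts.items():
        p = background.get(word, 0.0)
        if not 0.0 < p < 1.0:
            continue  # no usable background estimate for this word
        p_value = binomial_upper_tail(observed, n, p)
        if p_value < alpha:
            results.append((word, observed, n * p, p_value))
    return sorted(results, key=lambda row: row[3])
```

Under-represented words could be tested the same way with the lower tail, P(X <= x), which is the symmetry the abstract points to when contrasting the method with optimization-based approaches.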
A low-temperature-responsive element involved in the regulation of the Arabidopsis thaliana At1g71850/At1g71860 divergent gene pair.

PubMed

Liu, Shijuan; Chen, Huiqing; Li, Xiulan; Zhang, Wei

2016-08-01

The bidirectional promoter of the Arabidopsis thaliana gene pair At1g71850/At1g71860 harbors low-temperature-responsive elements, which participate in anti-correlated transcription regulation of the driving genes in response to environmental low temperature. A divergent gene pair is defined as two adjacent genes organized head to head in opposite orientation, sharing a common promoter region. Divergent gene pairs are mainly coexpressed, but some display opposite regulation. The mechanistic basis of such anti-correlated regulation is not well understood. Here, the regulation of the Arabidopsis thaliana gene pair At1g71850/At1g71860 was investigated. Semi-quantitative RT-PCR and Genevestigator analyses showed that while one of the pair was upregulated by exposure to low temperature, the same treatment downregulated the other. Promoter::GUS fusion transgenes were used to show that this behavior was driven by a bidirectional promoter, which harbored an as-1 motif, associated with the low-temperature response; mutation of this sequence produced a significant decrease in cold-responsive expression. With regard to the as-1 motif in the native orientation repressing the promoter's low-temperature responsiveness, the same as-1 motif introduced in the reverse direction showed a slight enhancement in the promoter's responsiveness to low-temperature exposure, indicating that the orientation of the motif was important for the promoter's activity. These findings provide new insights into the complex transcriptional regulation of bidirectional gene pairs as well as plant stress response.

Genome analyses of the sunflower pathogen Plasmopara halstedii provide insights into effector evolution in downy mildews and Phytophthora.

PubMed

Sharma, Rahul; Xia, Xiaojuan; Cano, Liliana M; Evangelisti, Edouard; Kemen, Eric; Judelson, Howard; Oome, Stan; Sambles, Christine; van den Hoogen, D Johan; Kitner, Miloslav; Klein, Joël; Meijer, Harold J G; Spring, Otmar; Win, Joe; Zipper, Reinhard; Bode, Helge B; Govers, Francine; Kamoun, Sophien; Schornack, Sebastian; Studholme, David J; Van den Ackerveken, Guido; Thines, Marco

2015-10-05

Downy mildews are the most speciose group of oomycetes and affect crops of great economic importance. So far, there is only a single deeply-sequenced downy mildew genome available, from Hyaloperonospora arabidopsidis. Further genomic resources for downy mildews are required to study their evolution, including pathogenicity effector proteins, such as RxLR effectors. Plasmopara halstedii is a devastating pathogen of sunflower and a potential pathosystem model to study downy mildews, as several Avr-genes and R-genes have been predicted and unlike Arabidopsis downy mildew, large quantities of almost contamination-free material can be obtained easily. Here a high-quality draft genome of Plasmopara halstedii is reported and analysed with respect to various aspects, including genome organisation, secondary metabolism, effector proteins and comparative genomics with other sequenced oomycetes. Interestingly, the present analyses revealed further variation of the RxLR motif, suggesting an important role of the conservation of the dEER-motif. Orthology analyses revealed the conservation of 28 RxLR-like core effectors among Phytophthora species. Only six putative RxLR-like effectors were shared by the two sequenced downy mildews, highlighting the fast and largely independent evolution of two of the three major downy mildew lineages. This is seemingly supported by phylogenomic results, in which downy mildews did not appear to be monophyletic. The genome resource will be useful for developing markers for monitoring the pathogen population and might provide the basis for new approaches to fight Phytophthora and downy mildew pathogens by targeting core pathogenicity effectors.
A Three-Dimensional RNA Motif in Potato spindle tuber viroid Mediates Trafficking from Palisade Mesophyll to Spongy Mesophyll in Nicotiana benthamiana

PubMed Central

Takeda, Ryuta; Petrov, Anton I.; Leontis, Neocles B.; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5′-CGA-3′...5′-GAC-3′ flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes. PMID:21258006
A three-dimensional RNA motif in Potato spindle tuber viroid mediates trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana.

PubMed

Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.
Binding properties of SUMO-interacting motifs (SIMs) in yeast.

PubMed

Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

2015-03-01

Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode is more dependent on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should be helpful to enhance the prediction of SIMs and their binding properties in different organisms to facilitate the reconstruction of the SUMO interactome.
Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

PubMed Central

König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

2013-01-01

G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and the i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations does destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141
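
The spacing criterion reported above could be folded into a sequence-based prediction step roughly as follows. This Python sketch is an assumption-laden illustration: the regular expression is the widely used G3+N1-7G3+N1-7G3+N1-7G3+ pattern for putative quadruplex-forming sequences (not something stated in the abstract), the duplex region is supplied by the caller as a half-open coordinate interval, and the function names are hypothetical.

```python
import re

# Commonly used pattern for putative G-quadruplex-forming sequences:
# four runs of three or more G separated by loops of 1 to 7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)
MIN_SPACING = 5  # base pairs, the threshold reported in the abstract

def g4_regions(dna):
    """Return (start, end) half-open intervals of putative G4 regions."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(dna.upper())]

def spacing_report(dna, duplex_start, duplex_end):
    """For each putative G4 region, report its distance in bases to the
    duplex region [duplex_start, duplex_end) and whether that distance
    meets MIN_SPACING. Overlapping regions get a distance of 0."""
    report = []
    for start, end in g4_regions(dna):
        if end <= duplex_start:
            gap = duplex_start - end
        elif start >= duplex_end:
            gap = start - duplex_end
        else:
            gap = 0
        report.append(((start, end), gap, gap >= MIN_SPACING))
    return report
```

An i-motif-forming region on the complementary strand could be found with the same pattern after replacing G with C; whether the five-base-pair threshold generalizes beyond the constructs studied is left open by the abstract.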
Motif mismatches in microsatellites: insights from genome-wide investigation among 20 insect species.

PubMed

Behura, Susanta K; Severson, David W

2015-02-01

We present a detailed genome-wide comparative study of motif mismatches of microsatellites among 20 insect species representing five taxonomic orders. The results show that varying proportions (∼15-46%) of microsatellites identified in these species are imperfect in motif structure, and that they also vary in chromosomal distribution within genomes. It was observed that the genomic abundance of imperfect repeats is significantly associated with the length and number of motif mismatches of microsatellites. Furthermore, microsatellites with a higher number of mismatches tend to have lower abundance in the genome, suggesting that sequence heterogeneity of repeat motifs is a key determinant of genomic abundance of microsatellites. This relationship seems to be a general feature of microsatellites even in unrelated species such as yeast, roundworm, mouse and human. We provide a mechanistic explanation of the evolutionary link between motif heterogeneity and genomic abundance of microsatellites by examining the patterns of motif mismatches and allele sequences of single-nucleotide polymorphisms identified within microsatellite loci. Using Drosophila Genetic Reference Panel data, we further show that the pattern of allelic variation modulates motif heterogeneity of microsatellites, and provide estimates of allele age of specific imperfect microsatellites found within protein-coding genes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Cloning and molecular characterization of the betaine aldehyde dehydrogenase involved in the biosynthesis of glycine betaine in white shrimp (Litopenaeus vannamei).

PubMed

Delgado-Gaytán, María F; Rosas-Rodríguez, Jesús A; Yepiz-Plascencia, Gloria; Figueroa-Soto, Ciria G; Valenzuela-Soto, Elisa M

2017-10-01

The enzyme betaine aldehyde dehydrogenase (BADH) catalyzes the irreversible oxidation of betaine aldehyde to glycine betaine (GB), a very efficient osmolyte accumulated during osmotic stress. In this study, we determined the nucleotide sequence of the cDNA for the BADH from the white shrimp Litopenaeus vannamei (LvBADH). The cDNA was 1882 bp long, with a complete open reading frame of 1524 bp, encoding 507 amino acids with a predicted molecular mass of 54.15 kDa and a pI of 5.4. The predicted LvBADH amino acid sequence shares a high degree of identity with marine invertebrate BADHs. Catalytic residues (C-298, E-264 and N-167) and the decapeptide VTLELGGKSP involved in nucleotide binding and highly conserved in BADHs were identified in the amino acid sequence. Phylogenetic analyses classified LvBADH in a clade that includes ALDH9 sequences from marine invertebrates. Molecular modeling of LvBADH revealed that the protein has amino acid residues and sequence motifs essential for the function of the ALDH9 family of enzymes. LvBADH modeling showed three potential monovalent cation binding sites: one located in an intra-subunit cavity, another in an inter-subunit cavity and a third in a central cavity of the protein. The results show that LvBADH shares a high degree of identity with BADH sequences from marine invertebrates and enzymes that belong to the ALDH9 family. Our findings suggest that the LvBADH has molecular mechanisms of regulation similar to those of other BADHs belonging to the ALDH9 family, and that BADH might be playing a role in the osmoregulation capacity of L. vannamei. Copyright © 2017 Elsevier B.V. All rights reserved.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

PubMed Central

2010-01-01

Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
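
The screening step described above, checking 3' UTRs for a list of candidate 8-mers, can be sketched as follows. This Python example is an illustration only: it assumes the UTR sequences are already available as plain strings, compares a target gene set against a background set by simple presence/absence, and uses made-up toy sequences. The function names are hypothetical, and no statistical test is applied.

```python
def fraction_with_kmer(utrs, kmer):
    """Fraction of UTR sequences that contain the k-mer at least once."""
    if not utrs:
        return 0.0
    kmer = kmer.upper()
    return sum(kmer in utr.upper() for utr in utrs) / len(utrs)

def screen_kmers(candidate_kmers, target_utrs, background_utrs):
    """Rank candidate k-mers by how much more often they occur in the
    target set (for example, ribosomal genes) than in the background."""
    rows = []
    for kmer in candidate_kmers:
        in_target = fraction_with_kmer(target_utrs, kmer)
        in_background = fraction_with_kmer(background_utrs, kmer)
        rows.append((kmer, in_target, in_background))
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows

# Toy data, invented for illustration; not sequences from the study.
target = ["AAACCCGGGTTTACGT", "TTACCCGGGTAA"]
background = ["ATATATATATAT", "GGGGCCCCAAAA"]
for row in screen_kmers(["ATATATAT", "ACCCGGGT"], target, background):
    print(row)
```

A real screen would also need to account for UTR length and base composition before interpreting the difference between the two fractions.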
The effect of orthology and coregulation on detecting regulatory motifs.

PubMed

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-02-03

Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

PubMed Central

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-01-01

Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085
A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

PubMed Central

2010-01-01

Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
RNA Bricks—a database of RNA 3D motifs and their interactions

PubMed Central

Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

2014-01-01

The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091
Helix-packing motifs in membrane proteins.

PubMed

Walters, R F S; DeGrado, W F

2006-09-12

The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

PubMed Central

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a “fragmentation” technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
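
To show what a position-by-position matching calculation of the kind mentioned above looks like, here is a plain CPU sketch in Python. It is not the GPUmotif or HMS code: it assumes a motif given as a position weight matrix of strictly positive base probabilities, a uniform background by default, and a log-odds score per window. The function name and the toy matrix are hypothetical.

```python
from math import log

def log_odds_scan(seq, pwm, background=None):
    """Score every window of the sequence against a position weight matrix.

    pwm is a list with one dict per motif position, mapping each base
    (A, C, G, T) to a strictly positive probability. Returns one
    log-odds score per window start position.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform
    width = len(pwm)
    seq = seq.upper()
    scores = []
    for start in range(len(seq) - width + 1):
        score = 0.0
        for offset in range(width):
            base = seq[start + offset]
            score += log(pwm[offset][base] / background[base])
        scores.append(score)
    return scores

# Toy two-position motif preferring "AC", invented for illustration.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
scores = log_odds_scan("TACG", toy_pwm)
print(scores.index(max(scores)))
```

Each window is scored independently of the others, which is plausibly why this kind of scan lends itself to the GPU parallelization the abstract describes.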
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

PubMed

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
Repeated functional convergent effects of NaV1.7 on acid insensitivity in hibernating mammals

PubMed Central

Liu, Zhen; Wang, Wei; Zhang, Tong-Zuo; Li, Gong-Hua; He, Kai; Huang, Jing-Fei; Jiang, Xue-Long; Murphy, Robert W.; Shi, Peng

2014-01-01

Hibernating mammals need to be insensitive to acid in order to cope with conditions of high CO2; however, the molecular basis of acid tolerance remains largely unknown. The African naked mole-rat (Heterocephalus glaber) and hibernating mammals share similar environments and physiological features. In the naked mole-rat, acid insensitivity has been shown to be conferred by the functional motif of the sodium ion channel NaV1.7. There is now an opportunity to evaluate acid insensitivity in other taxa. In this study, we tested for functional convergence of NaV1.7 in 71 species of mammals, including 22 species that hibernate. Our analyses revealed a functional convergence of amino acid sequences, which occurred at least six times independently in mammals that hibernate. Evolutionary analyses determined that the convergence results from both parallel and divergent evolution of residues in the functional motif. Our findings not only identify the functional molecules responsible for acid insensitivity in hibernating mammals, but also open new avenues to elucidate the molecular underpinnings of acid insensitivity in mammals. PMID:24352952
Repeated functional convergent effects of NaV1.7 on acid insensitivity in hibernating mammals.

PubMed

Liu, Zhen; Wang, Wei; Zhang, Tong-Zuo; Li, Gong-Hua; He, Kai; Huang, Jing-Fei; Jiang, Xue-Long; Murphy, Robert W; Shi, Peng

2014-02-07

Hibernating mammals need to be insensitive to acid in order to cope with conditions of high CO2; however, the molecular basis of acid tolerance remains largely unknown. The African naked mole-rat (Heterocephalus glaber) and hibernating mammals share similar environments and physiological features. In the naked mole-rat, acid insensitivity has been shown to be conferred by the functional motif of the sodium ion channel NaV1.7. There is now an opportunity to evaluate acid insensitivity in other taxa. In this study, we tested for functional convergence of NaV1.7 in 71 species of mammals, including 22 species that hibernate. Our analyses revealed a functional convergence of amino acid sequences, which occurred at least six times independently in mammals that hibernate. Evolutionary analyses determined that the convergence results from both parallel and divergent evolution of residues in the functional motif. Our findings not only identify the functional molecules responsible for acid insensitivity in hibernating mammals, but also open new avenues to elucidate the molecular underpinnings of acid insensitivity in mammals.
Changes in cell wall properties coincide with overexpression of extensin fusion proteins in suspension cultured tobacco cells.

PubMed

Tan, Li; Pu, Yunqiao; Pattathil, Sivakumar; Avci, Utku; Qian, Jin; Arter, Allison; Chen, Liwei; Hahn, Michael G; Ragauskas, Arthur J; Kieliszewski, Marcia J

2014-01-01

Extensins are one subfamily of the cell wall hydroxyproline-rich glycoproteins, containing characteristic SerHyp4 glycosylation motifs and intermolecular cross-linking motifs such as the TyrXaaTyr sequence. Extensins are believed to form a cross-linked network in the plant cell wall through the tyrosine-derivatives isodityrosine, pulcherosine, and di-isodityrosine. Overexpression of three synthetic genes encoding different elastin-arabinogalactan protein-extensin hybrids in tobacco suspension cultured cells yielded novel cross-linking glycoproteins that shared features of the extensins, arabinogalactan proteins and elastin. The cell wall properties of the three transgenic cell lines were all changed, but in different ways. One transgenic cell line showed decreased cellulose crystallinity and increased wall xyloglucan content; the second transgenic cell line contained dramatically increased hydration capacity and notably increased cell wall biomass, increased di-isodityrosine, and increased protein content; the third transgenic cell line displayed wall phenotypes similar to wild type cells, except changed xyloglucan epitope extractability. These data indicate that overexpression of modified extensins may be a route to engineer plants for bioenergy and biomaterial production.
Solution structure and base pair opening kinetics of the i-motif dimer of d(5mCCTTTACC): a noncanonical structure with possible roles in chromosome stability.

PubMed

Nonin, S; Phan, A T; Leroy, J L

1997-09-15

Repetitive cytosine-rich DNA sequences have been identified in telomeres and centromeres of eukaryotic chromosomes. These sequences play a role in maintaining chromosome stability during replication and may be involved in chromosome pairing during meiosis. The C-rich repeats can fold into an 'i-motif' structure, in which two parallel-stranded duplexes with hemiprotonated C.C+ pairs are intercalated. Previous NMR studies of naturally occurring repeats have produced poor NMR spectra. This led us to investigate oligonucleotides, based on natural sequences, to produce higher quality spectra and thus provide further information as to the structure and possible biological function of the i-motif. NMR spectroscopy has shown that d(5mCCTTTACC) forms an i-motif dimer of symmetry-related and intercalated folded strands. The high-definition structure is computed on the basis of the build-up rates of 29 intraresidue and 35 interresidue nuclear Overhauser effect (NOE) connectivities. The i-motif core includes intercalated interstrand C.C+ pairs stacked in the order 2*.8/1.7*/1*.7/2.8* (where one strand is distinguished by an asterisk and the numbers relate to the base positions within the repeat). The TTTA sequences form two loops which span the two wide grooves on opposite sides of the i-motif core; the i-motif core is extended at both ends by the stacking of A6 onto C2.C8+. The lifetimes of pairs C2.C8+ and 5mC1.C7+ are 1 ms and 1 s, respectively, at 15 degrees C. Anomalous exchange properties of the T3 imino proton indicate hydrogen bonding to A6 N7 via a water bridge. The d(5mCCTTTTCC) deoxyoligonucleotide, in which position 6 is occupied by a thymidine instead of an adenine, also forms a symmetric i-motif dimer. However, in this structure the two TTTT loops are located on the same side of the i-motif core and the C.C+ pairs are formed by equivalent cytidines stacked in the order 8*.8/1.1*/7*.7/2.2*. 
Oligodeoxynucleotides containing two C-rich repeats can fold and dimerize into an i-motif. The change of folding topology resulting from the substitution of a single nucleoside emphasizes the influence of the loop residues on the i-motif structure formed by two folded strands.

TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.

PubMed

Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana

2018-04-05

A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. 
TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au.
The heptanucleotide motif GAGACGC is a key component of a cis-acting promoter element that is critical for SnSAG1 expression in Sarcocystis neurona.

PubMed

Gaji, Rajshekhar Y; Howe, Daniel K

2009-07-01

The apicomplexan parasite Sarcocystis neurona undergoes a complex process of intracellular development, during which many genes are temporally regulated. The described study was undertaken to begin identifying the basic promoter elements that control gene expression in S. neurona. Sequence analysis of the 5'-flanking region of five S. neurona genes revealed a conserved heptanucleotide motif GAGACGC that is similar to the WGAGACG motif described upstream of multiple genes in Toxoplasma gondii. The promoter region for the major surface antigen gene SnSAG1, which contains three heptanucleotide motifs within 135 bases of the transcription start site, was dissected by functional analysis using a dual luciferase reporter assay. These analyses revealed that a minimal promoter fragment containing all three motifs was sufficient to drive reporter molecule expression, with the presence and orientation of the 5'-most heptanucleotide motif being absolutely critical for promoter function. Further studies should help to identify additional sequence elements important for promoter function and for controlling gene expression during intracellular development by this apicomplexan pathogen.
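Because the abstract stresses both the presence and the orientation of the heptanucleotide motif, a strand-aware search is a natural way to look for it in a flanking region. The following is a minimal sketch; the `promoter` string is a made-up example, not the SnSAG1 sequence, and the helper names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_both_strands(seq, motif):
    """Return sorted (position, strand) pairs; positions are 0-based on the forward strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


promoter = "AAGAGACGCTTGCGTCTCAA"  # hypothetical sequence for illustration
hits = scan_both_strands(promoter, "GAGACGC")
print(hits)
```

Reporting the strand alongside the position keeps the orientation information that the reporter assays found to be critical.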
Sperm flagella protein components: Human meichroacidin constructed by the membrane occupation and recognition nexus motif

PubMed Central

MATSUOKA, YASUHIRO; NISHIMURA, HIROMI; NUMAZAWA, KAHORI; TSUCHIDA, JUNJI; MIYAGAWA, YASUSHI; TSUJIMURA, AKIRA; MATSUMIYA, KIYOMI; OKUYAMA, AKIHIKO; NISHIMUNE, YOSHITAKE

2005-01-01

Background and Aims: In a previous study, the authors of the present study cloned mouse meichroacidin (MCA), which is expressed in stages of spermatogenesis from pachytene spermatocytes through round spermatid germ cells. MCA protein contains the membrane occupation and recognition nexus (MORN) motif and localizes to a male meiotic metaphase chromosome. Recently, a MCA homolog of carp (Cyprinus carpio), MORN motif‐containing sperm‐specific axonemal protein (MSAP), was reportedly identified and localized in sperm flagella. Present knowledge of human spermiogenesis requires the identification of proteins in human sperm. The present study identified the human orthologue of MCA. Methods: Colony hybridization using a human testis plasmid cDNA library was carried out to clone human MCA (h‐MCA) cDNA. Northern blot, Western blot, and immunohistochemical analyses were carried out. Results: h‐MCA was found to be specifically expressed in the testes. The h‐MCA amino acid sequence shared 79.8% identity with mouse MCA and contained MORN motifs. h‐MCA localized in the sperm flagellum and basal body, as does MSAP in carp. Conclusion: Expression and localization analyses showed that h‐MCA is a component of the sperm flagellum and basal body and might play an important role in the development of the sperm flagellum in humans. (Reprod Med Biol 2005; 4: 213–219) PMID:29699225
Functional analysis of the Arabidopsis PLDZ2 promoter reveals an evolutionarily conserved low-Pi-responsive transcriptional enhancer element

PubMed Central

Oropeza-Aburto, Araceli; Cruz-Ramírez, Alfredo; Acevedo-Hernández, Gustavo J.; Pérez-Torres, Claudia-Anahí; Caballero-Pérez, Juan; Herrera-Estrella, Luis

2012-01-01

Plants have evolved a plethora of responses to cope with phosphate (Pi) deficiency, including the transcriptional activation of a large set of genes. Among Pi-responsive genes, the expression of the Arabidopsis phospholipase DZ2 (PLDZ2) is activated to participate in the degradation of phospholipids in roots in order to release Pi to support other cellular activities. A deletion analysis was performed to identify the regions determining the strength, tissue-specific expression, and Pi responsiveness of this regulatory region. This study also reports the identification and characterization of a transcriptional enhancer element that is present in the PLDZ2 promoter and able to confer Pi responsiveness to a minimal, inactive 35S promoter. This enhancer also shares the cytokinin and sucrose responsive properties observed for the intact PLDZ2 promoter. The EZ2 element contains two P1BS motifs, each of which is the DNA binding site of transcription factor PHR1. Mutation analysis showed that the P1BS motifs present in EZ2 are necessary but not sufficient for the enhancer function, revealing the importance of adjacent sequences. The structural organization of EZ2 is conserved in the orthologous genes of at least eight families of rosids, suggesting that architectural features such as the distance between the two P1BS motifs are also important for the regulatory properties of this enhancer element. PMID:22210906
Identification of early zygotic genes in the yellow fever mosquito Aedes aegypti and discovery of a motif involved in early zygotic genome activation.

PubMed

Biedler, James K; Hu, Wanqi; Tae, Hongseok; Tu, Zhijian

2012-01-01

During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression) EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001) and 143 (P<0.05) nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1) contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila.
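The degenerate motif VBRGGTA uses IUPAC nucleotide codes (V = A, C or G; B = C, G or T; R = A or G). A minimal sketch of how such a motif can be expanded into a regular expression and searched for is shown here; it is not the SCOPE algorithm, and the `upstream` sequence is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as VBRGGTA into a regex string."""
    return "".join(IUPAC[code] for code in motif.upper())


def find_motif(seq, motif):
    """0-based start positions of all matches, including overlapping ones."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


upstream = "TTACAGGTAGGCTAGGTATT"  # hypothetical upstream sequence
hits = find_motif(upstream, "VBRGGTA")
print(iupac_to_regex("VBRGGTA"), hits)
```

The lookahead group is used so that overlapping occurrences are not skipped, which matters when two motif copies sit close together as in the 38 bp KLC2.1 fragment.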
A R/K-rich motif in the C-terminal of the homeodomain is required for complete translocating of NKX2.5 protein into nucleus.

PubMed

Ouyang, Ping; Zhang, He; Fan, Zhaolan; Wei, Pei; Huang, Zhigang; Wang, Sen; Li, Tao

2016-11-05

NKX2.5 plays important roles in heart development. Being a transcription factor, NKX2.5 exerts its biological functions in the nucleus. However, the sequence motif that localizes NKX2.5 to the nucleus is still not clear. Here, we found that an R/K-rich sequence motif from Q187 to R197 (QNRRYKCKRQR) was required for exclusive nuclear localization of NKX2.5. Eight truncated plasmids (E109X, Q149X, Q170X, Q187X, Q198X, Y256X, Y259X, and C264X) which were associated with congenital heart disease (CHD) were constructed. Compared with the wild type NKX2.5, the proteins E109X, Q149X, Q170X, Q187X without intact homeodomain (HD) showed no transcriptional activity while Q198X, Y256X, Y259X and C264X with intact HD showed 50 to 66% transcriptional activity. E109X, Q149X, Q170X, Q187X without intact HD localized in the cytoplasm and nucleus simultaneously and Q198X, Y256X, Y259X and C264X with intact HD localized completely in the nucleus. These results implied that 187QNRRYKCKRQR197 is indispensable for exclusive nuclear localization. Additionally, this sequence motif was highly conserved among human, mouse and rat, indicating that this motif was important for NKX2.5 function. Thus, we concluded that the R/K-rich sequence motif 187QNRRYKCKRQR197 played a central role in NKX2.5 nuclear localization. Our findings provided a clue to the mechanisms linking the truncated NKX2.5 mutants and CHD. Copyright © 2016 Elsevier B.V. All rights reserved.
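The relationship between the truncation positions and the motif can be checked with simple arithmetic. The sketch below assumes that a nonsense variant written as Q198X retains residues 1 through 197; the function names are illustrative and only the motif sequence and residue numbers come from the abstract.

```python
MOTIF = "QNRRYKCKRQR"            # residues Q187 to R197, as given in the abstract
MOTIF_START, MOTIF_END = 187, 197
assert len(MOTIF) == MOTIF_END - MOTIF_START + 1


def basic_fraction(peptide):
    """Fraction of residues that are arginine (R) or lysine (K)."""
    return sum(peptide.count(aa) for aa in "RK") / len(peptide)


def retains_motif(stop_position):
    """Assumes a variant such as Q198X keeps residues 1..(stop_position - 1)."""
    return stop_position - 1 >= MOTIF_END


variants = {"E109X": 109, "Q149X": 149, "Q170X": 170, "Q187X": 187,
            "Q198X": 198, "Y256X": 256, "Y259X": 259, "C264X": 264}
retained = {name: retains_motif(pos) for name, pos in variants.items()}
print(basic_fraction(MOTIF), retained)
```

Under that assumption, the four variants that lack the full motif are the same four reported to localize to both cytoplasm and nucleus.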
Genomic characterization and expression profiles upon bacterial infection of a novel cystatin B homologue from disk abalone (Haliotis discus discus).

PubMed

Premachandra, H K A; Wan, Qiang; Elvitigala, Don Anushka Sandaruwan; De Zoysa, Mahanama; Choi, Cheol Young; Whang, Ilson; Lee, Jehee

2012-12-01

Cystatins are a large family of cysteine proteinase inhibitors which are involved in diverse biological and pathological processes. In the present study, we identified a gene related to cystatin superfamily, AbCyt B, from disk abalone Haliotis discus discus by expressed sequence tag (EST) analysis and BAC library screening. The complete cDNA sequence of AbCyt B is comprised of 1967 nucleotides with a 306 bp open reading frame (ORF) encoding for 101 amino acids. The amino acid sequence consists of a single cystatin-like domain, which has a cysteine proteinase inhibitor signature, a conserved Gly in N-terminal region, QVVAG motif and a variant of PW motif. No signal peptide, disulfide bonds or carbohydrate side chains were identified. Analysis of deduced amino acid sequence revealed that AbCyt B shares up to 44.7% identity and 65.7% similarity with the cystatin B genes from other organisms. The genomic sequence of AbCyt B is approximately 8.4 Kb, consisting of three exons and two introns. Phylogenetic tree analysis showed that AbCyt B was closely related to the cystatin B from pacific oyster (Crassostrea gigas) under the family 1. Functional analysis of recombinant AbCyt B protein exhibited inhibitory activity against the papain, with almost 84% inhibition at a concentration of 3.5 μmol/L. In tissue expression analysis, AbCyt B transcripts were expressed abundantly in the hemocyte, gill, mantle, and digestive tract, while weakly in muscle, testis, and hepatopancreas. After the immune challenge with Vibrio parahemolyticus, the AbCyt B showed significant (P<0.05) up-regulation of relative mRNA expression in gill and hemocytes at 24 and 6 h of post infection, respectively. These results collectively suggest that AbCyt B is a potent inhibitor of cysteine proteinases and is also potentially involved in immune responses against invading bacterial pathogens in abalone. Copyright © 2012 Elsevier Ltd. All rights reserved.
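The ORF and protein lengths quoted above are consistent if the 306 bp count includes the stop codon, which is an assumption here rather than something the abstract states: 306 / 3 = 102 codons, of which 101 encode amino acids. A small sketch of that check, with an illustrative function name:

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of the given length in base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but adds no residue.
    return codons - 1 if includes_stop else codons


print(orf_protein_length(306))
```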
High-resolution profiling of linear B-cell epitopes from mucin-associated surface proteins (MASPs) of Trypanosoma cruzi during human infections

PubMed Central

Durante, Ignacio M.; La Spina, Pablo E.; Carmona, Santiago J.; Agüero, Fernán

2017-01-01

Background: The Trypanosoma cruzi genome bears a huge family of genes and pseudogenes coding for Mucin-Associated Surface Proteins (MASPs). MASP molecules display a ‘mosaic’ structure, with highly conserved flanking regions and a strikingly variable central and mature domain made up of different combinations of a large repertoire of short sequence motifs. MASP molecules are highly expressed in mammal-dwelling stages of T. cruzi and may be involved in parasite-host interactions and/or in diverting the immune response. Methods/Principal findings: High-density microarrays composed of fully overlapped 15mer peptides spanning the entire sequences of 232 non-redundant MASPs (~25% of the total MASP content) were screened with chronic Chagasic sera. This strategy led to the identification of 86 antigenic motifs, each one likely representing a single linear B-cell epitope, which were mapped to 69 different MASPs. These motifs could be further grouped into 31 clusters of structurally- and likely antigenically-related sequences, and fully characterized. In contrast to previous reports, we show that MASP antigenic motifs are restricted to the central and mature region of MASP polypeptides, consistent with their intracellular processing. The antigenicity of these motifs displayed significant positive correlation with their genome dosage and their relative position within the MASP polypeptide. In addition, we verified the biased genetic co-occurrence of certain antigenic motifs within MASP polypeptides, compatible with proposed intra-family recombination events underlying the evolution of their coding genes. Sequences spanning 7 MASP antigenic motifs were further evaluated using distinct synthesis/display approaches and a large panel of serum samples. Overall, the serological recognition of MASP antigenic motifs exhibited a remarkable non-normal distribution among the T. cruzi seropositive population, thus reducing their applicability in conventional serodiagnosis. 
As previously observed in in vitro and animal infection models, immune signatures supported the concurrent expression of several MASPs during human infection. Conclusions/Significance: In spite of their conspicuous expression and potential roles in parasite biology, this study constitutes the first unbiased, high-resolution profiling of linear B-cell epitopes from T. cruzi MASPs during human infection. PMID:28961244
ACGT-containing abscisic acid response element (ABRE) and coupling element 3 (CE3) are functionally equivalent.

PubMed

Hobo, T; Asada, M; Kowyama, Y; Hattori, T

1999-09-01

ACGT-containing ABA response elements (ABREs) have been functionally identified in the promoters of various genes. In addition, single copies of ABRE have been found to require a cis-acting, coupling element to achieve ABA induction. A coupling element 3 (CE3) sequence, originally identified as such in the barley HVA1 promoter, is found approximately 30 bp downstream of motif A (ACGT-containing ABRE) in the promoter of the Osem gene. The relationship between these two elements was further defined by linker-scan analyses of a 55 bp fragment of the Osem promoter, which is sufficient for ABA-responsiveness and VP1 activation. The analyses revealed that both motif A and CE3 sequence were required not only for ABA-responsiveness but also for VP1 activation. Since the sequences of motif A and CE3 were found to be similar, motif-exchange experiments were carried out. The experiments demonstrated that motif A and CE3 were interchangeable with each other with respect to both ABA and VP1 regulation. In addition, both sequences were shown to be recognized by a VP1-interacting, ABA-responsive bZIP factor TRAB1. These results indicate that ACGT-containing ABREs and CE3 are functionally equivalent cis-acting elements. Furthermore, TRAB1 was shown to bind two other non-ACGT ABREs. Based on these results, all these ABREs including CE3 are proposed to be categorized into a single class of cis-acting elements.
A new subfamily LIP of the major intrinsic proteins.

PubMed

Khabudaev, Kirill Vladimirovich; Petrova, Darya Petrovna; Grachev, Mikhail Aleksandrovich; Likhoshway, Yelena Valentinovna

2014-03-04

Proteins of the major intrinsic protein (MIP) family, or aquaporins, have been detected in almost all organisms. These proteins are important in cells and organisms because they allow for passive transmembrane transport of water and other small, uncharged polar molecules. We compared the predicted amino acid sequences of 20 MIPs from several algae species of the phylum Heterokontophyta (Kingdom Chromista) with the sequences of MIPs from other organisms. Multiple sequence alignments revealed motifs that were homologous to functionally important NPA motifs and the so-called ar/R-selective filter of glyceroporins and aquaporins. The MIP sequences of the studied chromists fell into several clusters that belonged to different groups of MIPs from a wide variety of organisms from different Kingdoms. Two of these proteins belong to Plasma membrane intrinsic proteins (PIPs), four of them belong to GlpF-like intrinsic proteins (GIPs), and one of them belongs to a specific MIPE subfamily from green algae. Three proteins belong to the unclassified MIPs, two of which are of bacterial origin. Eight of the studied MIPs contain an NPM-motif in place of the second conserved NPA-motif typical of the majority of MIPs. The MIPs of heterokonts within all detected clusters can differ from other MIPs in the same cluster regarding the structure of the ar/R-selective filter and other generally conserved motifs. We proposed placing nine MIPs from heterokonts into a new group, which we have named the LIPs (large intrinsic proteins). The possible substrate specificities of the studied MIPs are discussed.
Local Renyi entropic profiles of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2007-10-16

In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
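The two ingredients named in this abstract, the Chaos Game Representation and a Parzen-window estimate of Rényi entropy, can be sketched as follows. This is only a general illustration and not the authors' entropic-profile method: the corner assignment, the Gaussian kernel, the choice of the quadratic (order 2) entropy, and the function names are all assumptions.

```python
import math

# Corner assignment on the unit square (an assumed, commonly used layout).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Chaos Game Representation: move halfway toward each base's corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma):
    """Quadratic Renyi entropy of a 2D Gaussian Parzen-window density estimate.

    Uses the identity that the integral of the squared estimate equals the
    mean over all point pairs of a Gaussian with variance 2 * sigma**2.
    """
    n = len(points)
    var = 2.0 * sigma ** 2
    norm = 1.0 / (2.0 * math.pi * var)
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / n ** 2)


pts = cgr_points("ACGTACGT")  # hypothetical sequence
print(pts[:2], renyi_quadratic_entropy(pts, sigma=0.1))
```

Varying `sigma` changes the scale at which points are considered close, which loosely corresponds to the abstract's remark about spanning the kernel parameters to probe different scales.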
Local Renyi entropic profiles of DNA sequences

PubMed Central

Vinga, Susana; Almeida, Jonas S

2007-01-01

Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

PubMed Central

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2018-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

PubMed Central

Lin, Tien-Ho; Bar-Joseph, Ziv

2011-01-01

Abstract Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284
Effective Feature Selection for Classification of Promoter Sequences.

PubMed

K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

2016-01-01

Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

PubMed Central

2014-01-01

Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
Transcription factor ThWRKY4 binds to a novel WLS motif and a RAV1A element in addition to the W-box to regulate gene expression.

PubMed

Xu, Hongyun; Shi, Xinxin; Wang, Zhibo; Gao, Caiqiu; Wang, Chao; Wang, Yucheng

2017-08-01

WRKY transcription factors play important roles in many biological processes, and mainly bind to the W-box element to regulate gene expression. Previously, we characterized a WRKY gene from Tamarix hispida, ThWRKY4, in response to abiotic stress, and showed that it bound to the W-box motif. However, whether ThWRKY4 could bind to other motifs remains unknown. In this study, we employed a Transcription Factor-Centered Yeast one Hybrid (TF-Centered Y1H) screen to study the motifs recognized by ThWRKY4. In addition to the W-box core cis-element (termed W-box), we identified that ThWRKY4 could bind to two other motifs: the RAV1A element (CAACA) and a novel motif with sequence of GTCTA (W-box like sequence, WLS). The distributions of these motifs were screened in the promoter regions of genes regulated by some WRKYs. The results showed that the W-box, RAV1A, and WLS motifs were all present in high numbers, suggesting that they play key roles in gene expression mediated by WRKYs. Furthermore, five WRKY proteins from different WRKY subfamilies in Arabidopsis thaliana were selected and confirmed to bind to the RAV1A and WLS motifs, indicating that they are recognized commonly by WRKYs. These findings will help to further reveal the functions of WRKY proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
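The promoter screen described above amounts to counting motif occurrences in a sequence. A minimal sketch follows, assuming both strands are searched and overlapping hits are counted; the promoter string is invented for illustration, and only the two motif sequences given in the abstract (CAACA and GTCTA) are used.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

def count_motif(seq, motif):
    """Count overlapping occurrences of a motif on both strands."""
    def count(s, m):
        return sum(1 for i in range(len(s) - len(m) + 1) if s[i:i + len(m)] == m)
    rc = reverse_complement(motif)
    n = count(seq, motif)
    if rc != motif:              # avoid double counting palindromic motifs
        n += count(seq, rc)
    return n

MOTIFS = {"RAV1A": "CAACA", "WLS": "GTCTA"}   # sequences as given in the abstract
promoter = "ATCAACATTGTCTAGGTGTTGAA"          # hypothetical promoter fragment
counts = {name: count_motif(promoter, m) for name, m in MOTIFS.items()}
print(counts)
```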
A naturally occurring, noncanonical GTP aptamer made of simple tandem repeats

PubMed Central

Curtis, Edward A; Liu, David R

2014-01-01

Recently, we used in vitro selection to identify a new class of naturally occurring GTP aptamer called the G motif. Here we report the discovery and characterization of a second class of naturally occurring GTP aptamer, the “CA motif.” The primary sequence of this aptamer is unusual in that it consists entirely of tandem repeats of CA-rich motifs as short as three nucleotides. Several active variants of the CA motif aptamer lack the ability to form consecutive Watson-Crick base pairs in any register, while others consist of repeats containing only cytidine and adenosine residues, indicating that noncanonical interactions play important roles in its structure. The circular dichroism spectrum of the CA motif aptamer is distinct from that of A-form RNA and other major classes of nucleic acid structures. Bioinformatic searches indicate that the CA motif is absent from most archaeal and bacterial genomes, but occurs in at least 70 percent of approximately 400 eukaryotic genomes examined. These searches also uncovered several phylogenetically conserved examples of the CA motif in rodent (mouse and rat) genomes. Together, these results reveal the existence of a second class of naturally occurring GTP aptamer whose sequence requirements, like that of the G motif, are not consistent with those of a canonical secondary structure. They also indicate a new and unexpected potential biochemical activity of certain naturally occurring tandem repeats. PMID:24824832
Finding the target sites of RNA-binding proteins

PubMed Central

Li, Xiao; Kazan, Hilal; Lipshitz, Howard D; Morris, Quaid D

2014-01-01

RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201 PMID:24217996
Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

PubMed

Kjær, Jonas; Belsham, Graham J

2018-01-01

Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Motif-based analysis of large nucleotide data sets using MEME-ChIP

PubMed Central

Ma, Wenxiu; Noble, William S; Bailey, Timothy L

2014-01-01

MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix–based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP’s interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928
T-Reg Comparator: an analysis tool for the comparison of position weight matrices

PubMed Central

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-01-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55–61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91–D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at . PMID:15980506
T-Reg Comparator: an analysis tool for the comparison of position weight matrices.

PubMed

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-07-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55-61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91-D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de.
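Sequence scanning with a position weight matrix, mentioned in the abstract above, can be sketched as follows. This is a generic log-odds scan under stated assumptions (a uniform background of 0.25, a pseudocount of 1, and made-up binding sites); it is not the T-Reg Comparator code, and the function names are hypothetical.

```python
import math

def pwm_from_sites(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned binding sites."""
    pwm = []
    for i in range(len(sites[0])):
        col = {b: pseudocount for b in "ACGT"}
        for s in sites:
            col[s[i]] += 1
        total = sum(col.values())
        pwm.append({b: math.log2((col[b] / total) / background) for b in "ACGT"})
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    w = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(w)))
            for i in range(len(seq) - w + 1)]

sites = ["TGACG", "TGACC", "TGACT", "TGACG"]   # made-up aligned sites
pwm = pwm_from_sites(sites)
hits = scan("AATGACGTT", pwm)
best_pos, best_score = max(hits, key=lambda h: h[1])
print(best_pos, round(best_score, 2))
```

In practice a score threshold would be chosen (for example from a background score distribution) to decide which windows count as hits; here only the best-scoring window is reported.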
Analysis of alkaptonuria (AKU) mutations and polymorphisms reveals that the CCC sequence motif is a mutational hot spot in the homogentisate 1,2 dioxygenase gene (HGO).

PubMed Central

Beltrán-Valero de Bernabé, D; Jimenez, F J; Aquaron, R; Rodríguez de Córdoba, S

1999-01-01

We recently showed that alkaptonuria (AKU) is caused by loss-of-function mutations in the homogentisate 1,2 dioxygenase gene (HGO). Herein we describe haplotype and mutational analyses of HGO in seven new AKU pedigrees. These analyses identified two novel single-nucleotide polymorphisms (INV4+31A-->G and INV11+18A-->G) and six novel AKU mutations (INV1-1G-->A, W60G, Y62C, A122D, P230T, and D291E), which further illustrates the remarkable allelic heterogeneity found in AKU. Reexamination of all 29 mutations and polymorphisms thus far described in HGO shows that these nucleotide changes are not randomly distributed; the CCC sequence motif and its inverted complement, GGG, are preferentially mutated. These analyses also demonstrated that the nucleotide substitutions in HGO do not involve CpG dinucleotides, which illustrates important differences between HGO and other genes for the occurrence of mutation at specific short-sequence motifs. Because the CCC sequence motifs comprise a significant proportion (34.5%) of all mutated bases that have been observed in HGO, we conclude that the CCC triplet is a mutational hot spot in HGO. PMID:10205262
Single-molecule study of thymidine glycol and i-motif through the alpha-hemolysin ion channel

NASA Astrophysics Data System (ADS)

He, Lidong

Nanopore-based devices have emerged as a single-molecule detection and analysis tool for a wide range of applications. By electrophoretically driving DNA molecules across a nanosized pore, a wealth of information can be obtained, including unfolding kinetics and DNA-protein interactions. This single-molecule method has the potential to sequence kilobase length DNA polymers without amplification or labeling, approaching "the third generation" genome sequencing for around $1000 within 24 hours. alpha-Hemolysin biological nanopores have the advantages of excellent stability, low-noise level, and precise site-directed mutagenesis for engineering this protein nanopore. The first work presented in this thesis established the current signal of the thymidine glycol lesion in DNA oligomers through an immobilization experiment. The thymidine glycol enantiomers were differentiated from each other by different current blockage levels. Also, the effect of bulky hydrophobic adducts on the current blockage was investigated. Secondly, the alpha-hemolysin nanopore was used to study the human telomere i-motif and RET oncogene i-motif at a single-molecule level. In Chapter 3, it was demonstrated that the alpha-hemolysin nanopore can differentiate an i-motif form and single-strand DNA form at different pH values based on the same sequence. In addition, it shows potential to differentiate the folding topologies generated from the same DNA sequence.
Cytosine-phosphate-guanine oligodeoxynucleotides containing GACGTT motifs enhance the immune responses elicited by keyhole limpet hemocyanin antigen in dairy cattle.

PubMed

Chu, Chun-Yen; Lee, Shang-Chun; Liu, Shyh-Shyan; Lin, Yu-Ming; Shen, Perng-Chi; Yu, Chi; Lee, Kuo-Hua; Zhao, Xin; Lee, Jai-Wei

2011-10-01

Adjuvants are important components of vaccine formulations. Effective adjuvants link innate and adaptive immunity by signaling through pathogen recognition receptors. Synthetic cytosine-phosphate-guanine (CpG) oligodeoxynucleotides (ODNs) have been shown to have potential as adjuvants for vaccines. However, the immunostimulatory effect of CpG is species-specific and depends on the sequence of CpG motifs. A CpG ODN (2135), containing 3 identical copies of the GTCGTT motif, was previously reported to have the strongest effects on bovine peripheral blood mononuclear cells (PBMC). Based on the sequence of 2135, we replaced the GTCGTT motif with 11 other sequences containing CG and investigated their effects on bovine lymphocyte proliferation. Results showed that the CpG ODNs containing 3 copies of the GACGTT motif had the highest lymphocyte stimulation index (7.91±1.18), which was significantly (P<0.05) higher than that of 2135 (4.25±0.56). The CpG ODNs containing 3 copies of the GACGTT motif also significantly increased the mRNA expression of interferon (IFN)-α, interleukin (IL)-12, and IL-21 in bovine PBMC. When dairy cows were immunized with the keyhole limpet hemocyanin (KLH) antigen formulated with CpG ODNs containing 3 copies of GACGTT, production of KLH-specific antibodies in serum and in milk whey was significantly (P<0.05) enhanced. IFN-γ in whole blood stimulated by KLH was also significantly (P<0.05) increased in cows immunized with KLH plus CpG ODNs. Our results indicate that CpG ODNs containing 3 copies of the GACGTT motif are potential adjuvants for bovine vaccines.
DNA polymerase preference determines PCR priming efficiency.

PubMed

Pan, Wenjing; Byrne-Steele, Miranda; Wang, Chunlin; Lu, Stanley; Clemmons, Scott; Zahorchak, Robert J; Han, Jian

2014-01-30

Polymerase chain reaction (PCR) is one of the most important developments in modern biotechnology. However, PCR is known to introduce biases, especially during multiplex reactions. Recent studies have implicated the DNA polymerase as the primary source of bias, particularly initiation of polymerization on the template strand. In our study, amplification from a synthetic library containing a 12 nucleotide random portion was used to provide an in-depth characterization of DNA polymerase priming bias. The synthetic library was amplified with three commercially available DNA polymerases using an anchored primer with a random 3' hexamer end. After normalization, the next generation sequencing (NGS) results of the amplified libraries were directly compared to the unamplified synthetic library. Here, high throughput sequencing was used to systematically demonstrate and characterize DNA polymerase priming bias. We demonstrate that certain sequence motifs are preferred over others as primers where the six nucleotide sequences at the 3' end of the primer, as well as the sequences four base pairs downstream of the priming site, may influence priming efficiencies. DNA polymerases in the same family from two different commercial vendors prefer similar motifs, while another commercially available enzyme from a different DNA polymerase family prefers different motifs. Furthermore, the preferred priming motifs are GC-rich. The DNA polymerase preference for certain sequence motifs was verified by amplification from single-primer templates. We incorporated the observed DNA polymerase preference into a primer-design program that guides the placement of the primer to an optimal location on the template. DNA polymerase priming bias was characterized using a synthetic library amplification system and NGS. 
The characterization of DNA polymerase priming bias was then utilized to guide the primer-design process and demonstrate varying amplification efficiencies among three commercially available DNA polymerases. The results suggest that the interaction of the DNA polymerase with the primer:template junction during the initiation of DNA polymerization is very important in terms of overall amplification bias and has broader implications for both the primer design process and multiplex PCR.
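The comparison of amplified and unamplified libraries described above can be sketched as a k-mer enrichment calculation. This is a simplified illustration under stated assumptions: it counts all k-mers in each read rather than only the 3' primer hexamers, uses a small pseudocount, and runs on invented reads with k=2 to keep the example short. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter
import math

def kmer_frequencies(reads, k):
    """Relative frequency of every k-mer across a set of reads."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def log2_enrichment(amplified, unamplified, k, pseudo=1e-6):
    """log2 ratio of normalized k-mer frequency, amplified over unamplified."""
    fa = kmer_frequencies(amplified, k)
    fu = kmer_frequencies(unamplified, k)
    return {kmer: math.log2((fa.get(kmer, 0.0) + pseudo) / (fu.get(kmer, 0.0) + pseudo))
            for kmer in set(fa) | set(fu)}

unamplified = ["GCAT", "ATAT", "GCGC", "TATA"]   # made-up reads
amplified = ["GCGC", "GCGC", "GCAT", "ATAT"]     # made-up reads
enrich = log2_enrichment(amplified, unamplified, k=2)
for kmer, value in sorted(enrich.items(), key=lambda kv: -kv[1]):
    print(kmer, round(value, 2))
```

Positive values mark k-mers that are over-represented after amplification, which is the kind of signal the abstract reports for GC-rich priming motifs.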
Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology

NASA Astrophysics Data System (ADS)

So, Christopher R.; Fears, Kenan P.; Leary, Dagmar H.; Scancella, Jenifer M.; Wang, Zheng; Liu, Jinny L.; Orihuela, Beatriz; Rittschof, Dan; Spillmann, Christopher M.; Wahl, Kathryn J.

2016-11-01

Barnacles adhere by producing a mixture of cement proteins (CPs) that organize into a permanently bonded layer displayed as nanoscale fibers. These cement proteins share no homology with any other marine adhesives, and a common sequence-basis that defines how nanostructures function as adhesives remains undiscovered. Here we demonstrate that a significant unidentified portion of acorn barnacle cement is composed of low complexity proteins; they are organized into repetitive sequence blocks and found to maintain homology to silk motifs. Proteomic analysis of aggregate bands from PAGE gels reveals an abundance of Gly/Ala/Ser/Thr repeats exemplified by a prominent, previously unidentified, 43 kDa protein in the solubilized adhesive. Low complexity regions found throughout the cement proteome, as well as multiple lysyl oxidases and peroxidases, establish homology with silk-associated materials such as fibroin, silk gum sericin, and pyriform spidroins from spider silk. Distinct primary structures defined by homologous domains shed light on how barnacles use low complexity in nanofibers to enable adhesion, and serve as a starting point for unraveling the molecular architecture of a robust and unique class of adhesive nanostructures.
Polypeptide p41 of a Norwalk-Like Virus Is a Nucleic Acid-Independent Nucleoside Triphosphatase

PubMed Central

Pfister, Thomas; Wimmer, Eckard

2001-01-01

Southampton virus (SHV) is a member of the Norwalk-like viruses (NLVs), one of four genera of the family Caliciviridae. The genome of SHV contains three open reading frames (ORFs). ORF 1 encodes a polyprotein that is autocatalytically processed into six proteins, one of which is p41. p41 shares sequence motifs with protein 2C of picornaviruses and superfamily 3 helicases. We have expressed p41 of SHV in bacteria. Purified p41 exhibited nucleoside triphosphate (NTP)-binding and NTP hydrolysis activities. The NTPase activity was not stimulated by single-stranded nucleic acids. SHV p41 had no detectable helicase activity. Protein sequence comparison between the consensus sequences of NLV p41 and enterovirus protein 2C revealed regions of high similarity. According to secondary structure prediction, the conserved regions were located within a putative central domain of alpha helices and beta strands. This study reveals for the first time an NTPase activity associated with a calicivirus-encoded protein. Based on enzymatic properties and sequence information, a functional relationship between NLV p41 and enterovirus 2C is discussed in regard to the role of 2C-like proteins in virus replication. PMID:11160659
The common equine class I molecule Eqca-1*00101 (ELA-A3.1) is characterized by narrow peptide binding and T cell epitope repertoires.

PubMed

Bergmann, Tobias; Moore, Carrie; Sidney, John; Miller, Donald; Tallmadge, Rebecca; Harman, Rebecca M; Oseroff, Carla; Wriston, Amanda; Shabanowitz, Jeffrey; Hunt, Donald F; Osterrieder, Nikolaus; Peters, Bjoern; Antczak, Douglas F; Sette, Alessandro

2015-11-01

Here we describe a detailed quantitative peptide-binding motif for the common equine leukocyte antigen (ELA) class I allele Eqca-1*00101, present in roughly 25% of Thoroughbred horses. We determined a preliminary binding motif by sequencing endogenously bound ligands. Subsequently, a positional scanning combinatorial library (PSCL) was used to further characterize binding specificity and derive a quantitative motif involving aspartic acid in position 2 and hydrophobic residues at the C-terminus. Using this motif, we selected and tested 9- and 10-mer peptides derived from the equine herpesvirus type 1 (EHV-1) proteome for their capacity to bind Eqca-1*00101. PSCL predictions were very efficient, with a receiver operating characteristic (ROC) curve performance of 0.877, and 87 peptides derived from 40 different EHV-1 proteins were identified with affinities of 500 nM or higher. Quantitative analysis revealed that Eqca-1*00101 has a narrow peptide-binding repertoire, in comparison to those of most human, non-human primate, and mouse class I alleles. Peripheral blood mononuclear cells from six EHV-1-infected, or vaccinated but uninfected, Eqca-1*00101-positive horses were used in IFN-γ enzyme-linked immunospot (ELISPOT) assays. When we screened the 87 Eqca-1*00101-binding peptides for T cell reactivity, only one Eqca-1*00101 epitope, derived from the intermediate-early protein ICP4, was identified. Thus, despite its common occurrence in several horse breeds, Eqca-1*00101 is associated with a narrow binding repertoire and a similarly narrow T cell response to an important equine viral pathogen. Intriguingly, these features are shared with other human and macaque major histocompatibility complex (MHC) molecules with a similar specificity for D in position 2 or 3 in their main anchor motif.
The common equine class I molecule Eqca-1*00101 (ELA-A3.1) is characterized by narrow peptide binding and T cell epitope repertoires

PubMed Central

Bergmann, Tobias; Moore, Carrie; Sidney, John; Miller, Donald; Tallmadge, Rebecca; Harman, Rebecca M.; Oseroff, Carla; Wriston, Amanda; Shabanowitz, Jeffrey; Hunt, Donald F.; Osterrieder, Nikolaus; Peters, Bjoern; Antczak, Douglas F.; Sette, Alessandro

2016-01-01

Here we describe a detailed quantitative peptide-binding motif for the common equine leukocyte antigen (ELA) class I allele Eqca-1*00101, present in roughly 25% of Thoroughbred horses. We determined a preliminary binding motif by sequencing endogenously bound ligands. Subsequently, a positional scanning combinatorial library (PSCL) was used to further characterize binding specificity and derive a quantitative motif involving aspartic acid in position 2 and hydrophobic residues at the C-terminus. Using this motif, we selected and tested 9- and 10-mer peptides derived from the equine herpesvirus type 1 (EHV-1) proteome for their capacity to bind Eqca-1*00101. PSCL predictions were very efficient, with a receiver operating characteristic (ROC) curve performance of 0.877, and 87 peptides derived from 40 different EHV-1 proteins were identified with affinities of 500 nM or higher. Quantitative analysis revealed that Eqca-1*00101 has a narrow peptide-binding repertoire, in comparison to those of most human, non-human primate, and mouse class I alleles. Peripheral blood mononuclear cells from six EHV-1-infected, or vaccinated but uninfected, Eqca-1*00101-positive horses were used in IFN-γ enzyme-linked immunospot (ELISPOT) assays. When we screened the 87 Eqca-1*00101-binding peptides for T cell reactivity, only one Eqca-1*00101 epitope, derived from the intermediate-early protein ICP4, was identified. Thus, despite its common occurrence in several horse breeds, Eqca-1*00101 is associated with a narrow binding repertoire and a similarly narrow T cell response to an important equine viral pathogen. Intriguingly, these features are shared with other human and macaque major histocompatibility complex (MHC) molecules with a similar specificity for D in position 2 or 3 in their main anchor motif. PMID:26399241
Structure of Rot, a global regulator of virulence genes in Staphylococcus aureus.

PubMed

Zhu, Yuwei; Fan, Xiaojiao; Zhang, Xu; Jiang, Xuguang; Niu, Liwen; Teng, Maikun; Li, Xu

2014-09-01

Staphylococcus aureus is a highly versatile pathogen that can infect human tissue by producing a large arsenal of virulence factors that are tightly regulated by a complex regulatory network. Rot, which shares sequence similarity with SarA homologues, is a global regulator that regulates numerous virulence genes. However, the recognition model of Rot for the promoter region of target genes and the putative regulation mechanism remain elusive. In this study, the 1.77 Å resolution X-ray crystal structure of Rot is reported. The structure reveals that two Rot molecules form a compact homodimer, each of which contains a typical helix-turn-helix module and a β-hairpin motif connected by a flexible loop. Fluorescence polarization results indicate that Rot preferentially recognizes AT-rich dsDNA with ~30-base-pair nucleotides and that the conserved positively charged residues on the winged-helix motif are vital for binding to the AT-rich dsDNA. It is proposed that the DNA-recognition model of Rot may be similar to that of SarA, SarR and SarS, in which the helix-turn-helix motifs of each monomer interact with the major grooves of target dsDNA and the winged motifs contact the minor grooves. Interestingly, the structure shows that Rot adopts a novel dimerization model that differs from that of other SarA homologues. As expected, perturbation of the dimer interface abolishes the dsDNA-binding ability of Rot, suggesting that Rot functions as a dimer. In addition, the results have been further confirmed in vivo by measuring the transcriptional regulation of α-toxin, a major virulence factor produced by most S. aureus strains.
The ancient claudin Dni2 facilitates yeast cell fusion by compartmentalizing Dni1 into a membrane subdomain.

PubMed

Curto, M-Ángeles; Moro, Sandra; Yanguas, Francisco; Gutiérrez-González, Carmen; Valdivieso, M-Henar

2018-05-01

Dni1 and Dni2 facilitate cell fusion during mating. Here, we show that these proteins are interdependent for their localization in a plasma membrane subdomain, which we have termed the mating fusion domain. Dni1 compartmentation in the domain is required for cell fusion. The contribution of actin, sterol-dependent membrane organization, and Dni2 to this compartmentation was analysed, and the results showed that Dni2 plays the most relevant role in the process. In turn, the Dni2 exit from the endoplasmic reticulum depends on Dni1. These proteins share the presence of a cysteine motif in their first extracellular loop related to the claudin GLWxxC(8-10 aa)C signature motif. Structure-function analyses show that mutating each Dni1 conserved cysteine has mild effects, and that only simultaneous elimination of several cysteines leads to a mating defect. On the contrary, eliminating each single cysteine and the C-terminal tail in Dni2 abrogates Dni1 compartmentation and cell fusion. Sequence alignments show that claudin trans-membrane helixes bear small-XXX-small motifs at conserved positions. The fourth Dni2 trans-membrane helix tends to form homo-oligomers in Escherichia plasma membrane, and two concatenated small-XXX-small motifs are required for efficient oligomerization and for Dni2 export from the yeast endoplasmic reticulum. Together, our results strongly suggest that Dni2 is an ancient claudin that blocks Dni1 diffusion from the intercellular region where two plasma membranes are in close proximity, and that this function is required for Dni1 to facilitate cell fusion.
Using SCOPE to identify potential regulatory motifs in coregulated genes.

PubMed

Martyanov, Viktor; Gross, Robert H

2011-05-31

SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site, which also includes a "Sample Search" button that allows the user to perform a trial run. SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes or FASTA sequences. These can be entered in browser text fields or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail.
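The three motif types named in this abstract (non-degenerate, degenerate, and bipartite) can all be written as words over the IUPAC nucleotide alphabet. The following is a minimal Python sketch of how such words might be matched against a sequence; it is not part of SCOPE, and the helper names `motif_to_regex` and `find_motif` are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Translate an IUPAC motif string into a compiled regular expression."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        # Unambiguous symbols stay literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif).pattern + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Toy sequences (made up for illustration only).
print(find_motif("ACCGGT", "TTACCGGTAA"))             # non-degenerate
print(find_motif("ASCGWT", "ACCGATTAGCGTT"))          # degenerate
print(find_motif("ACCnnnnnnnnGGT", "ACCAAAAAAAAGGT"))  # bipartite
```

This only locates occurrences of a given word; scoring over-representation or position preference, as the abstract describes, would need a background model on top of it.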
Genome-wide analysis of the homeodomain-leucine zipper (HD-ZIP) gene family in peach (Prunus persica).

PubMed

Zhang, C H; Ma, R J; Shen, Z J; Sun, X; Korir, N K; Yu, M L

2014-04-08

In this study, 33 homeodomain-leucine zipper (HD-ZIP) genes were identified in peach using the HD-ZIP amino acid sequences of Arabidopsis thaliana as a probe. Based on the phylogenetic analysis and the individual gene or protein characteristics, the HD-ZIP gene family in peach can be classified into 4 subfamilies, HD-ZIP I, II, III, and IV, containing 14, 7, 4, and 8 members, respectively. The most closely related peach HD-ZIP members within the same subfamilies shared very similar gene structure in terms of either intron/exon numbers or lengths. Almost all members of the same subfamily shared common motif compositions, thereby implying that the HD-ZIP proteins within the same subfamily may have functional similarity. The 33 peach HD-ZIP genes were distributed across scaffolds 1 to 7. Although the primary structure varied among HD-ZIP family proteins, their tertiary structures were similar. The results from this study will be useful in selecting candidate genes from specific subfamilies for functional analysis.
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a Python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs, which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
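To make the idea of a GC-constrained null model concrete, here is a minimal Python sketch. It is not the NullSeq package or its API. It assumes a fixed amino acid sequence and draws each synonymous codon with probability proportional to exp(lam × GC count), which is the exponential form a maximum-entropy distribution takes under a constraint on mean GC content. The parameter `lam` and all function names are hypothetical.

```python
import math
import random
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)

def random_coding_sequence(protein, lam, rng=random):
    """Sample synonymous codons with weight exp(lam * GC count).

    lam = 0 gives uniform codon choice; lam > 0 favours GC-rich codons,
    lam < 0 favours AT-rich codons. (lam is an assumed tuning parameter.)
    """
    chosen = []
    for aa in protein:
        options = SYNONYMS[aa]
        weights = [math.exp(lam * gc_count(c)) for c in options]
        chosen.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(chosen)

def translate(dna):
    """Translate a coding sequence back to amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

dna = random_coding_sequence("MKWLV", lam=1.0)
print(dna, translate(dna))
```

In practice `lam` would have to be tuned, for example by bisection, until the expected GC fraction matches the target; that step is left out here.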
Canonical Bcl-2 motifs of the Na+/K+ pump revealed by the BH3 mimetic chelerythrine: early signal transducers of apoptosis?

PubMed

Lauf, Peter K; Heiny, Judith; Meller, Jarek; Lepera, Michael A; Koikov, Leonid; Alter, Gerald M; Brown, Thomas L; Adragna, Norma C

2013-01-01

Chelerythrine [CET], a protein kinase C [PKC] inhibitor, is a pro-apoptotic BH3-mimetic binding to BH1-like motifs of Bcl-2 proteins. CET action was examined on PKC phosphorylation-dependent membrane transporters (Na+/K+ pump/ATPase [NKP, NKA], Na+-K+-2Cl- [NKCC] and K+-Cl- [KCC] cotransporters, and channel-supported K+ loss) in human lens epithelial cells [LECs]. K+ loss and K+ uptake, using Rb+ as congener, were measured by atomic absorption/emission spectrophotometry with NKP and NKCC inhibitors, and Cl- replacement by NO3- to determine KCC. 3H-Ouabain binding was performed on a pig renal NKA in the presence and absence of CET. Bcl-2 protein and NKA sequences were aligned and motifs identified and mapped using PROSITE in conjunction with BLAST alignments and analysis of conservation and structural similarity based on prediction of secondary and crystal structures. CET inhibited NKP and NKCC by >90% (IC50 values ~35 and ~15 μM, respectively) without significant KCC activity change, and stimulated K+ loss by ~35% at 10-30 μM. Neither ATP levels nor phosphorylation of the NKA α1 subunit changed. 3H-ouabain was displaced from pig renal NKA only at 100-fold higher CET concentrations than the ligand. Sequence alignments of NKA with BH1- and BH3-like motifs containing pro-survival Bcl-2 and BclXl proteins showed more than one BH1-like motif within NKA for interaction with CET or with BH3 motifs. One NKA BH1-like motif (ARAAEILARDGPN) was also found in all P-type ATPases. Also, NKA possessed a second motif similar to that near the BH3 region of Bcl-2. Findings support the hypothesis that CET inhibits NKP by binding to BH1-like motifs and disrupting the α1 subunit catalytic activity through conformational changes. By interacting with Bcl-2 proteins through their complementary BH1- or BH3-like motifs, NKP proteins may be sensors of normal and pathological cell functions, becoming important yet unrecognized signal transducers in the initial phases of apoptosis. CET action on NKCC1 and K+ channels may involve PKC-regulated mechanisms; however, limited sequence homologies to BH1-like motifs cannot exclude direct effects.
A novel N-terminal motif of dipeptidyl peptidase-like proteins produces rapid inactivation of KV4.2 channels by a pore-blocking mechanism.

PubMed

Jerng, Henry H; Dougherty, Kevin; Covarrubias, Manuel; Pfaffinger, Paul J

2009-11-01

The somatodendritic subthreshold A-type K(+) current in neurons (I(SA)) depends on its kinetic and voltage-dependent properties to regulate membrane excitability, action potential repetitive firing, and signal integration. Key functional properties of the K(V)4 channel complex underlying I(SA) are determined by dipeptidyl peptidase-like proteins known as dipeptidyl peptidase 6 (DPP6) and dipeptidyl peptidase 10 (DPP10). Among the multiple known DPP10 isoforms with alternative N-terminal sequences, DPP10a confers exceptionally fast inactivation to K(V)4.2 channels. To elucidate the molecular basis of this fast inactivation, we investigated the structure-function relationship of the DPP10a N-terminal region and its interaction with the K(V)4.2 channel. Here, we show that DPP10a shares a conserved N-terminal sequence (MNQTA) with DPP6a (aka DPP6-E), which also induces fast inactivation. Deletion of the NQTA sequence in DPP10a eliminates this dramatic fast inactivation, and perfusion of MNQTA peptide to the cytoplasmic face of inside-out patches inhibits the K(V)4.2 current. DPP10a-induced fast inactivation exhibits competitive interactions with internally applied tetraethylammonium (TEA), and elevating the external K(+) concentration accelerates recovery from DPP10a-mediated fast inactivation. These results suggest that fast inactivation induced by DPP10a or DPP6a is mediated by a common N-terminal inactivation motif via a pore-blocking mechanism. This mechanism may offer an attractive target for novel pharmacological interventions directed at impairing I(SA) inactivation and reducing neuronal excitability.
Genome-Wide Identification and Evolution Analysis of Trehalose-6-Phosphate Synthase Gene Family in Nelumbo nucifera

PubMed Central

Jin, Qijiang; Hu, Xin; Li, Xin; Wang, Bei; Wang, Yanjie; Jiang, Hongwei; Mattson, Neil; Xu, Yingchun

2016-01-01

Trehalose-6-phosphate synthase (TPS) plays a key role in plant carbohydrate metabolism and the perception of carbohydrate availability. In the present work, the publicly available Nelumbo nucifera (lotus) genome sequence database was analyzed, which led to the identification of nine lotus TPS genes (NnTPS). It was found that at least two introns are included in the coding sequences of NnTPS genes. When the motif compositions were analyzed, we found that NnTPS genes generally shared similar motifs, implying that they have similar functions. The dN/dS ratios were always less than 1 for different domains and regions outside domains, suggesting purifying selection on the lotus TPS gene family. The regions outside the TPS domain evolved relatively faster than NnTPS domains. A phylogenetic tree was constructed using all predicted coding sequences of lotus TPS genes, together with those from Arabidopsis, poplar, soybean, and rice. The result indicated that those TPS genes could be clearly divided into two main subfamilies (I-II), where each subfamily could be further divided into 2 (I) and 5 (II) subgroups. Analyses of divergence and adaptive evolution show that purifying selection may have been the main force driving evolution of plant TPS genes. Some of the critical sites that contributed to divergence may have been under positive selection. Transcriptome data analysis revealed that most NnTPS genes were predominantly expressed in sink tissues. Expression patterns of NnTPS genes under copper and submergence stress indicated that NNU_014679 and NNU_022788 might play important roles in lotus energy metabolism and participate in stress response. Our results can facilitate further functional studies of TPS genes in lotus. PMID:27746792
PpeTAC1 promotes the horizontal growth of branches in peach trees and is a member of a functionally conserved gene family found in diverse plants species.

PubMed

Dardick, Chris; Callahan, Ann; Horn, Renate; Ruiz, Karina B; Zhebentyayeva, Tetyana; Hollender, Courtney; Whitaker, Michael; Abbott, Albert; Scorza, Ralph

2013-08-01

Trees are capable of tremendous architectural plasticity, allowing them to maximize their light exposure under highly competitive environments. One key component of tree architecture is the branch angle, yet little is known about the molecular basis for the spatial patterning of branches in trees. Here, we report the identification of a candidate gene for the br mutation in Prunus persica (peach) associated with vertically oriented growth of branches, referred to as 'pillar' or 'broomy'. Ppa010082, annotated as hypothetical protein in the peach genome sequence, was identified as a candidate gene for br using a next generation sequence-based mapping approach. Sequence similarity searches identified rice TAC1 (tiller angle control 1) as a putative ortholog, and we thus named it PpeTAC1. In monocots, TAC1 is known to lead to less compact growth by increasing the tiller angle. In Arabidopsis, an attac1 mutant showed more vertical branch growth angles, suggesting that the gene functions universally to promote the horizontal growth of branches. TAC1 genes belong to a gene family (here named IGT for a shared conserved motif) found in all plant genomes, consisting of two clades: one containing TAC1-like genes; the other containing LAZY1, which contains an EAR motif, and promotes vertical shoot growth in Oryza sativa (rice) and Arabidopsis through influencing polar auxin transport. The data suggest that IGT genes are ancient, and play conserved roles in determining shoot growth angles in plants. Understanding how IGT genes modulate branch angles will provide insights into how different architectural growth habits evolved in terrestrial plants. © 2013 The Authors The Plant Journal © 2013 John Wiley & Sons Ltd.

Immune Selection In Vitro Reveals Human Immunodeficiency Virus Type 1 Nef Sequence Motifs Important for Its Immune Evasion Function In Vivo

PubMed Central

Lee, Patricia; Ng, Hwee L.; Yang, Otto O.

2012-01-01

Human immunodeficiency virus type 1 (HIV-1) Nef downregulates major histocompatibility complex class I (MHC-I), impairing the clearance of infected cells by CD8+ cytotoxic T lymphocytes (CTLs). While sequence motifs mediating this function have been determined by in vitro mutagenesis studies of laboratory-adapted HIV-1 molecular clones, it is unclear whether the highly variable Nef sequences of primary isolates in vivo rely on the same sequence motifs. To address this issue, nef quasispecies from nine chronically HIV-1-infected persons were examined for sequence evolution and altered MHC-I downregulatory function under Gag-specific CTL immune pressure in vitro. This selection resulted in decreased nef diversity and strong purifying selection. Site-by-site analysis identified 13 codons undergoing purifying selection and 1 undergoing positive selection. Of the former, only 6 have been reported to have roles in Nef function, including 4 associated with MHC-I downregulation. Functional testing of naturally occurring in vivo polymorphisms at the 7 sites with no previously known functional role revealed 3 mutations (A84D, Y135F, and G140R) that ablated MHC-I downregulation and 3 (N52A, S169I, and V180E) that partially impaired MHC-I downregulation. Globally, the CTL pressure in vitro selected functional Nef from the in vivo quasispecies mixtures that predominately lacked MHC-I downregulatory function at the baseline. Overall, these data demonstrate that CTL pressure exerts a strong purifying selective pressure for MHC-I downregulation and identifies novel functional motifs present in Nef sequences in vivo. PMID:22553319
Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.

PubMed

Nielsen, Morten; Lundegaard, Claus; Worning, Peder; Hvid, Christina Sylvester; Lamberth, Kasper; Buus, Søren; Brunak, Søren; Lund, Ole

2004-06-12

Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to identify potential MHC binding peptides effectively and to guide the process of rational vaccine design. We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristic (ROC) curve plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
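The step in which a weight-matrix is applied to candidate peptides can be illustrated with a short Python sketch. This is not the authors' Gibbs sampler: it assumes the binding cores are already aligned, uses a uniform background and a flat pseudocount (both simplifying assumptions), and the toy peptides and function names are made up for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_weight_matrix(cores, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned cores.

    Assumes a uniform background frequency of 1/20 per amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(cores[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1
        total = sum(counts.values())
        matrix.append({aa: math.log2(counts[aa] / total / background)
                       for aa in AMINO_ACIDS})
    return matrix

def best_core(peptide, matrix):
    """Slide the matrix along a peptide; return (score, start) of the best window."""
    width = len(matrix)
    best = None
    for start in range(len(peptide) - width + 1):
        score = sum(matrix[i][peptide[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Toy aligned cores with conserved first and last (anchor-like) positions.
toy_cores = ["FAAAL", "FGGGL", "FSSSL"]
wm = build_weight_matrix(toy_cores)
print(best_core("KKFAAALKK", wm))
```

Sliding the matrix over a longer peptide mirrors the class II setting described above, where the core position within each peptide is not known in advance.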
Informative priors based on transcription factor structural class improve de novo motif discovery.

PubMed

Narlikar, Leelavati; Gordân, Raluca; Ohler, Uwe; Hartemink, Alexander J

2006-07-15

An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm. Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites. Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

PubMed

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-12-01

The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller (contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be). Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
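The abstract screens motifs for conservation with a branch length score but does not spell out the formula. The Python sketch below uses one common definition, stated here as an assumption: the total branch length of the smallest subtree connecting the species in which the motif occurs, divided by the total branch length of the tree. The tree, its branch lengths, and the function name are hypothetical and are not taken from BLSSpeller.

```python
def branch_length_score(parents, species_with_motif):
    """Fraction of total tree length spanned by the species carrying the motif.

    parents maps each non-root node to (parent_node, branch_length).
    """
    hits = set(species_with_motif)
    # Count, for every branch, how many motif-carrying leaves lie below it.
    below = {}
    for leaf in hits:
        node = leaf
        while node in parents:
            below[node] = below.get(node, 0) + 1
            node = parents[node][0]
    # A branch belongs to the connecting subtree when some, but not all,
    # motif-carrying leaves lie below it.
    connecting = sum(parents[node][1]
                     for node, n in below.items() if n < len(hits))
    total = sum(length for _, length in parents.values())
    return connecting / total

# Toy rooted tree with made-up branch lengths: ((A,B)X,(C,D)Y)root
toy_tree = {
    "X": ("root", 0.2), "Y": ("root", 0.4),
    "A": ("X", 0.1), "B": ("X", 0.3),
    "C": ("Y", 0.2), "D": ("Y", 0.3),
}
print(branch_length_score(toy_tree, ["A", "B"]))
print(branch_length_score(toy_tree, ["A", "C"]))
print(branch_length_score(toy_tree, ["A", "B", "C", "D"]))
```

Under this definition a motif found in a single species scores 0 and one found in every species scores 1, so a threshold on the score selects motifs conserved across a sufficiently large part of the tree.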
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

PubMed Central

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-01-01

Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254488
Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

PubMed Central

Hunt, C; Morimoto, R I

1985-01-01

We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
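Two statements in this abstract can be checked mechanically: that ATTGG is the CCAAT motif in inverted orientation, and that the leader, open reading frame, and 3' noncoding region add up to the 2440-nucleotide transcript. The short Python sketch below does both; the helper name `reverse_complement` is illustrative.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# CCAAT read on the opposite strand gives the ATTGG reported upstream of hsp70.
print(reverse_complement("CCAAT"))  # ATTGG

# Segment lengths quoted in the abstract (nucleotides).
leader, orf, trailer = 212, 1986, 242
print(leader + orf + trailer)  # 2440
print(orf % 3)                 # 0, so the reading frame is a whole number of codons
```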
Hypomethylation of CNG targets induced with dihydroxypropyladenine is rapidly reversed in the course of mitotic cell division in tobacco.

PubMed

Koukalová, B.; Votruba, I.; Fojtová, M.; Holý, A.; Kovarík, A.

2002-10-01

We followed the mitotic transmission of an experimentally induced hypomethylated state of several tobacco repetitive sequences in callus culture and plants. The initial hypomethylation was induced by a hypomethylation drug, dihydroxypropyladenine (DHPA), the competitive inhibitor of cellular S-adenosylhomocysteine hydrolase, which is known to preferentially inhibit methylation at CNG and non-symmetrical motifs while having a negligible effect on methylation at CG motifs. The deprivation of this drug resulted in an almost immediate remethylation of cytosines at CNG motifs (MspI and EcoRII sites), leading us to conclude that the hypomethylation effect of dihydroxypropyladenine is rather transient and differs from that of 5-azacytidine, which often induces heritable changes in methylation patterns. The results suggest that de novo methylation of CNG motifs is a rapid and meiotically independent process on DNA sequences with pre-existing CG methylation.
The presence of the ancestral insect telomeric motif in kissing bugs (Triatominae) rules out the hypothesis of its loss in evolutionarily advanced Heteroptera (Cimicomorpha)

PubMed Central

Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro

2016-01-01

Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence at their chromosomal ends. Furthermore, the motif amount was estimated at about 0.03% of the total genome, so that the average telomere length at each chromosomal end is almost 18 kb. We also detected the presence of the (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes in three other species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050
Cyclotides Associate with Leaf Vasculature and Are the Products of a Novel Precursor in Petunia (Solanaceae)

PubMed Central

Poth, Aaron G.; Mylne, Joshua S.; Grassl, Julia; Lyons, Russell E.; Millar, A. Harvey; Colgrave, Michelle L.; Craik, David J.

2012-01-01

Cyclotides are a large family of plant peptides that are structurally defined by their cyclic backbone and a trifecta of disulfide bonds, collectively known as the cyclic cystine knot (CCK) motif. Structurally similar cyclotides have been isolated from plants within the Rubiaceae, Violaceae, and Fabaceae families and share the CCK motif with trypsin-inhibitory knottins from a plant in the Cucurbitaceae family. Cyclotides have previously been reported to be encoded by dedicated genes or as a domain within a knottin-encoding PA1-albumin-like gene. Here we report the discovery of cyclotides and related non-cyclic peptides we called “acyclotides” from petunia of the agronomically important Solanaceae plant family. Transcripts for petunia cyclotides and acyclotides encode the shortest known cyclotide precursors. Despite having a different precursor structure, their sequences suggest that petunia cyclotides mature via the same biosynthetic route as other cyclotides. We assessed the spatial distribution of cyclotides within a petunia leaf section by MALDI imaging and observed that the major cyclotide component Phyb A was non-uniformly distributed. Dissected leaf midvein extracts contained significantly higher concentrations of this cyclotide compared with the lamina and outer margins of leaves. This is the third distinct type of cyclotide precursor, and Solanaceae is the fourth phylogenetically disparate plant family to produce these structurally conserved cyclopeptides, suggesting either convergent evolution upon the CCK structure or movement of cyclotide-encoding sequences within the plant kingdom. PMID:22700981
De novo design of RNA-binding proteins with a prion-like domain related to ALS/FTD proteinopathies.

PubMed

Mitsuhashi, Kana; Ito, Daisuke; Mashima, Kyoko; Oyama, Munenori; Takahashi, Shinichi; Suzuki, Norihiro

2017-12-04

Aberrant RNA-binding proteins form the core of the neurodegeneration cascade in spectrums of disease, such as amyotrophic lateral sclerosis (ALS)/frontotemporal dementia (FTD). Six ALS-related molecules, TDP-43, FUS, TAF15, EWSR1, heterogeneous nuclear (hn)RNPA1 and hnRNPA2 are RNA-binding proteins containing candidate mutations identified in ALS patients and those share several common features, including harboring an aggregation-prone prion-like domain (PrLD) containing a glycine/serine-tyrosine-glycine/serine (G/S-Y-G/S)-motif-enriched low-complexity sequence and rich in glutamine and/or asparagine. Additionally, these six molecules are components of RNA granules involved in RNA quality control and become mislocated from the nucleus to form cytoplasmic inclusion bodies (IBs) in the ALS/FTD-affected brain. To reveal the essential mechanisms involved in ALS/FTD-related cytotoxicity associated with RNA-binding proteins containing PrLDs, we designed artificial RNA-binding proteins harboring G/S-Y-G/S-motif repeats with and without enriched glutamine residues and nuclear-import/export-signal sequences and examined their cytotoxicity in vitro. These proteins recapitulated features of ALS-linked molecules, including insoluble aggregation, formation of cytoplasmic IBs and components of RNA granules, and cytotoxicity instigation. These findings indicated that these artificial RNA-binding proteins mimicked features of ALS-linked molecules and allowed the study of mechanisms associated with gain of toxic functions related to ALS/FTD pathogenesis.
The Janus Kinase (JAK) FERM and SH2 Domains: Bringing Specificity to JAK-Receptor Interactions.

PubMed

Ferrao, Ryan; Lupardus, Patrick J

2017-01-01

The Janus kinases (JAKs) are non-receptor tyrosine kinases essential for signaling in response to cytokines and interferons and thereby control many essential functions in growth, development, and immune regulation. JAKs are unique among tyrosine kinases for their constitutive yet non-covalent association with class I and II cytokine receptors, which upon cytokine binding bring together two JAKs to create an active signaling complex. JAK association with cytokine receptors is facilitated by N-terminal FERM and SH2 domains, both of which are classical mediators of peptide interactions. Together, the JAK FERM and SH2 domains mediate a bipartite interaction with two distinct receptor peptide motifs, the proline-rich "Box1" and hydrophobic "Box2," which are present in the intracellular domain of cytokine receptors. While the general sidechain chemistry of Box1 and Box2 peptides is conserved between receptors, they share very weak primary sequence homology, making it impossible to posit why certain JAKs preferentially interact with and signal through specific subsets of cytokine receptors. Here, we review the structure and function of the JAK FERM and SH2 domains in light of several recent studies that reveal their atomic structure and elucidate interaction mechanisms with both the Box1 and Box2 receptor motifs. These crystal structures demonstrate how evolution has repurposed the JAK FERM and SH2 domains into a receptor-binding module that facilitates interactions with multiple receptors possessing diverse primary sequences.
Nectinepsin: a new extracellular matrix protein of the pexin family. Characterization of a novel cDNA encoding a protein with an RGD cell binding motif.

PubMed

Blancher, C; Omri, B; Bidou, L; Pessac, B; Crisanti, P

1996-10-18

We report the isolation and characterization of a novel cDNA from quail neuroretina encoding a putative protein named nectinepsin. The nectinepsin cDNA identifies a major 2.2-kilobase mRNA that is detected from ED 5 in neuroretina and is increasingly abundant during embryonic development. A nectinepsin mRNA is also found in quail liver, brain, and intestine and in mouse retina. The deduced nectinepsin amino acid sequence contains the RGD cell binding motif of integrin ligands. Furthermore, nectinepsin shares substantial homologies with vitronectin and structural protein similarities with most of the matricial metalloproteases. However, the presence of a specific sequence and the lack of heparin and collagen binding domains of the vitronectin indicate that nectinepsin is a new extracellular matrix protein. Furthermore, genomic Southern blot studies suggest that nectinepsin and vitronectin are encoded by different genes. Western blot analysis with an anti-human vitronectin antiserum revealed, in addition to the 65- and 70-kDa vitronectin bands, an immunoreactive protein of about 54 kDa in all tissues containing nectinepsin mRNA. It seems likely that the form of vitronectin found in chick egg yolk plasma by Nagano et al. ((1992) J. Biol. Chem. 267, 24863-24870) is the protein that corresponds to the nectinepsin cDNA. This new protein could be an important molecule involved in the early steps of the development.
Domain organization, genomic structure, evolution, and regulation of expression of the aggrecan gene family.

PubMed

Schwartz, N B; Pirok, E W; Mensch, J R; Domowicz, M S

1999-01-01

Proteoglycans are complex macromolecules, consisting of a polypeptide backbone to which are covalently attached one or more glycosaminoglycan chains. Molecular cloning has allowed identification of the genes encoding the core proteins of various proteoglycans, leading to a better understanding of the diversity of proteoglycan structure and function, as well as to the evolution of a classification of proteoglycans on the basis of emerging gene families that encode the different core proteins. One such family includes several proteoglycans that have been grouped with aggrecan, the large aggregating chondroitin sulfate proteoglycan of cartilage, based on a high number of sequence similarities within the N- and C-terminal domains. Thus far these proteoglycans include versican, neurocan, and brevican. It is now apparent that these proteins, as a group, are truly a gene family with shared structural motifs on the protein and nucleotide (mRNA) levels, and with nearly identical genomic organizations. Clearly a common ancestral origin is indicated for the members of the aggrecan family of proteoglycans. However, differing patterns of amplification and divergence have also occurred within certain exons across species and family members, leading to the class-characteristic protein motifs in the central carbohydrate-rich region exclusively. Thus the overall domain organization strongly suggests that sequence conservation in the terminal globular domains underlies common functions, whereas differences in the central portions of the genes account for functional specialization among the members of this gene family.
Identification of a new protein in the centrosome-like "atractophore" of Trichomonas vaginalis.

PubMed

Bricheux, Geneviève; Coffe, Gérard; Brugerolle, Guy

2007-06-01

The human parasite Trichomonas vaginalis has specific structural bodies, atractophores, associated at one end to the kinetosomes and at the other to the spindle during division. A monoclonal antibody specific for a component of this structure was obtained. It recognizes a protein with a predicted molecular mass of 477 kDa. Sequence analysis of this protein shows that P477 belongs to the family of large coiled-coil proteins, sharing a highly versatile protein folding motif adaptable to many biological functions. P477 might act as an anchor to localize cellular activities and components to the Golgi centrosomal region. It may represent a new class of structural proteins, since similar proteins were found in many protozoans.
A search for structurally similar cellular internal ribosome entry sites

PubMed Central

Baird, Stephen D.; Lewis, Stephen M.; Turcotte, Marcel; Holcik, Martin

2007-01-01

Internal ribosome entry sites (IRES) allow ribosomes to be recruited to mRNA in a cap-independent manner. Some viruses that impair cap-dependent translation initiation utilize IRES to ensure that the viral RNA will efficiently compete for the translation machinery. IRES are also employed for the translation of a subset of cellular messages during conditions that inhibit cap-dependent translation initiation. IRES from viruses like Hepatitis C and Classical Swine Fever virus share a similar structure/function without sharing primary sequence similarity. Of the cellular IRES structures derived so far, none were shown to share an overall structural similarity. Therefore, we undertook a genome-wide search of human 5′UTRs (untranslated regions) with an empirically derived structure of the IRES from the key inhibitor of apoptosis, X-linked inhibitor of apoptosis protein (XIAP), to identify novel IRES that share structure/function similarity. Three of the top matches identified by this search that exhibit IRES activity are the 5′UTRs of Aquaporin 4, ELG1 and NF-kappaB repressing factor (NRF). The structures of AQP4 and ELG1 IRES have limited similarity to the XIAP IRES; however, they share trans-acting factors that bind the XIAP IRES. We therefore propose that cellular IRES are not defined by overall structure, as viral IRES, but are instead dependent upon short motifs and trans-acting factors for their function. PMID:17591613
Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

PubMed

Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

1998-06-01

An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.
Cellulose in Cyanobacteria. Origin of Vascular Plant Cellulose Synthase?

PubMed Central

Nobles, David R.; Romanovicz, Dwight K.; Brown, R. Malcolm

2001-01-01

Although cellulose biosynthesis among the cyanobacteria has been suggested previously, we present the first conclusive evidence, to our knowledge, of the presence of cellulose in these organisms. Based on the results of x-ray diffraction, electron microscopy of microfibrils, and cellobiohydrolase I-gold labeling, we report the occurrence of cellulose biosynthesis in nine species representing three of the five sections of cyanobacteria. Sequence analysis of the genomes of four cyanobacteria revealed the presence of multiple amino acid sequences bearing the DDD35QXXRW motif conserved in all cellulose synthases. Pairwise alignments demonstrated that CesAs from plants were more similar to putative cellulose synthases from Anabaena sp. Pasteur Culture Collection 7120 and Nostoc punctiforme American Type Culture Collection 29133 than any other cellulose synthases in the database. Multiple alignments of putative cellulose synthases from Anabaena sp. Pasteur Culture Collection 7120 and N. punctiforme American Type Culture Collection 29133 with the cellulose synthases of other prokaryotes, Arabidopsis, Gossypium hirsutum, Populus alba × Populus tremula, corn (Zea mays), and Dictyostelium discoideum showed that cyanobacteria share an insertion between conserved regions U1 and U2 found previously only in eukaryotic sequences. Furthermore, phylogenetic analysis indicates that the cyanobacterial cellulose synthases share a common branch with CesAs of vascular plants in a manner similar to the relationship observed with cyanobacterial and chloroplast 16s rRNAs, implying endosymbiotic transfer of CesA from cyanobacteria to plants and an ancient origin for cellulose synthase in eukaryotes. PMID:11598227
SSR allelic variation in almond (Prunus dulcis Mill.).

PubMed

Xie, Hua; Sui, Yi; Chang, Feng-Qi; Xu, Yong; Ma, Rong-Cai

2006-01-01

Sixteen SSR markers including eight EST-SSR and eight genomic SSRs were used for genetic diversity analysis of 23 Chinese and 15 international almond cultivars. EST- and genomic SSR markers previously reported in species of Prunus, mainly peach, proved to be useful for almond genetic analysis. DNA sequences of 117 alleles of six of the 16 SSR loci were analysed to reveal sequence variation among the 38 almond accessions. For the four SSR loci with AG/CT repeats, no insertions or deletions were observed in the flanking regions of the 98 alleles sequenced. Allelic size variation of these loci resulted exclusively from differences in the structures of repeat motifs, which involved interruptions or occurrences of new motif repeats in addition to varying number of AG/CT repeats. Some alleles had a high number of uninterrupted repeat motifs, indicating that SSR mutational patterns differ among alleles at a given SSR locus within the almond species. Allelic homoplasy was observed in the SSR loci because of base substitutions, interruptions or compound repeat motifs. Substitutions in the repeat regions were found at two SSR loci, suggesting that point mutations operate on SSRs and hinder the further SSR expansion by introducing repeat interruptions to stabilize SSR loci. Furthermore, it was shown that some potential point mutations in the flanking regions are linked with new SSR repeat motif variation in almond and peach.
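The distinction drawn above between allele size and repeat structure can be illustrated with a short Python sketch. It assumes plain DNA strings and uses a regular expression to list uninterrupted runs of a repeat unit; the function name and both toy alleles are hypothetical and are not sequences from the study.

```python
import re

def repeat_runs(sequence, unit="AG", min_repeats=2):
    """Return (start, repeat_count) for each uninterrupted run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(sequence)]

# Two hypothetical alleles of identical length (22 bp) but different structure:
# one perfect (AG)8 run, and one run interrupted by an "AA" dinucleotide.
perfect = "TTC" + "AG" * 8 + "GGA"
interrupted = "TTC" + "AG" * 4 + "AA" + "AG" * 3 + "GGA"

for name, allele in (("perfect", perfect), ("interrupted", interrupted)):
    print(name, len(allele), repeat_runs(allele))
```

Because both toy alleles have the same length, fragment sizing alone would score them as one allele; only the sequence-level run structure separates them, which is the kind of size homoplasy the abstract describes.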
A Motif in the Clathrin Heavy Chain Required for the Hsc70/Auxilin Uncoating Reaction

PubMed Central

Rapoport, Iris; Boll, Werner; Yu, Anan; Böcking, Till

2008-01-01

The 70-kDa heat-shock cognate protein (Hsc70) chaperone is an ATP-dependent “disassembly enzyme” for many subcellular structures, including clathrin-coated vesicles where it functions as an uncoating ATPase. Hsc70, and its cochaperone auxilin together catalyze coat disassembly. Like other members of the Hsp70 chaperone family, it is thought that ATP-bound Hsc70 recognizes the clathrin triskelion through an unfolded exposed hydrophobic segment. The best candidate is the unstructured C terminus (residues 1631–1675) of the heavy chain at the foot of the tripod below the hub, containing the sequence motif QLMLT, closely related to the sequence bound preferentially by the substrate groove of Hsc70 (Fotin et al., 2004b). To test this hypothesis, we generated in insect cells recombinant mammalian triskelions that in vitro form clathrin cages and clathrin/AP-2 coats exactly like those assembled from native clathrin. We show that coats assembled from recombinant clathrin are good substrates for ATP- and auxilin-dependent, Hsc70-catalyzed uncoating. Finally, we show that this uncoating reaction proceeds normally when the coats contain recombinant heavy chains truncated C-terminal to the QLMLT motif, but very inefficiently when the motif is absent. Thus, the QLMLT motif is required for Hsc-70–facilitated uncoating, consistent with the proposal that this sequence is a specific target of the chaperone. PMID:17978091
NoFold: RNA structure clustering without folding or alignment.

PubMed

Middleton, Sarah A; Kim, Junhyong

2014-11-01

Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
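The core idea in this abstract, measuring the distance between two objects through their scores against a shared collection of reference models, can be sketched in a few lines of Python. This is not the NoFold implementation: the three scoring functions below are hypothetical stand-ins for the Rfam covariance models, chosen only so that the example runs on its own.

```python
import math

# Hypothetical "reference models": each maps an RNA sequence to one score.
def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def purine_fraction(seq):
    return (seq.count("A") + seq.count("G")) / len(seq)

def gg_density(seq):
    return sum(seq[i:i + 2] == "GG" for i in range(len(seq) - 1)) / len(seq)

MODELS = [gc_fraction, purine_fraction, gg_density]

def embed(seq, models=MODELS):
    """Represent a sequence by its vector of scores against every model."""
    return [model(seq) for model in models]

def model_space_distance(seq_a, seq_b, models=MODELS):
    """Euclidean distance between the two score vectors."""
    return math.dist(embed(seq_a, models), embed(seq_b, models))

a, b, c = "GGGCGCGG", "GGCCGCGG", "AAUAUUAA"
print(model_space_distance(a, b), model_space_distance(a, c))
```

Once every sequence is a fixed-length score vector, any standard clustering method can be applied to the vectors without folding or pairwise alignment of the sequences themselves.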

Allergen cross reactions: a problem greater than ever thought?

PubMed

Pfiffner, P; Truffer, R; Matsson, P; Rasi, C; Mari, A; Stadler, B M

2010-12-01

Cross reactions are an often observed phenomenon in patients with allergy. Sensitization against some allergens may cause reactions against other seemingly unrelated allergens. Today, cross reactions are being investigated on a per-case basis, analyzing blood serum specific IgE (sIgE) levels and clinical features of patients suffering from cross reactions. In this study, we evaluated the level of sIgE compared to patients' total IgE assuming epitope specificity is a consequence of sequence similarity. Our objective was to evaluate our recently published model of molecular sequence similarities underlying cross reactivity using serum-derived data from IgE determinations of standard laboratory tests. We calculated the probabilities of protein cross reactivity based on conserved sequence motifs and compared these in silico predictions to a database consisting of 5362 sera with sIgE determinations. Cumulating sIgE values of a patient resulted in a median of 25-30% total IgE. Comparing motif cross reactivity predictions to sIgE levels showed that on average three times fewer motifs than extracts were recognized in a given serum (correlation coefficient: 0.967). Extracts belonging to the same motif group co-reacted in a high percentage of sera (up to 80% for some motifs). Cumulated sIgE levels are exaggerated because of a high level of observed cross reactions. Thus, not only bioinformatic prediction of allergenic motifs, but also serological routine testing of allergic patients implies that the immune system may recognize only a small number of allergenic structures. © 2010 John Wiley & Sons A/S.
Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

PubMed

Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

2012-01-01

Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
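The consensus quoted above is written in PROSITE-like notation, which maps directly onto a regular expression. The following Python sketch shows one way to scan a protein sequence for it, assuming one-letter amino acid codes; the helper names and the toy peptide are hypothetical and are not taken from the study.

```python
import re

PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"  # consensus as given in the abstract

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern (x = any residue) to a regex."""
    return "".join("." if element.lower() == "x" else element
                   for element in pattern.split("-"))

def find_motif(sequence, pattern=PERCAL):
    """Return 0-based start positions of all matches, overlapping ones included."""
    lookahead = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [m.start() for m in lookahead.finditer(sequence)]

toy_peptide = "MKAGADGSLGTWDDLLK"  # hypothetical sequence containing one match
print(prosite_to_regex(PERCAL))
print(find_motif(toy_peptide))
```

The converter only handles the elements used in this consensus (fixed residues, `x`, and bracketed alternatives); full PROSITE syntax such as repeat counts or exclusion braces would need additional handling.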
De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

PubMed

Zolotarov, Yevgen; Strömvik, Martina

2015-01-01

Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.
The ARTT motif and a unified structural understanding of substrate recognition in ADP-ribosylating bacterial toxins and eukaryotic ADP-ribosyltransferases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Han, S.; Tainer, J.A.

2001-08-01

ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing an NAD-binding pocket formed by the two perpendicular β-sheet core has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransferases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear a NAD binding cleft as characterized by conserved Arg and catalytic Glu residues. Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT motif structural framework. Thus, we propose here that the ARTT motif represents an experimentally testable general recognition motif region for many ADP-ribosyltransferases and thereby potentially provides a unified structural understanding of substrate recognition in ADP-ribosylation processes.
Mutually Exclusive Formation of G-Quadruplex and i-Motif Is a General Phenomenon Governed by Steric Hindrance in Duplex DNA.

PubMed

Cui, Yunxi; Kong, Deming; Ghimire, Chiran; Xu, Cuixia; Mao, Hanbin

2016-04-19

G-Quadruplex and i-motif are tetraplex structures that may form in opposite strands at the same location of a duplex DNA. Recent discoveries have indicated that the two tetraplex structures can have conflicting biological activities, which poses a challenge for cells to coordinate. Here, by performing innovative population analysis on mechanical unfolding profiles of tetraplex structures in double-stranded DNA, we found that formations of G-quadruplex and i-motif in the two complementary strands are mutually exclusive in a variety of DNA templates, which include human telomere and promoter fragments of hINS and hTERT genes. To explain this behavior, we placed G-quadruplex- and i-motif-hosting sequences in an offset fashion in the two complementary telomeric DNA strands. We found simultaneous formation of the G-quadruplex and i-motif in opposite strands, suggesting that mutual exclusivity between the two tetraplexes is controlled by steric hindrance. This conclusion was corroborated in the BCL-2 promoter sequence, in which simultaneous formation of two tetraplexes was observed due to possible offset arrangements between G-quadruplex and i-motif in opposite strands. The mutual exclusivity revealed here sets a molecular basis for cells to efficiently coordinate opposite biological activities of G-quadruplex and i-motif at the same dsDNA location.
Structural and Functional Studies of Fatty Acyl Adenylate Ligases from E. coli and L. pneumophila

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Z.; Swaminathan, S.; Zhou, R.

2011-02-18

Fatty acyl-AMP ligase (FAAL) is a new member of a family of adenylate-forming enzymes that were recently discovered in Mycobacterium tuberculosis. They are similar in sequence to fatty acyl-coenzyme A (CoA) ligases (FACLs). However, while FACLs perform a two-step catalytic reaction, AMP ligation followed by CoA ligation using ATP and CoA as cofactors, FAALs produce only the acyl adenylate and are unable to perform the second step. We report X-ray crystal structures of full-length FAAL from Escherichia coli (EcFAAL) and FAAL from Legionella pneumophila (LpFAAL) bound to acyl adenylate, determined at resolution limits of 3.0 and 1.85 Å, respectively. The structures share a larger N-terminal domain and a smaller C-terminal domain, which together resemble the previously determined structures of FAAL and FACL proteins. Our two structures occur in quite different conformations. EcFAAL adopts the adenylate-forming conformation typical of FACLs, whereas LpFAAL exhibits a unique intermediate conformation. Both EcFAAL and LpFAAL have insertion motifs that distinguish them from the FACLs. Structures of EcFAAL and LpFAAL reveal detailed interactions between this insertion motif and the interdomain hinge region and with the C-terminal domain. We suggest that the insertion motifs support sufficient interdomain motions to allow substrate binding and product release during acyl adenylate formation, but they preclude CoA binding, thereby preventing CoA ligation.
Structural and Functional Studies of Fatty Acyl Adenylate Ligases from E. coli and L. pneumophila

DOE Office of Scientific and Technical Information (OSTI.GOV)

Z Zhang; R Zhou; J Sauder

2011-12-31

Fatty acyl-AMP ligase (FAAL) is a new member of a family of adenylate-forming enzymes that were recently discovered in Mycobacterium tuberculosis. They are similar in sequence to fatty acyl-coenzyme A (CoA) ligases (FACLs). However, while FACLs perform a two-step catalytic reaction, AMP ligation followed by CoA ligation using ATP and CoA as cofactors, FAALs produce only the acyl adenylate and are unable to perform the second step. We report X-ray crystal structures of full-length FAAL from Escherichia coli (EcFAAL) and FAAL from Legionella pneumophila (LpFAAL) bound to acyl adenylate, determined at resolution limits of 3.0 and 1.85 Å, respectively. The structures share a larger N-terminal domain and a smaller C-terminal domain, which together resemble the previously determined structures of FAAL and FACL proteins. Our two structures occur in quite different conformations. EcFAAL adopts the adenylate-forming conformation typical of FACLs, whereas LpFAAL exhibits a unique intermediate conformation. Both EcFAAL and LpFAAL have insertion motifs that distinguish them from the FACLs. Structures of EcFAAL and LpFAAL reveal detailed interactions between this insertion motif and the interdomain hinge region and with the C-terminal domain. We suggest that the insertion motifs support sufficient interdomain motions to allow substrate binding and product release during acyl adenylate formation, but they preclude CoA binding, thereby preventing CoA ligation.
Open reading frame 5 (ORF5), encoding a ferredoxinlike protein, and nifQ are cotranscribed with nifE, nifN, nifX, and ORF4 in Rhodobacter capsulatus.

PubMed Central

Moreno-Vivian, C; Hennecke, S; Pühler, A; Klipp, W

1989-01-01

DNA sequence analysis of a 1,600-base-pair fragment located downstream of nifENX in nif region A of Rhodobacter capsulatus revealed two additional open reading frames (ORFs): ORF5, encoding a ferredoxinlike protein, and nifQ. The ferredoxinlike gene product contained two cysteine motifs, typical of ferredoxins coordinating two 4Fe-4S clusters, but the distance between these two motifs was unusual for low-molecular-weight ferredoxins. The R. capsulatus nifQ gene product shared a high degree of homology with Klebsiella pneumoniae and Azotobacter vinelandii NifQ, including a typical cysteine motif located in the C-terminal part. nifQ insertion mutants and also an ORF5-nifQ double deletion mutant showed normal diazotrophic growth only in the presence of high concentrations of molybdate. This demonstrated that the gene encoding the ferredoxinlike protein is not essential for nitrogen fixation. No NifA-activated consensus promoter could be found in the intergenic region between nifENX-ORF4 and ORF5-nifQ. Analyses of a nifQ-lacZYA fusion revealed that transcription of nifQ was initiated at a promoter in front of nifE. In contrast to other nitrogen-fixing organisms, R. capsulatus nifE, nifN, nifX, ORF4, ORF5, and nifQ were organized in one transcriptional unit. PMID:2708314
Structure of Rhodococcus equi virulence-associated protein B (VapB) reveals an eight-stranded antiparallel β-barrel consisting of two Greek-key motifs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Geerds, Christina; Wohlmann, Jens; Haas, Albert

The structure of VapB, a member of the Vap protein family that is involved in virulence of the bacterial pathogen R. equi, was determined by SAD phasing and reveals an eight-stranded antiparallel β-barrel similar to avidin, suggestive of a binding function. Made up of two Greek-key motifs, the topology of VapB is unusual or even unique. Members of the virulence-associated protein (Vap) family from the pathogen Rhodococcus equi regulate virulence in an unknown manner. They do not share recognizable sequence homology with any protein of known structure. VapB and VapA are normally associated with isolates from pigs and horses, respectively. To contribute to a molecular understanding of Vap function, the crystal structure of a protease-resistant VapB fragment was determined at 1.4 Å resolution. The structure was solved by SAD phasing employing the anomalous signal of one endogenous S atom and two bound Co ions with low occupancy. VapB is an eight-stranded antiparallel β-barrel with a single helix. Structural similarity to avidins suggests a potential binding function. Unlike other eight- or ten-stranded β-barrels found in avidins, bacterial outer membrane proteins, fatty-acid-binding proteins and lysozyme inhibitors, Vaps do not have a next-neighbour arrangement but consist of two Greek-key motifs with strand order 41238567, suggesting an unusual or even unique topology.
Motif discovery and motif finding from genome-mapped DNase footprint data.

PubMed

Kulakovskiy, Ivan V; Favorov, Alexander V; Makeev, Vsevolod J

2009-09-15

Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
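The position weight matrix (PWM) step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, SeSiMCMC, or Bigfoot: the aligned sites are made up, and the pseudocount, uniform background, and function names (`build_pwm`, `best_hit`) are assumptions chosen only for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned DNA sites (uniform background assumed)."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def best_hit(pwm, fragment):
    """Return (score, offset) of the best-scoring window in a DNA fragment."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][fragment[start + i]] for i in range(width)), start)
        for start in range(len(fragment) - width + 1)
    ]
    return max(scored)

# Hypothetical aligned sites, not data from the study.
toy_sites = ["TGACTC", "TGAGTC", "TGACTA", "TGACTC"]
toy_pwm = build_pwm(toy_sites)
toy_score, toy_offset = best_hit(toy_pwm, "AATGACTCAA")
```

Scanning a fragment that has been extended by a few bases on each side, as the abstract describes, would only require passing the longer fragment to `best_hit`.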
Identification and preliminary characterization of a protein motif related to the zinc finger.

PubMed Central

Lovering, R; Hanson, I M; Borden, K L; Martin, S; O'Reilly, N J; Evan, G I; Rahman, D; Pappin, D J; Trowsdale, J; Freemont, P S

1993-01-01

We have identified a protein motif, related to the zinc finger, which defines a newly discovered family of proteins. The motif was found in the sequence of the human RING1 gene, which is proximal to the major histocompatibility complex region on chromosome six. We propose naming this motif the "RING finger" and it is found in 27 proteins, all of which have putative DNA binding functions. We have synthesized a peptide corresponding to the RING1 motif and examined a number of properties, including metal and DNA binding. We provide evidence to support the suggestion that the RING finger motif is the DNA binding domain of this newly defined family of proteins. PMID:7681583
Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

PubMed Central

Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

2015-01-01

The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was related neither to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945
Sequences characterization of microsatellite DNA sequences in Pacific abalone ( Haliotis discus hannai)

NASA Astrophysics Data System (ADS)

Li, Qi; Akihiro, Kijima

2007-01-01

The microsatellite-enriched library was constructed using the magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using a PCR-based technique, and 84 clones were identified as potentially containing a microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the CA motif contained in the oligoprobe, we also found 16 other types of microsatellite repeats, including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (<20 repeats) were most abundant, accounting for 75.0%. The largest length of microsatellites was 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
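The "minimum of five repeats" criterion in this abstract can be sketched as a regular-expression scan. This is only an illustration under assumptions: the unit-length range (2 to 6 bases), the function name, and the test sequence are hypothetical, and the scan reports perfect repeats only, not the imperfect or compound classes of Weber (1990).

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq."""
    # The lazy quantifier tries the shortest repeat unit first; \1 then
    # requires at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical clone insert: (CA)6 followed by (AGGT)5.
toy_clone = "GGT" + "CA" * 6 + "TTG" + "AGGT" * 5 + "C"
toy_repeats = find_microsatellites(toy_clone)
```

Note that a two-base unit such as "AA" would also match a mononucleotide run, so a real screen would likely filter those hits separately.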
A putative N-terminal nuclear export sequence is sufficient for Mps1 nuclear exclusion during interphase.

PubMed

Jia, Haiwei; Zhang, Xiaojuan; Wang, Wenjun; Bai, Yuanyuan; Ling, Youguo; Cao, Cheng; Ma, Runlin Z; Zhong, Hui; Wang, Xue; Xu, Quanbin

2015-02-27

Mps1, an essential component of the mitotic checkpoint, is also an important interphase regulator and has roles in DNA damage response, cytokinesis and centrosome duplication. Mps1 predominantly resides in the cytoplasm and relocates into the nucleus at the late G2 phase. So far, the mechanism underlying the Mps1 translocation between the cytoplasm and nucleus has been unclear. In this work, a dynamic export process of Mps1 from the nucleus to cytoplasm in interphase was revealed, a process blocked by the Crm1 inhibitor, Leptomycin B, suggesting that export of Mps1 is Crm1 dependent. Consistent with this speculation, a direct association between Mps1 and Crm1 was found. Furthermore, a putative nuclear export sequence (pNES) motif at the N-terminal of Mps1 was identified by analyzing the motif of Mps1. This motif shows a high sequence similarity to the classic NES; a fusion of this motif with EGFP results in dramatic exclusion of the fusion protein from the nucleus. Additionally, loss of pNES integrity in an Mps1 mutant, produced by replacing leucine with alanine, resulted in a diffuse subcellular distribution, compared to the wild-type protein, which resides predominantly in the cytoplasm. Taking these findings together, it was concluded that the pNES sequence is sufficient for the Mps1 export from nucleus during interphase.
Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

PubMed Central

Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

2015-01-01

Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

PubMed

Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

2018-02-01

Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
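The motif GGTCTRGACC uses the IUPAC ambiguity code R (A or G). A minimal Python sketch of how such a degenerate motif could be located on both strands of a sequence follows; the function names and the test sequence are hypothetical, and this is not the analysis used in the study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(seq, motif):
    """Return sorted (forward-strand start, strand) hits, overlaps included."""
    pattern = re.compile("(?=(%s))" % "".join(IUPAC[c] for c in motif))
    n, w = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # A hit at position p on the reverse complement covers forward positions
    # n - p - w .. n - p - 1.
    hits += [(n - m.start() - w, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)

# Hypothetical test sequence containing GGTCTAGACC and GGTCCAGACC.
toy_hits = find_motif("AAGGTCTAGACCTTGGTCCAGACCAA", "GGTCTRGACC")
```

With R = A the motif is its own reverse complement, so it is reported on both strands; with R = G the reverse-complement form GGTCCAGACC is found only on the minus strand.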
Maternal lineages of peach genotypes

USDA-ARS's Scientific Manuscript database

Simple sequence repeats (SSRs) in chloroplast genomes are useful markers to determine maternal lineages. The SSR mining results revealed that most chloroplast SSRs among three Prunus chloroplast genomes were conserved in locations and motif types, but polymorphic in motif and/or amplicon lengths. Fi...
Changes in Cell Wall Properties Coincide with Overexpression of Extensin Fusion Proteins in Suspension Cultured Tobacco Cells

DOE PAGES

Tan, Li; Pu, Yunqiao; Pattathil, Sivakumar; ...

2014-12-23

Extensins are one subfamily of the cell wall hydroxyproline-rich glycoproteins, containing characteristic SerHyp4 glycosylation motifs and intermolecular cross-linking motifs such as the TyrXaaTyr sequence. Extensins are believed to form a cross-linked network in the plant cell wall through the tyrosine-derivatives isodityrosine, pulcherosine, and di-isodityrosine. Overexpression of three synthetic genes encoding different elastin-arabinogalactan protein-extensin hybrids in tobacco suspension cultured cells yielded novel cross-linking glycoproteins that shared features of the extensins, arabinogalactan proteins and elastin. The cell wall properties of the three transgenic cell lines were all changed, but in different ways. One transgenic cell line showed decreased cellulose crystallinity and increased wall xyloglucan content; the second transgenic cell line contained dramatically increased hydration capacity and notably increased cell wall biomass, increased di-isodityrosine, and increased protein content; the third transgenic cell line displayed wall phenotypes similar to wild type cells, except changed xyloglucan epitope extractability. In conclusion, these data indicate that overexpression of modified extensins may be a route to engineer plants for bioenergy and biomaterial production.
Substrate Specificity and Possible Heterologous Targets of Phytaspase, a Plant Cell Death Protease*

PubMed Central

Galiullina, Raisa A.; Kasperkiewicz, Paulina; Chichkova, Nina V.; Szalek, Aleksandra; Serebryakova, Marina V.; Poreba, Marcin; Drag, Marcin; Vartapetian, Andrey B.

2015-01-01

Plants lack aspartate-specific cell death proteases homologous to animal caspases. Instead, a subtilisin-like serine-dependent plant protease named phytaspase shown to be involved in the accomplishment of programmed death of plant cells is able to hydrolyze a number of peptide-based caspase substrates. Here, we determined the substrate specificity of rice (Oryza sativa) phytaspase by using the positional scanning substrate combinatorial library approach. Phytaspase was shown to display an absolute specificity of hydrolysis after an aspartic acid residue. The preceding amino acid residues, however, significantly influence the efficiency of hydrolysis. Efficient phytaspase substrates demonstrated a remarkable preference for an aromatic amino acid residue in the P3 position. The deduced optimum phytaspase recognition motif has the sequence IWLD and is strikingly hydrophobic. The established pattern was confirmed through synthesis and kinetic analysis of cleavage of a set of optimized peptide substrates. An amino acid motif similar to the phytaspase cleavage site is shared by the human gastrointestinal peptide hormones gastrin and cholecystokinin. In agreement with the established enzyme specificity, phytaspase was shown to hydrolyze gastrin-1 and cholecystokinin at the predicted sites in vitro, thus destroying the active moieties of the hormones. PMID:26283788
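The specificity reported here (cleavage strictly after Asp, with a preference for an aromatic residue at P3, as in IWLD) can be turned into a simple candidate-site scan. The sketch below is a rough illustration, not the authors' method: treating F, W and Y as the aromatic set, the function name, and the test peptide are all assumptions, and a real predictor would weight every subsite rather than apply a yes/no filter.

```python
AROMATIC = set("FWY")  # assumed aromatic set for illustration

def candidate_cleavage_sites(protein):
    """Return 1-based positions of Asp (P1) residues with an aromatic P3 residue."""
    sites = []
    for i, residue in enumerate(protein):
        # P1 is at index i, P2 at i - 1, P3 at i - 2 (e.g. I-W-L-D = P4-P3-P2-P1).
        if residue == "D" and i >= 2 and protein[i - 2] in AROMATIC:
            sites.append(i + 1)
    return sites

# Hypothetical peptide containing the optimal IWLD motif.
toy_sites_found = candidate_cleavage_sites("MKIWLDGSADEYTDQ")
```

Cleavage would be predicted to occur immediately after each reported position.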
Complexity of the 5' Untranslated Region of EIF4A3, a Critical Factor for Craniofacial and Neural Development.

PubMed

Hsia, Gabriella S P; Musso, Camila M; Alvizi, Lucas; Brito, Luciano A; Kobayashi, Gerson S; Pavanello, Rita C M; Zatz, Mayana; Gardham, Alice; Wakeling, Emma; Zechi-Ceide, Roseli M; Bertola, Debora; Passos-Bueno, Maria Rita

2018-01-01

Repeats in coding and non-coding regions have increasingly been associated with many human genetic disorders, such as Richieri-Costa-Pereira syndrome (RCPS). RCPS, mostly characterized by midline cleft mandible, Robin sequence and limb defects, is an autosomal-recessive acrofacial dysostosis mainly reported in Brazilian patients. This disorder is caused by decreased levels of EIF4A3, mostly due to an increased number of repeats at the EIF4A3 5'UTR. EIF4A3 5'UTR alleles are CG-rich and vary in size and organization of three types of motifs. An exclusive allelic pattern was identified among affected individuals, in which the CGCA-motif is the most prevalent, herein referred to as the "disease-associated CGCA-20nt motif." The origin of the pathogenic alleles containing the disease-associated motif, as well as the functional effects of the 5'UTR motifs on EIF4A3 expression, to date, are entirely unknown. Here, we characterized 43 different EIF4A3 5'UTR alleles in a cohort of 380 unaffected individuals. We identified eight heterozygous unaffected individuals harboring the disease-associated CGCA-20nt motif and our haplotype analyses indicate that there is more than one haplotype associated with RCPS. The combined analysis of number, motif organization and haplotypic diversity, as well as the observation of two apparently distinct haplotypes associated with the disease-associated CGCA-20nt motif, suggest that the RCPS alleles might have arisen from independent unequal crossing-over events between ancient alleles at least twice. Moreover, we have shown that the number and sequence of motifs in the 5'UTR region is associated with EIF4A3 repression, which is not mediated by CpG methylation. In conclusion, this study has shown that the large number of repeats in EIF4A3 does not represent a dynamic mutation and RCPS can arise in any population harboring alleles with the CGCA-20nt motif. We also provided further evidence that EIF4A3 5'UTR is a regulatory region and the size and sequence type of the repeats at 5'UTR may contribute to clinical variability in RCPS.
Exact calculation of distributions on integers, with application to sequence alignment.

PubMed

Newberg, Lee A; Lawrence, Charles E

2009-01-01

Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
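To make the idea of an exact distribution over integer scores concrete, here is a minimal Python sketch of one standard approach: building the distribution of an integer motif score column by column by convolution, under an independent background model. It is not claimed to be any of the three algorithms in the paper; the integer matrix, the uniform background, and the function names are assumptions.

```python
from collections import defaultdict

def score_distribution(int_matrix, background):
    """Exact distribution of the summed integer score under an i.i.d. background."""
    dist = {0: 1.0}  # before any column is scored, the score is 0 with probability 1
    for column in int_matrix:
        updated = defaultdict(float)
        for score, prob in dist.items():
            for base, points in column.items():
                updated[score + points] += prob * background[base]
        dist = dict(updated)
    return dist

def p_value(dist, threshold):
    """Probability of a score at or above the threshold."""
    return sum(prob for score, prob in dist.items() if score >= threshold)

# Hypothetical two-column integer scoring matrix and uniform background.
toy_matrix = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": 0, "C": 3, "G": -2, "T": -1},
]
uniform = {base: 0.25 for base in "ACGT"}
toy_dist = score_distribution(toy_matrix, uniform)
```

Because scores are integers, the number of distinct score values stays small, which is what keeps this kind of recursion tractable compared with enumerating all sequences.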
Conformational Preference of ‘CαNN’ Short Peptide Motif towards Recognition of Anions

PubMed Central

Banerjee, Raja

2013-01-01

Among several ‘anion binding motifs’, the recently described ‘CαNN’ motif occurring in the loop regions preceding a helix, is conserved through evolution both in sequence and its conformation. To establish the significance of the conserved sequence and their intrinsic affinity for anions, a series of peptides containing the naturally occurring ‘CαNN’ motif at the N-terminus of a designed helix, have been modeled and studied in a context free system using computational techniques. Appearance of a single interacting site with negative binding free-energy for both the sulfate and phosphate ions, as evidenced in docking experiments, establishes that the ‘CαNN’ segment has an intrinsic affinity for anions. Molecular Dynamics (MD) simulation studies reveal that interaction with anion triggers a conformational switch from non-helical to helical state at the ‘CαNN’ segment, which extends the length of the anchoring-helix by one turn at the N-terminus. Computational experiments substantiate the significance of sequence/structural context and justify the conserved nature of the ‘CαNN’ sequence for anion recognition through “local” interaction. PMID:23516403
Comparative analysis of the XopD T3S effector family in plant pathogenic bacteria

PubMed Central

Kim, Jung-Gun; Taylor, Kyle W.; Mudgett, Mary Beth

2011-01-01

SUMMARY XopD is a type III effector protein that is required for Xanthomonas campestris pathovar vesicatoria (Xcv) growth in tomato. It is a modular protein consisting of an N-terminal DNA-binding domain, two EAR transcriptional repressor motifs, and a C-terminal SUMO protease. In tomato, XopD functions as a transcriptional repressor, resulting in the suppression of defense responses at late stages of infection. A survey of available genome sequences for phytopathogenic bacteria revealed that XopD homologs are limited to species within three Genera of Proteobacteria – Xanthomonas, Acidovorax, and Pseudomonas. While the EAR motif(s) and SUMO protease domain are conserved in all the XopD-like proteins, variation exists in the length and sequence identity of the N-terminal domains. Comparative analysis of the DNA sequences surrounding xopD and xopD-like genes led to revised annotation of the xopD gene. Edman degradation sequence analysis and functional complementation studies confirmed that the xopD gene from Xcv encodes a 760 amino acid protein with a longer N-terminal domain than previously predicted. None of the XopD-like proteins studied complemented Xcv ΔxopD mutant phenotypes in tomato leaves suggesting that the N-terminus of XopD defines functional specificity. Xcv ΔxopD strains expressing chimeric fusion proteins containing the N-terminus of XopD fused to the EAR motif(s) and SUMO protease domain of the XopD-like protein from Xanthomonas campestris pathovar campestris strain B100 were fully virulent in tomato demonstrating that the N-terminus of XopD controls specificity in tomato. PMID:21726373
Distribution and diversity of ribosome binding sites in prokaryotic genomes.

PubMed

Omotajo, Damilola; Tate, Travis; Cho, Hyuk; Choudhary, Madhusudan

2015-08-14

Prokaryotic translation initiation involves the proper docking, anchoring, and accommodation of mRNA to the 30S ribosomal subunit. Three initiation factors (IF1, IF2, and IF3) and some ribosomal proteins mediate the assembly and activation of the translation initiation complex. Although the interaction between the Shine-Dalgarno (SD) sequence and its complementary sequence in the 16S rRNA is important in initiation, some genes lacking an SD ribosome binding site (RBS) are still well expressed. The objective of this study is to examine the pattern of distribution and diversity of RBS in fully sequenced bacterial genomes. The following three hypotheses were tested: SD motifs are prevalent in bacterial genomes; all previously identified SD motifs are uniformly distributed across prokaryotes; and genes with specific cluster of orthologous gene (COG) functions differ in their use of SD motifs. Data for 2,458 bacterial genomes, previously generated by Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) and currently available at the National Center for Biotechnology Information (NCBI), were analyzed. Of the total genes examined, ~77.0% use an SD RBS, while ~23.0% have no RBS. The majority of the genes with the most common SD motifs are distributed in a manner that is representative of their abundance for each COG functional category, while motifs 13 (5'-GGA-3'/5'-GAG-3'/5'-AGG-3') and 27 (5'-AGGAGG-3') appear to be predominantly used by genes for information storage and processing, and translation and ribosome biogenesis, respectively. These findings suggest that an SD sequence is not obligatory for translation initiation; instead, other signals, such as the RBS spacer, may have an overarching influence on translation of mRNAs. Subsequent analyses of the 5' secondary structure of these mRNAs may provide further insight into the translation initiation mechanism.
Transmissible Gastroenteritis Coronavirus Genome Packaging Signal Is Located at the 5′ End of the Genome and Promotes Viral RNA Incorporation into Virions in a Replication-Independent Process

PubMed Central

Morales, Lucia; Mateos-Gomez, Pedro A.; Capiscol, Carmen; del Palacio, Lorena; Sola, Isabel

2013-01-01

Preferential RNA packaging in coronaviruses involves the recognition of viral genomic RNA, a crucial process for viral particle morphogenesis mediated by RNA-specific sequences, known as packaging signals. An essential packaging signal component of transmissible gastroenteritis coronavirus (TGEV) has been further delimited to the first 598 nucleotides (nt) from the 5′ end of its RNA genome, by using recombinant viruses transcribing subgenomic mRNA that included potential packaging signals. The integrity of the entire sequence domain was necessary because deletion of any of the five structural motifs defined within this region abrogated specific packaging of this viral RNA. One of these RNA motifs was the stem-loop SL5, a highly conserved motif in coronaviruses located at nucleotide positions 106 to 136. Partial deletion or point mutations within this motif also abrogated packaging. Using TGEV-derived defective minigenomes replicated in trans by a helper virus, we have shown that TGEV RNA packaging is a replication-independent process. Furthermore, the last 494 nt of the genomic 3′ end were not essential for packaging, although this region increased packaging efficiency. TGEV RNA sequences identified as necessary for viral genome packaging were not sufficient to direct packaging of a heterologous sequence derived from the green fluorescent protein gene. These results indicated that TGEV genome packaging is a complex process involving many factors in addition to the identified RNA packaging signal. The identification of well-defined RNA motifs within the TGEV RNA genome that are essential for packaging will be useful for designing packaging-deficient biosafe coronavirus-derived vectors and providing new targets for antiviral therapies. PMID:23966403
A dinucleotide motif in oligonucleotides shows potent immunomodulatory activity and overrides species-specific recognition observed with CpG motif.

PubMed

Kandimalla, Ekambar R; Bhagat, Lakshmi; Zhu, Fu-Gang; Yu, Dong; Cong, Yan-Ping; Wang, Daqing; Tang, Jimmy X; Tang, Jin-Yan; Knetter, Cathrine F; Lien, Egil; Agrawal, Sudhir

2003-11-25

Bacterial and synthetic DNAs containing CpG dinucleotides in specific sequence contexts activate the vertebrate immune system through Toll-like receptor 9 (TLR9). In the present study, we used a synthetic nucleoside with a bicyclic heterobase [1-(2'-deoxy-beta-d-ribofuranosyl)-2-oxo-7-deaza-8-methyl-purine; R] to replace the C in CpG, resulting in an RpG dinucleotide. The RpG dinucleotide was incorporated in mouse- and human-specific motifs in oligodeoxynucleotides (oligos) and 3'-3-linked oligos, referred to as immunomers. Oligos containing the RpG motif induced cytokine secretion in mouse spleen-cell cultures. Immunomers containing RpG dinucleotides showed activity in transfected-HEK293 cells stably expressing mouse TLR9, suggesting direct involvement of TLR9 in the recognition of RpG motif. In J774 macrophages, RpG motifs activated NF-kappa B and mitogen-activated protein kinase pathways. Immunomers containing the RpG dinucleotide induced high levels of IL-12 and IFN-gamma, but lower IL-6 in time- and concentration-dependent fashion in mouse spleen-cell cultures costimulated with IL-2. Importantly, immunomers containing GTRGTT and GARGTT motifs were recognized to a similar extent by both mouse and human immune systems. Additionally, both mouse- and human-specific RpG immunomers potently stimulated proliferation of peripheral blood mononuclear cells obtained from diverse vertebrate species, including monkey, pig, horse, sheep, goat, rat, and chicken. An immunomer containing GTRGTT motif prevented conalbumin-induced and ragweed allergen-induced allergic inflammation in mice. We show that a synthetic bicyclic nucleotide is recognized in the C position of a CpG dinucleotide by immune cells from diverse vertebrate species without bias for flanking sequences, suggesting a divergent nucleotide motif recognition pattern of TLR9.
The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

PubMed Central

Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

2009-01-01

PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964
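The L/IIxxFE motif described here maps directly onto a short regular expression, where x is any residue. The sketch below is illustrative only: the test peptides are made up and are not the PDZ-RhoGEF or frabin sequences, and a pattern match alone does not imply actin binding.

```python
import re

# [LI] = Leu or Ile, then Ile, two arbitrary residues, then Phe and Glu.
ACTIN_MOTIF = re.compile(r"[LI]I..FE")

def find_actin_motif(protein):
    """Return (1-based start, matched residues) for each L/IIxxFE occurrence."""
    return [(m.start() + 1, m.group()) for m in ACTIN_MOTIF.finditer(protein)]

# Hypothetical peptides for illustration.
toy_ii = find_actin_motif("AAIIKQFEAA")
toy_li = find_actin_motif("MLIDSFEKK")
```

The spacing matches the residue numbering in the abstract: positions i, i+1, i+4 and i+5 (for example 568, 569, 572 and 573 in PDZ-RhoGEF).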
Entropic Profiler – detection of conservation in genomes using information theory

PubMed Central

Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

2009-01-01

Background: In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighed relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings: The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes EP very efficiently by resorting to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or resuming previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion: EP are directly related to the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE PAGES

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...

2016-04-12

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

PubMed

Shan, Gao; Zheng, Wei-Mou

2009-02-01

By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
Identification and characterization of a NBS–LRR class resistance gene analog in Pistacia atlantica subsp. Kurdica

PubMed Central

Bahramnejad, Bahman

2014-01-01

P. atlantica subsp. Kurdica, with the local name of Baneh, is a wild medicinal plant which grows in Kurdistan, Iran. The identification of resistance gene analogs holds great promise for the development of resistant cultivars. A PCR approach with degenerate primers designed according to conserved NBS-LRR (nucleotide binding site-leucine rich repeat) regions of known disease-resistance (R) genes was used to amplify and clone homologous sequences from P. atlantica subsp. Kurdica. A DNA fragment of the expected 500-bp size was amplified. The nucleotide sequence of this amplicon was obtained through sequencing and the predicted amino acid sequence compared to the amino acid sequences of known R-genes revealed significant sequence similarity. Alignment of the deduced amino acid sequence of P. atlantica subsp. Kurdica resistance gene analog (RGA) showed strong identity, ranging from 68% to 77%, to the non-toll interleukin receptor (non-TIR) R-gene subfamily from other plants. A P-loop motif (GMMGGEGKTT), a conserved and hydrophobic motif GLPLAL, a kinase-2a motif (LLVLDDV), when replaced by IAVFDDI in PAKRGA1 and a kinase-3a (FGPGSRIII) were presented in all RGA. A phylogenetic tree, based on the deduced amino-acid sequences of PAKRGA1 and RGAs from different species indicated that they were separated in two clusters, PAKRGA1 being on cluster II. The isolated NBS analogs can be eventually used as guidelines to isolate numerous R-genes in Pistachio. PMID:27843981
Probing the Potential Role of Non-B DNA Structures at Yeast Meiosis-Specific DNA Double-Strand Breaks.

PubMed

Kshirsagar, Rucha; Khan, Krishnendu; Joshi, Mamata V; Hosur, Ramakrishna V; Muniyappa, K

2017-05-23

A plethora of evidence suggests that different types of DNA quadruplexes are widely present in the genome of all organisms. The existence of a growing number of proteins that selectively bind and/or process these structures underscores their biological relevance. Moreover, G-quadruplex DNA has been implicated in the alignment of four sister chromatids by forming parallel guanine quadruplexes during meiosis; however, the underlying mechanism is not well defined. Here we show that a G/C-rich motif associated with a meiosis-specific DNA double-strand break (DSB) in Saccharomyces cerevisiae folds into G-quadruplex, and the C-rich sequence complementary to the G-rich sequence forms an i-motif. The presence of G-quadruplex or i-motif structures upstream of the green fluorescent protein-coding sequence markedly reduces the levels of gfp mRNA expression in S. cerevisiae cells, with a concomitant decrease in green fluorescent protein abundance, and blocks primer extension by DNA polymerase, thereby demonstrating the functional significance of these structures. Surprisingly, although S. cerevisiae Hop1, a component of synaptonemal complex axial/lateral elements, exhibits strong affinity to G-quadruplex DNA, it displays a much weaker affinity for the i-motif structure. However, the Hop1 C-terminal but not the N-terminal domain possesses strong i-motif binding activity, implying that the C-terminal domain has a distinct substrate specificity. Additionally, we found that Hop1 promotes intermolecular pairing between G/C-rich DNA segments associated with a meiosis-specific DSB site. Our results support the idea that the G/C-rich motifs associated with meiosis-specific DSBs fold into intramolecular G-quadruplex and i-motif structures, both in vitro and in vivo, thus revealing an important link between non-B form DNA structures and Hop1 in meiotic chromosome synapsis and recombination. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

USDA-ARS's Scientific Manuscript database

A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
The Thiamin Pyrophosphate-Motif

NASA Technical Reports Server (NTRS)

Dominiak, Paulina M.; Ciszak, Ewa M.

2003-01-01

Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP- domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

PubMed

Bricheux, G; Brugerolle, G

1997-08-01

The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.
De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

PubMed

Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

2011-02-10

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.
info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling.

PubMed

Defrance, Matthieu; van Helden, Jacques

2009-10-15

Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself. We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods like MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR. http://rsat.ulb.ac.be/rsat/info-gibbs
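The information content (IC) that info-gibbs optimizes can be illustrated on a fixed set of aligned sites. The sketch below computes IC as the relative entropy of each motif column against a background distribution, summed over columns. It is not the info-gibbs sampler itself; the uniform background, the optional pseudocount, and the function name are assumptions made for illustration.

```python
import math
from collections import Counter

def information_content(sites, background=None, pseudocount=0.0):
    """IC in bits: sum over columns of sum_b f(b) * log2(f(b) / p(b)).

    sites: equal-length DNA strings, already aligned.
    background: base -> probability (assumed uniform if omitted).
    """
    background = background or {b: 0.25 for b in "ACGT"}
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column) + pseudocount * len(background)
        for base, p in background.items():
            f = (counts.get(base, 0) + pseudocount) / n
            if f > 0:  # treat 0 * log(0) as 0
                total += f * math.log2(f / p)
    return total

# A perfectly conserved 4-bp motif carries 2 bits per column.
ic_conserved = information_content(["ACGT", "ACGT", "ACGT", "ACGT"])
```

A Gibbs sampler would recompute a score like this while resampling site positions; here the sites are simply given.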
Spectroscopic studies on peptides and proteins with cysteine-containing heme regulatory motifs (HRM).

PubMed

Schubert, Erik; Florin, Nicole; Duthie, Fraser; Henning Brewitz, H; Kühl, Toni; Imhof, Diana; Hagelueken, Gregor; Schiemann, Olav

2015-07-01

The role of heme as a cofactor in enzymatic reactions has been studied for a long time and in great detail. Recently it was discovered that heme can also serve as a signalling molecule in cells but so far only few examples of this regulation have been studied. In order to discover new potentially heme-regulated proteins, we screened protein sequence databases for bacterial proteins that contain sequence features like a Cysteine-Proline (CP) motif, which is known for its heme-binding propensity. Based on this search we synthesized a series of these potential heme regulatory motifs (HRMs). We used cw EPR spectroscopy to investigate whether these sequences do indeed bind to heme and if the spin state of heme is changed upon interaction with the peptides. The corresponding proteins of two potential HRMs, FeoB and GlpF, were expressed and purified and their interaction with heme was studied by cw EPR and UV-Visible (UV-Vis) spectroscopy. Copyright © 2015 Elsevier Inc. All rights reserved.
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

PubMed

Ayub, Gohar; Waheed, Yasir

2016-06-01

The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant, they were pathogenic with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.
Identification of a gene for an ancient cytokine, interleukin 15-like, in mammals; interleukins 2 and 15 co-evolved with this third family member, all sharing binding motifs for IL-15Rα.

PubMed

Dijkstra, Johannes M; Takizawa, Fumio; Fischer, Uwe; Friedrich, Maik; Soto-Lampe, Veronica; Lefèvre, Christophe; Lenk, Matthias; Karger, Axel; Matsui, Taei; Hashimoto, Keiichiro

2014-02-01

Interleukins 2 and 15 (IL-2 and IL-15) are highly differentiated but related cytokines with overlapping, yet also distinct functions, and established benefits for medical drug use. The present study identified a gene for an ancient third IL-2/15 family member in reptiles and mammals, interleukin 15-like (IL-15L), which hitherto was only reported in fish. IL-15L genes with intact open reading frames (ORFs) and evidence of transcription, and a recent past of purifying selection, were found for cattle, horse, sheep, pig and rabbit. In human and mouse the IL-15L ORF is incapacitated. Although deduced IL-15L proteins share only ~21 % overall amino acid identity with IL-15, they share many of the IL-15 residues important for binding to receptor chain IL-15Rα, and recombinant bovine IL-15L was shown to interact with IL-15Rα indeed. Comparison of sequence motifs indicates that capacity for binding IL-15Rα is an ancestral characteristic of the IL-2/15/15L family, in accordance with a recent study which showed that in fish both IL-2 and IL-15 can bind IL-15Rα. Evidence reveals that the species lineage leading to mammals started out with three similar cytokines IL-2, IL-15 and IL-15L, and that later in evolution (1) IL-2 and IL-2Rα receptor chain acquired a new and specific binding mode and (2) IL-15L was lost in several but not all groups of mammals. The present study forms an important step forward in understanding this potent family of cytokines, and may help to improve future strategies for their application in veterinarian and human medicine.
Efficient exact motif discovery.

PubMed

Marschall, Tobias; Rahmann, Sven

2009-06-15

The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee to find the best motif. We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first one based on the total number of motif occurrences, the second one based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is by orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing our method to outperform other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs on Mycobacterium tuberculosis. The method has been implemented in Java. It can be obtained from http://ls11-www.cs.tu-dortmund.de/people/marschal/paa_md/.
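The two occurrence statistics on which the p-value scores are based (total number of motif occurrences, and number of sequences with at least one occurrence) can be sketched for an IUPAC pattern as follows. This only counts matches; it does not implement the compound Poisson approximation or the probabilistic arithmetic automaton from the abstract. The function names and example sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Compile an IUPAC string into a regex; lookahead keeps overlapping hits."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    return re.compile("(?=" + body + ")")

def occurrence_counts(pattern, sequences):
    """Return (total occurrences, number of sequences with >= 1 occurrence)."""
    rx = iupac_to_regex(pattern)
    per_seq = [len(rx.findall(s.upper())) for s in sequences]
    return sum(per_seq), sum(1 for n in per_seq if n > 0)

# Invented example sequences, for illustration only.
total, n_seqs = occurrence_counts("TATAWA", ["GGTATAAATATATA", "CCCC", "TATATA"])
```

Overlapping occurrences are counted separately, which is why clump statistics matter when turning such counts into p-values.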
Full-length genome sequence of a simian immunodeficiency virus from a wild-captured sun-tailed monkey in Gabon provides evidence for a species-specific monophyletic SIVsun lineage.

PubMed

Liégeois, Florian; Butel, Christelle; Mouinga-Ondéme, Augustin; Verrier, Delphine; Motsch, Peggy; Gonzalez, Jean-Paul; Peeters, Martine; Rouet, François; Onanga, Richard

2011-11-01

Since the first characterization of SIVsun (L14 strain) from a sun-tailed monkey (Cercopithecus solatus) in Gabon in 1999, no further information exists about the evolutionary history and geographic distribution of this lentivirus. Here, we report the full-length molecular characterization of a second SIVsun virus (SIVsunK08) naturally infecting a wild-caught sun-tailed monkey. The SIVsunK08 strain was most closely related to SIVsunL14 and clustered with members of the SIVmnd-1/SIVlhoest group. SIVsunK08 shared identical functional motifs in the LTR, Gag and Env proteins with SIVsunL14. Our data indicate that C. solatus is naturally infected with a monophyletic SIVsun strain.
SLIDER: a generic metaheuristic for the discovery of correlated motifs in protein-protein interaction networks.

PubMed

Boyen, Peter; Van Dyck, Dries; Neven, Frank; van Ham, Roeland C H J; van Dijk, Aalt D J

2011-01-01

Correlated motif mining (cmm) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for cmm thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that cmm is an np-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic slider which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that slider outperforms existing motif-driven cmm methods and scales to large protein-protein interaction networks. The slider-implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.
Amyloid fibril formation from sequences of a natural beta-structured fibrous protein, the adenovirus fiber.

PubMed

Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna

2005-01-28

Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.
Structural polymorphism of a cytosine-rich DNA sequence forming i-motif structure: Exploring pH based biosensors.

PubMed

Ahmed, Saami; Kaushik, Mahima; Chaudhary, Swati; Kukreti, Shrikant

2018-05-01

Sequence recognition and conformational polymorphism enable DNA to emerge out as a substantial tool in fabricating the devices within nano-dimensions. These DNA associated nano devices work on the principle of conformational switches, which can be facilitated by many factors like sequence of DNA/RNA strand, change in pH or temperature, enzyme or ligand interactions etc. Thus, controlling these DNA conformational changes to acquire the desired function is significant for evolving DNA hybridization biosensor, used in genetic screening and molecular diagnosis. For exploring this conformational switching ability of cytosine-rich DNA oligonucleotides as a function of pH for their potential usage as biosensors, this study has been designed. A C-rich stretch of DNA sequence (5'-TCCCCCAATTAATTCCCCCA-3'; SG20c) has been investigated using UV-Thermal denaturation, poly-acrylamide gel electrophoresis and CD spectroscopy. The SG20c sequence is shown to adopt various topologies of i-motif structure at low pH. This pH dependent transition of SG20c from unstructured single strand to unimolecular and bimolecular i-motif structures can further be exploited for its utilization as switching on/off pH-based biosensors. Copyright © 2018. Published by Elsevier B.V.
Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

PubMed

Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

2018-02-23

Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid CAAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical CAAX sequence, specifically C(x)3X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C(x)3X sequences. As the search to identify physiologically relevant C(x)3X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
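The difference between the canonical CAAX sequence (cysteine followed by three residues) and the extended C(x)3X sequence (cysteine followed by four residues) is purely positional, so a proteome can be screened for candidates with two anchored patterns. This is a minimal sketch under stated assumptions: any residue is accepted at the downstream positions, the function name and toy sequences are hypothetical, and a match only flags a candidate, not a confirmed FTase substrate.

```python
import re

# Cysteine followed by exactly 3 residues at the C terminus (canonical CAAX).
CAAX = re.compile(r"C[A-Z]{3}$")
# Cysteine followed by exactly 4 residues at the C terminus (extended C(x)3X).
CX3X = re.compile(r"C[A-Z]{4}$")

def classify_c_terminus(protein):
    """Return 'CAAX', 'C(x)3X', or None for a protein sequence string.

    If both patterns match, the canonical CAAX label is returned first;
    that precedence is an arbitrary choice made for this sketch.
    """
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if any
    if CAAX.search(seq):
        return "CAAX"
    if CX3X.search(seq):
        return "C(x)3X"
    return None

# Toy sequences invented for illustration only.
labels = [classify_c_terminus(s) for s in ("MAAACVLS", "MGGCQTGS", "MGGQTGS")]
```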
Molecular cloning and characterization of a cDNA encoding the gibberellin biosynthetic enzyme ent-kaurene synthase B from pumpkin (Cucurbita maxima L.).

PubMed

Yamaguchi, S; Saito, T; Abe, H; Yamane, H; Murofushi, N; Kamiya, Y

1996-08-01

The first committed step in the formation of diterpenoids leading to gibberellin (GA) biosynthesis is the conversion of geranylgeranyl diphosphate (GGDP) to ent-kaurene. ent-Kaurene synthase A (KSA) catalyzes the conversion of GGDP to copalyl diphosphate (CDP), which is subsequently converted to ent-kaurene by ent-kaurene synthase B (KSB). A full-length KSB cDNA was isolated from developing cotyledons in immature seeds of pumpkin (Cucurbita maxima L.). Degenerate oligonucleotide primers were designed from the amino acid sequences obtained from the purified protein to amplify a cDNA fragment, which was used for library screening. The isolated full-length cDNA was expressed in Escherichia coli as a fusion protein, which demonstrated the KSB activity to cyclize [3H]CDP to [3H]ent-kaurene. The KSB transcript was most abundant in growing tissues, but was detected in every organ in pumpkin seedlings. The deduced amino acid sequence shares significant homology with other terpene cyclases, including the conserved DDXXD motif, a putative divalent metal ion-diphosphate complex binding site. A putative transit peptide sequence that may target the translated product into the plastids is present in the N-terminal region.
Bacterial collagen-like proteins that form triple-helical structures

PubMed Central

Yu, Zhuoxin; An, Bo; Ramshaw, John A.M.; Brodsky, Barbara

2014-01-01

A large number of collagen-like proteins have been identified in bacteria during the past ten years, principally from analysis of genome databases. These bacterial collagens share the distinctive Gly-Xaa-Yaa repeating amino acid sequence of animal collagens which underlies their unique triple-helical structure. A number of the bacterial collagens have been expressed in E. coli, and they all adopt a triple-helix conformation. Unlike animal collagens, these bacterial proteins do not contain the post-translationally modified amino acid, hydroxyproline, which is known to stabilize the triple-helix structure and may promote self-assembly. Despite the absence of collagen hydroxylation, the triple-helix structures of the bacterial collagens studied exhibit a high thermal stability of 35–39 °C, close to that seen for mammalian collagens. These bacterial collagens are readily produced in large quantities by recombinant methods, either in the original amino acid sequence or in genetically manipulated sequences. This new family of recombinant, easy to modify collagens could provide a novel system for investigating structural and functional motifs in animal collagens and could also form the basis of new biomedical materials with designed structural properties and functions. PMID:24434612
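The Gly-Xaa-Yaa repeat mentioned in this abstract is regular enough to be located with a simple pattern. The sketch below is only an illustration of that idea: the function name `collagen_like_regions` and the default threshold of five consecutive triplets are assumptions, not values taken from the paper.

```python
import re


def collagen_like_regions(seq: str, min_triplets: int = 5):
    """Find runs of consecutive Gly-Xaa-Yaa triplets in a one-letter sequence.

    Returns a list of (start, end, n_triplets) tuples with 0-based,
    end-exclusive coordinates. min_triplets is an illustrative threshold.
    """
    # G followed by any two residues, repeated at least min_triplets times.
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_triplets)
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 3)
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    demo = "MK" + "GPP" * 6 + "AAA"
    print(collagen_like_regions(demo))
```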
Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function

PubMed Central

Noble, Luke M.; Andrianopoulos, Alex

2013-01-01

Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226
In silico analysis of β-1,3-glucanase from a psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Mohammadi, Salimeh; Bakar, Farah Diba Abu; Rabu, Amir; Murad, Abdul Munir Abdul

2014-09-01

1,3-beta-Glucanase is an industrially important enzyme with a wide range of applications, especially in the food industry. It is crucial to understand the structural and functional aspects of the various beta-1,3-glucanases produced from diverse sources. In this study, a cDNA encoding β-1,3-glucanase (GaExg55) was isolated from a psychrophilic yeast, Glaciozyma antarctica PI12. The cDNA sequence has been submitted to GenBank under accession number KJ436377. Subsequently, the predicted protein was analyzed using various bioinformatics tools to explore its properties. GaExg55 consists of 1,440 bp of nucleotides encoding 480 amino acid residues. Alignment of the deduced amino acid sequence of GaExg55 with other exo-β-1,3-glucanases available in the NCBI database indicates that they share a consensus motif, NEP, which is a signature pattern of GH5 hydrolases. The predicted molecular weight of GaExg55 is 53.66 kDa. The GaExg55 sequence possesses a signal peptide sequence and is highly conserved with other fungal exo-beta-1,3-glucanases.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa

2006-03-01

The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (Rfree = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.
Sequence analyses reveal that a TPR-DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR-DP domains and prokaryotic GerD proteins.

PubMed

Hernández Torres, Jorge; Papandreou, Nikolaos; Chomilier, Jacques

2009-05-01

The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR-DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperons into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR-DP domains.
Four signature motifs define the first class of structurally related large coiled-coil proteins in plants.

PubMed Central

Gindullis, Frank; Rose, Annkatrin; Patel, Shalaka; Meier, Iris

2002-01-01

Background Animal and yeast proteins containing long coiled-coil domains are involved in attaching other proteins to the large, solid-state components of the cell. One subgroup of long coiled-coil proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases. In contrast to other eukaryotes, long coiled-coil proteins have been barely investigated in plants. Results We have searched the completed Arabidopsis genome and have identified a family of structurally related long coiled-coil proteins. Filament-like plant proteins (FPP) were identified by sequence similarity to a tomato cDNA that encodes a coiled-coil protein which interacts with the nuclear envelope-associated protein, MAF1. The FPP family is defined by four novel unique sequence motifs and by two clusters of long coiled-coil domains separated by a non-coiled-coil linker. All family members are expressed in a variety of Arabidopsis tissues. A homolog sharing the structural features was identified in the monocot rice, indicating conservation among angiosperms. Conclusion Except for myosins, this is the first characterization of a family of long coiled-coil proteins in plants. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein. This might suggest that FPP family members function in nuclear envelope biology. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of interest to investigate other long coiled-coil proteins, which might functionally replace lamins in the plant kingdom. PMID:11972898
Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

PubMed

Velagapudi, Sai Pradeep; Disney, Matthew D

2013-10-15

RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the most well studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site. Copyright © 2013 Elsevier Ltd. All rights reserved.
Inforna 2.0: A Platform for the Sequence-Based Design of Small Molecules Targeting Structured RNAs.

PubMed

Disney, Matthew D; Winkelsas, Audrey M; Velagapudi, Sai Pradeep; Southern, Mark; Fallahi, Mohammad; Childs-Disney, Jessica L

2016-06-17

The development of small molecules that target RNA is challenging yet, if successful, could advance the development of chemical probes to study RNA function or precision therapeutics to treat RNA-mediated disease. Previously, we described Inforna, an approach that can mine motifs (secondary structures) within target RNAs, which is deduced from the RNA sequence, and compare them to a database of known RNA motif-small molecule binding partners. Output generated by Inforna includes the motif found in both the database and the desired RNA target, lead small molecules for that target, and other related meta-data. Lead small molecules can then be tested for binding and affecting cellular (dys)function. Herein, we describe Inforna 2.0, which incorporates all known RNA motif-small molecule binding partners reported in the scientific literature, a chemical similarity searching feature, and an improved user interface and is freely available via an online web server. By incorporation of interactions identified by other laboratories, the database has been doubled, containing 1936 RNA motif-small molecule interactions, including 244 unique small molecules and 1331 motifs. Interestingly, chemotype analysis of the compounds that bind RNA in the database reveals features in small molecule chemotypes that are privileged for binding. Further, this updated database expanded the number of cellular RNAs to which lead compounds can be identified.
Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

PubMed

Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

2017-03-17

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
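The information-theoretic quantity underlying weight matrices of this kind can be shown on a toy alignment. This is a minimal sketch of per-position information content in bits (2 minus the Shannon entropy of the base frequencies), not the authors' recursive entropy-minimization pipeline; the function name `position_information` is hypothetical, and no small-sample correction or background adjustment is applied.

```python
import math


def position_information(sites):
    """Per-position information content (bits) of aligned DNA binding sites.

    Assumes equal-length sequences over A/C/G/T and a uniform background,
    so each position carries between 0 and 2 bits.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        entropy = 0.0
        for base in "ACGT":
            freq = column.count(base) / len(column)
            if freq > 0:
                entropy -= freq * math.log2(freq)
        info.append(2.0 - entropy)  # maximum entropy for 4 bases is 2 bits
    return info


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGA", "ACCT", "ACTG"]  # made-up example sites
    print(position_information(toy_sites))
```

Fully conserved positions reach 2 bits, while variable positions contribute less; summing the values gives the total information of the motif.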
The most common Chinese rhesus macaque MHC class I molecule shares peptide binding repertoire with the HLA-B7 supertype

PubMed Central

Solomon, Christopher; Southwood, Scott; Hoof, Ilka; Rudersdorf, Richard; Peters, Bjoern; Sidney, John; Pinilla, Clemencia; Marcondes, Maria Cecilia Garibaldi; Ling, Binhua; Marx, Preston; Sette, Alessandro

2010-01-01

Of the two rhesus macaque subspecies used for AIDS studies, the Simian immunodeficiency virus-infected Indian rhesus macaque (Macaca mulatta) is the most established model of HIV infection, providing both insight into pathogenesis and a system for testing novel vaccines. Despite the Chinese rhesus macaque potentially being a more relevant model for AIDS outcomes than the Indian rhesus macaque, the Chinese-origin rhesus macaques have not been well-characterized for their major histocompatibility complex (MHC) composition and function, reducing their greater utilization. In this study, we characterized a total of 50 unique Chinese rhesus macaques from several varying origins for their entire MHC class I allele composition and identified a total of 58 unique complete MHC class I sequences. Only nine of the sequences had been associated with Indian rhesus macaques, and 28/58 (48.3%) of the sequences identified were novel. From all MHC alleles detected, we prioritized Mamu-A1*02201 for functional characterization based on its higher frequency of expression. Upon the development of MHC/peptide binding assays and definition of its associated motif, we revealed that this allele shares peptide binding characteristics with the HLA-B7 supertype, the most frequent supertype in human populations. These studies provide the first functional characterization of an MHC class I molecule in the context of Chinese rhesus macaques and the first instance of HLA-B7 analogy for rhesus macaques. Electronic supplementary material The online version of this article (doi:10.1007/s00251-010-0450-3) contains supplementary material, which is available to authorized users. PMID:20480161
Common T cell receptor clonotype in lacrimal glands and labial salivary glands from patients with Sjögren's syndrome.

PubMed Central

Matsumoto, I; Tsubota, K; Satake, Y; Kita, Y; Matsumura, R; Murata, H; Namekawa, T; Nishioka, K; Iwamoto, I; Saitoh, Y; Sumida, T

1996-01-01

Sjögren's syndrome (SS) is an autoimmune disease characterized by lymphocytic infiltration into lacrimal and salivary glands leading to symptomatic dry eyes and mouth. Immunohistological studies have clarified that the majority of infiltrating lymphocytes around the lacrimal glands and labial salivary glands are CD4 positive alphabeta T cells. To analyze the pathogenesis of T cells infiltrating into lacrimal and labial salivary glands, we examined T cell clonotype of these cells in both glands from four SS patients using PCR-single-strand conformation polymorphism (SSCP) and a sequencing method. SSCP analysis showed that some infiltrating T cells in both glands expand clonally, suggesting that the cells proliferate by antigen-driven stimulation. Intriguingly, six to sixteen identical T cell receptor (TCR) Vbeta genes were commonly found in lacrimal glands and labial salivary glands from individual patients. This indicates that some T cells infiltrating into both glands recognize the shared epitopes on autoantigens. Moreover, highly conserved amino acid sequence motifs were found in the TCR CDR3 region bearing the same TCR Vbeta family gene from four SS patients, supporting the notion that the shared epitopes on antigens are limited. In conclusion, these findings suggest that some autoreactive T cells infiltrating into the lips and eyes recognize restricted epitopes of a common autoantigen in patients with SS. PMID:8621782

PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants.

PubMed

Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E

2018-06-05

There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
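The general idea of building a position-specific scoring matrix from aligned peptides and scanning a protein with it can be sketched in a few lines. This is an illustration under stated assumptions, not PSSMSearch's own scoring methods or statistics: the function names `build_pssm`, `scan` and `best_hit` are hypothetical, the background is assumed uniform over the 20 standard amino acids, a pseudocount of 1 is used, and the example peptides are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Log-odds PSSM (log2) from equal-length peptides of standard residues."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start, window, score)."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pssm):
    """Highest-scoring window in the sequence."""
    return max(scan(sequence, pssm), key=lambda hit: hit[2])


if __name__ == "__main__":
    model = build_pssm(["PTAP", "PSAP", "PTAP"])  # toy aligned peptides
    print(best_hit("MGPTAPLL", model))
```

A real search would additionally need a significance estimate for each score against the proteome of interest, which is the part the abstract attributes to the server's statistical framework.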
Molecular cloning, gene expression analysis, and recombinant protein expression of novel silk proteins from larvae of a retreat-maker caddisfly, Stenopsyche marmorata.

PubMed

Bai, Xue; Sakaguchi, Mayo; Yamaguchi, Yuko; Ishihara, Shiori; Tsukada, Masuhiro; Hirabayashi, Kimio; Ohkawa, Kousaku; Nomura, Takaomi; Arai, Ryoichi

2015-08-28

Retreat-maker larvae of Stenopsyche marmorata, one of the major caddisfly species in Japan, produce silk threads and adhesives to build food capture nets and protective nests in water. Research on these underwater adhesive silk proteins potentially leads to the development of new functional biofiber materials. Recently, we identified four major S. marmorata silk proteins (Smsps), Smsp-1, Smsp-2, Smsp-3, and Smsp-4 from silk glands of S. marmorata larvae. In this study, we cloned full-length cDNAs of Smsp-2, Smsp-3, and Smsp-4 from the cDNA library of the S. marmorata silk glands to reveal the primary sequences of Smsps. Homology search results of the deduced amino acid sequences indicate that Smsp-2 and Smsp-4 are novel proteins. The Smsp-2 sequence [167 amino acids (aa)] has an array of GYD-rich repeat motifs and two (SX)4E motifs. The Smsp-4 sequence (132 aa) contains a number of GW-rich repeat motifs and three (SX)4E motifs. The Smsp-3 sequence (248 aa) exhibits high homology with fibroin light chain of other caddisflies. Gene expression analysis of Smsps by real-time PCR suggested that the gene expression of Smsp-1 and Smsp-3 was relatively stable throughout the year, whereas that of Smsp-2 and Smsp-4 varied seasonally. Furthermore, Smsps recombinant protein expression was successfully performed in Escherichia coli. The study provides new molecular insights into caddisfly aquatic silk and its potential for future applications. Copyright © 2015 Elsevier Inc. All rights reserved.
Computational mining for hypothetical patterns of amino acid side chains in protein data bank (PDB)

NASA Astrophysics Data System (ADS)

Ghani, Nur Syatila Ab; Firdaus-Raih, Mohd

2018-04-01

The three-dimensional structure of a protein can provide insights regarding its function. Functional relationships between proteins can be inferred from fold and sequence similarities. In certain cases, sequence or fold comparison fails to establish homology between proteins with similar mechanisms. Since structure is more conserved than sequence, a constellation of functional residues can be similarly arranged among proteins of similar mechanism. Local structural similarity searches are able to detect such constellations of amino acids among distinct proteins, which can be useful for annotating proteins of unknown function. Detection of such patterns of amino acids on a large scale can increase the repertoire of important 3D motifs, since the currently known 3D motifs cannot keep pace with the ever-increasing number of uncharacterized proteins to be annotated. Here, a computational platform for automated detection of 3D motifs is described. A fuzzy-pattern searching algorithm derived from IMagine an Amino Acid 3D Arrangement search EnGINE (IMAAAGINE) was implemented to develop an automated method for searching for hypothetical patterns of amino acid side chains in the Protein Data Bank (PDB), without the need for prior knowledge of the related sequence or structure of the pattern of interest. We present an example search: the detection of a hypothetical pattern derived from the known C2H2 structural motif of zinc fingers. The conservation of particular patterns of amino acid side chains in unrelated proteins is highlighted. This approach can act as a complementary method for available structure- and sequence-based platforms and may help improve functional association between proteins.
Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control

PubMed Central

Burzynski, Grzegorz M.; Reed, Xylena; Taher, Leila; Stine, Zachary E.; Matsui, Takeshi; Ovcharenko, Ivan; McCallion, Andrew S.

2012-01-01

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes. PMID:22759862
Identification and characterization of gene-based SSR markers in date palm (Phoenix dactylifera L.).

PubMed

Zhao, Yongli; Williams, Roxanne; Prakash, C S; He, Guohao

2012-12-15

Date palm (Phoenix dactylifera L.) is an important tree in the Middle East and North Africa due to the nutritional value of its fruit. Molecular breeding would accelerate genetic improvement of fruit trees through marker-assisted selection. However, the lack of molecular markers in date palm restricts the application of molecular breeding. In this study, we analyzed 28,889 EST sequences from the date palm genome database to identify simple-sequence repeats (SSRs) and to develop gene-based markers, i.e. expressed sequence tag-SSRs (EST-SSRs). We identified 4,609 ESTs as containing SSRs, among which trinucleotide motifs (69.7%) were the most common, followed by tetranucleotide (10.4%) and dinucleotide motifs (9.6%). The motif AG (85.7%) was most abundant among dinucleotides, while motifs AGG (26.8%), AAG (19.3%), and AGC (16.1%) were most common among trinucleotides. A total of 4,967 primer pairs were designed for EST-SSR markers from the computational data. In a follow-up laboratory study, we tested a sample of 20 randomly selected primer pairs for amplification and polymorphism detection using genomic DNA from date palm cultivars. Nearly one-third of these primer pairs detected DNA polymorphism that differentiated the twelve date palm cultivars used. Functional categorization of EST sequences containing SSRs revealed that 3,108 (67.4%) of such ESTs had homology with known proteins. Date palm EST sequences are a good resource for developing gene-based markers. The genic markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in date palm, such as diversity studies, QTL mapping, and molecular breeding.
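The kind of SSR search this abstract describes, finding di-, tri- and tetranucleotide repeats in EST sequences, can be sketched with regular expressions. This is a minimal illustration, not the software the authors used: the function name `find_ssrs` is hypothetical, and the minimum repeat counts (6, 5 and 4) are assumed thresholds, since the abstract does not state the ones applied.

```python
import re

# Assumed thresholds: unit length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, n_repeats) tuples for perfect SSRs in a DNA string."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in thresholds.items():
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # mononucleotide run such as AAAA, not a di/tri/tetra SSR
            if unit_len == 4 and unit[:2] == unit[2:]:
                continue  # AGAG-type unit is really a dinucleotide repeat
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "TTT" + "AG" * 7 + "CCC" + "AGG" * 5 + "T"
    print(find_ssrs(demo))
```

Counting the returned units by length across a set of ESTs would give a motif-class breakdown comparable in form to the percentages reported above, although the numbers depend strongly on the thresholds chosen.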
Deciphering functional glycosaminoglycan motifs in development.

PubMed

Townley, Robert A; Bülow, Hannes E

2018-03-23

Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.
Novel Inhibitor Cystine Knot Peptides from Momordica charantia

PubMed Central

Clark, Richard J.; Tang, Jun; Zeng, Guang-Zhi; Franco, Octavio L.; Cantacessi, Cinzia; Craik, David J.; Daly, Norelle L.; Tan, Ning-Hua

2013-01-01

Two new peptides, MCh-1 and MCh-2, along with three known trypsin inhibitors (MCTI-I, MCTI-II and MCTI-III), were isolated from the seeds of the tropical vine Momordica charantia. The sequences of the peptides were determined using mass spectrometry and NMR spectroscopy. Using a strategy involving partial reduction and stepwise alkylation of the peptides, followed by enzymatic digestion and tandem mass spectrometry sequencing, the disulfide connectivity of MCh-1 was elucidated to be CysI-CysIV, CysII-CysV and CysIII-CysVI. The three-dimensional structures of MCh-1 and MCh-2 were determined using NMR spectroscopy and found to contain the inhibitor cystine knot (ICK) motif. The sequences of the novel peptides differ significantly from peptides previously isolated from this plant. Therefore, this study expands the known peptide diversity in M. charantia and the range of sequences that can be accommodated by the ICK motif. Furthermore, we show that a stable two-disulfide intermediate is involved in the oxidative folding of MCh-1. This disulfide intermediate is structurally homologous to the proposed ancestral fold of ICK peptides, and provides a possible pathway for the evolution of this structural motif, which is highly prevalent in nature. PMID:24116036
Microsatellites for Lindera species

Treesearch

Craig S. Echt; D. Deemer; T.L. Kubisiak; C.D. Nelson

2006-01-01

Microsatellite markers were developed for conservation genetic studies of Lindera melissifolia (pondberry), a federally endangered shrub of southern bottomland ecosystems. Microsatellite sequences were obtained from DNA libraries that were enriched for the (AC)n simple sequence repeat motif. From 35 clone sequences, 20 primer...
Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

USDA-ARS's Scientific Manuscript database

Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...
A viral, transporter associated with antigen processing (TAP)-independent, high affinity ligand with alternative interactions endogenously presented by the nonclassical human leukocyte antigen E class I molecule.

PubMed

Lorente, Elena; Infantes, Susana; Abia, David; Barnea, Eilon; Beer, Ilan; García, Ruth; Lasala, Fátima; Jiménez, Mercedes; Mir, Carmen; Morreale, Antonio; Admon, Arie; López, Daniel

2012-10-12

The transporter associated with antigen processing (TAP) enables the flow of viral peptides generated in the cytosol by the proteasome and other proteases to the endoplasmic reticulum, where they complex with nascent human leukocyte antigen (HLA) class I. Later, these peptide-HLA class I complexes can be recognized by CD8(+) lymphocytes. Cancerous cells and infected cells in which TAP is blocked, as well as individuals with unusable TAP complexes, are able to present peptides on HLA class I by generating them through TAP-independent processing pathways. Here, we identify a physiologically processed HLA-E ligand derived from the D8L protein in TAP-deficient vaccinia virus-infected cells. This natural high affinity HLA-E class I ligand uses alternative interactions to the anchor motifs previously described to be presented on nonclassical HLA class I molecules. This octameric peptide was also presented on HLA-Cw1 with similar binding affinity on both classical and nonclassical class I molecules. In addition, this viral peptide inhibits HLA-E-mediated cytolysis by natural killer cells. Comparison between the amino acid sequences of the presenting HLA-E and HLA-Cw1 alleles revealed a shared structural motif in both HLA class molecules, which could be related to their observed similar cross-reactivity affinities. This motif consists of several residues located on the floor of the peptide-binding site. These data expand the role of HLA-E as an antigen-presenting molecule.
Analysis of the Isolated SecA DEAD Motor Suggests a Mechanism for Chemical-Mechanical Coupling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nithianantham, Stanley; Shilton, Brian H

2011-09-28

The preprotein cross-linking domain and C-terminal domains of Escherichia coli SecA were removed to create a minimal DEAD motor, SecA-DM. SecA-DM hydrolyzes ATP and has the same affinity for ADP as full-length SecA. The crystal structure of SecA-DM in complex with ADP was solved and shows the DEAD motor in a closed conformation. Comparison with the structure of the E. coli DEAD motor in an open conformation (Protein Data Bank ID 2FSI) indicates main-chain conformational changes in two critical sequences corresponding to Motif III and Motif V of the DEAD helicase family. The structures that the Motif III and Motif V sequences adopt in the DEAD motor open conformation are incompatible with the closed conformation. Therefore, when the DEAD motor makes the transition from open to closed, Motif III and Motif V are forced to change their conformations, which likely functions to regulate passage through the transition state for ATP hydrolysis. The transition state for ATP hydrolysis for the SecA DEAD motor was modeled based on the conformation of the Vasa helicase in complex with adenylyl imidodiphosphate and RNA (Protein Data Bank ID 2DB3). A mechanism for chemical-mechanical coupling emerges, where passage through the transition state for ATP hydrolysis is hindered by the conformational changes required in Motif III and Motif V, and may be promoted by binding interactions with the preprotein substrate and/or other translocase domains and subunits.
Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.

PubMed

Tran, Ngoc Tam L; Huang, Chun-Hsi

2017-05-01

We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.
Visualizing frequent patterns in large multivariate time series

NASA Astrophysics Data System (ADS)

Hao, M.; Marwah, M.; Janetzko, H.; Sharma, R.; Keim, D. A.; Dayal, U.; Patnaik, D.; Ramakrishnan, N.

2011-01-01

The detection of previously unknown, frequently occurring patterns in time series, often called motifs, has been recognized as an important task. However, it is difficult to discover and visualize these motifs as their numbers increase, especially in large multivariate time series. To find frequent motifs, we use several temporal data mining and event encoding techniques to cluster and convert a multivariate time series to a sequence of events. Then we quantify the efficiency of the discovered motifs by linking them with a performance metric. To visualize frequent patterns in a large time series with potentially hundreds of nested motifs on a single display, we introduce three novel visual analytics methods: (1) motif layout, using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs in a multivariate time series, (2) motif distortion, for enlarging or shrinking motifs as appropriate for easy analysis and (3) motif merging, to combine a number of identical adjacent motif instances without cluttering the display. Analysts can interactively optimize the degree of distortion and merging to get the best possible view. A specific motif (e.g., the most efficient or least efficient motif) can be quickly detected from a large time series for further investigation. We have applied these methods to two real-world data sets: data center cooling and oil well production. The results provide important new insights into the recurring patterns.
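The "motif merging" step described above (combining identical adjacent motif instances) can be sketched as a run-length style pass over the instances. This is only an illustrative sketch: the `(start, end, label)` tuple layout and the function name `merge_adjacent` are assumptions, and the record does not say how the authors implemented merging.

```python
def merge_adjacent(instances):
    """Merge back-to-back instances of the same motif label.

    instances: iterable of (start, end, label) tuples (hypothetical layout).
    Returns a list of (start, end, label, count) tuples.
    """
    merged = []
    for start, end, label in sorted(instances):
        last = merged[-1] if merged else None
        # Merge only when the label matches and the new instance starts
        # exactly where the previous one ended.
        if last and last[2] == label and last[1] == start:
            merged[-1] = (last[0], end, label, last[3] + 1)
        else:
            merged.append((start, end, label, 1))
    return merged

events = [(0, 5, "A"), (5, 10, "A"), (10, 15, "A"), (15, 20, "B"), (25, 30, "B")]
print(merge_adjacent(events))
```

In this toy input the three contiguous "A" instances collapse into one entry with a count of 3, while the two "B" instances stay separate because of the gap between them.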
Overlapping activation-induced cytidine deaminase hotspot motifs in Ig class-switch recombination

PubMed Central

Han, Li; Masani, Shahnaz; Yu, Kefei

2011-01-01

Ig class-switch recombination (CSR) is directed by the long and repetitive switch regions and requires activation-induced cytidine deaminase (AID). One of the conserved switch-region sequence motifs (AGCT) is a preferred site for AID-mediated DNA-cytosine deamination. By using somatic gene targeting and recombinase-mediated cassette exchange, we established a cell line-based CSR assay that allows manipulation of switch sequences at the endogenous locus. We show that AGCT is only one of a family of four WGCW motifs in the switch region that can facilitate CSR. We go on to show that it is the overlap of AID hotspots at WGCW sites on the top and bottom strands that is critical. This finding leads to a much clearer model for the difference between CSR and somatic hypermutation. PMID:21709240
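To make the WGCW family concrete, the following Python sketch scans a DNA string for WGCW sites, assuming the IUPAC convention that W stands for A or T. It also checks that each of the four family members reverse-complements to another member, which is one way to see why a WGCW site on the top strand implies a WGCW site on the bottom strand. The sequence and function names are made up for the example.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def wgcw_sites(seq):
    """Return (0-based start, motif) for every WGCW site, overlaps included."""
    return [(m.start(), m.group(1))
            for m in re.finditer(r"(?=([AT]GC[AT]))", seq.upper())]

WGCW_FAMILY = {"AGCT", "AGCA", "TGCT", "TGCA"}

toy_seq = "CCAGCTGGTGCATT"  # toy sequence, not a real switch region
print(wgcw_sites(toy_seq))
print({motif: revcomp(motif) for motif in sorted(WGCW_FAMILY)})
```

The lookahead form of the pattern is used so that overlapping sites would also be reported.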
Isolation and Expression Analysis of CYP9A11 and Cytochrome P450 Reductase Gene in the Beet Armyworm (Lepidoptera: Noctuidae)

PubMed Central

Zhao, Chunqing; Feng, Xiaoyun; Tang, Tao; Qiu, Lihong

2015-01-01

Cytochrome P450 monooxygenases (CYPs) form an enzyme superfamily that is widely distributed in organisms and plays a vital role in the metabolism of exogenous and endogenous compounds by interacting with its obligatory redox partner, CYP reductase (CPR). A novel CYP gene (CYP9A11) and a CPR gene from the agricultural pest insect Spodoptera exigua were cloned and characterized. The complete cDNA sequences of SeCYP9A11 and SeCPR are 1,931 and 3,919 bp in length, respectively, and contain open reading frames of 1,593 and 2,070 nucleotides, respectively. Analysis of the putative protein sequences indicated that SeCYP9A11 contains a heme-binding domain and the unique characteristic sequence (SRFALCE) of the CYP9 family, in addition to a signal peptide and transmembrane segment at the N-terminus. Alignment analysis revealed that SeCYP9A11 shares the highest sequence similarity (66.54%) with CYP9A13 from Mamestra brassicae. The putative protein sequence of SeCPR has all of the classical CPR features, such as an N-terminal membrane anchor; three conserved domains for flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN), and nicotinamide adenine dinucleotide phosphate (NADPH); and characteristic binding motifs. Phylogenetic analysis revealed that SeCPR shares the highest identity (95.21%) with HaCPR. The SeCYP9A11 and SeCPR genes were detected in the midgut, fat body, and cuticle tissues, and throughout all of the developmental stages of S. exigua. The mRNA levels of SeCYP9A11 and SeCPR decreased remarkably after exposure to the plant secondary metabolites quercetin and tannin. The results regarding the SeCYP9A11 and SeCPR genes in the current study provide a foundation for further study of the S. exigua P450 system. PMID:26320261
Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis

PubMed Central

Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.

2011-01-01

The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189
Regulatory elements of Caenorhabditis elegans ribosomal protein genes

PubMed Central

2012-01-01

Background: Ribosomal protein genes (RPGs) are essential, tightly regulated, and highly expressed during embryonic development and cell growth. Even though their protein sequences are strongly conserved, their mechanism of regulation is not conserved across yeast, Drosophila, and vertebrates. A recent investigation of genomic sequences conserved across both nematode species and associated with different gene groups indicated the existence of several elements in the upstream regions of C. elegans RPGs, providing a new insight regarding the regulation of these genes in C. elegans. Results: In this study, we performed an in-depth examination of C. elegans RPG regulation and found nine highly conserved motifs in the upstream regions of C. elegans RPGs using the motif discovery algorithm DME. Four motifs were partially similar to transcription factor binding sites from C. elegans, Drosophila, yeast, and human. One pair of these motifs was found to co-occur in the upstream regions of 250 transcripts including 22 RPGs. The distance between the two motifs displayed a complex frequency pattern that was related to their relative orientation. We tested the impact of three of these motifs on the expression of rpl-2 using a series of reporter gene constructs and showed that all three motifs are necessary to maintain the high natural expression level of this gene. One of the motifs was similar to the binding site of an orthologue of POP-1, and we showed that RNAi knockdown of pop-1 impacts the expression of rpl-2. We further determined the transcription start site of rpl-2 by 5’ RACE and found that the motifs lie 40–90 bases upstream of the start site. We also found evidence that a noncoding RNA, contained within the outron of rpl-2, is co-transcribed with rpl-2 and cleaved during trans-splicing. Conclusions: Our results indicate that C. elegans RPGs are regulated by a complex novel series of regulatory elements that is evolutionarily distinct from those of all other species examined up until now. PMID:22928635
Molecular cloning and sequence analysis of stearoyl-CoA desaturase in milkfish, Chanos chanos.

PubMed

Hsieh, S L; Liao, W L; Kuo, C M

2001-12-01

Stearoyl-CoA desaturase (EC 1.14.99.5) is a key enzyme in the biosynthesis of polyunsaturated fatty acids and the maintenance of the homeoviscous fluidity of biological membranes. The stearoyl-CoA desaturase cDNA in milkfish (Chanos chanos) was cloned by RT-PCR and RACE, and it was compared with the stearoyl-CoA desaturase in cold-tolerant teleosts, common carp and grass carp. Nucleotide sequence analysis revealed that the cDNA clone has a 972-bp open reading frame encoding 323 amino acid residues. Alignments of the deduced amino acid sequence showed that the milkfish stearoyl-CoA desaturase shares 79% and 75% identity with common carp and grass carp, and 63%-64% with other vertebrates such as sheep, hamsters, rats, mice, and humans. Like common carp and grass carp, the deduced amino acid sequence in milkfish well conserves three histidine cluster motifs (one HXXXXH and two HXXHH) that are essential for catalysis of stearoyl-CoA desaturase activity. However, RT-PCR analysis showed that stearoyl-CoA desaturase expression in milkfish is detected in the tissues of liver, muscle, kidney, brain, and gill, and more expression sites were found in milkfish than in common carp and grass carp. Phylogenic relationships among the deduced stearoyl-CoA desaturase amino acid sequence in milkfish and those in other vertebrates showed that the milkfish stearoyl-CoA desaturase amino acid sequence is phylogenetically closer to those of common carp and grass carp than to other higher vertebrates.
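Two details in this record lend themselves to a quick check. The Python sketch below scans a protein string for the histidine cluster patterns HXXXXH and HXXHH (X taken to mean any residue), and confirms that a 972-bp open reading frame corresponds to 323 residues if the ORF length is assumed to include the stop codon (972 / 3 = 324 codons). The toy protein string and the function names are hypothetical; they are not the milkfish sequence or the authors' tooling.

```python
import re

HIS_BOX_PATTERNS = {
    "HXXXXH": r"(?=(H.{4}H))",  # His, any four residues, His
    "HXXHH": r"(?=(H.{2}HH))",  # His, any two residues, His-His
}

def histidine_boxes(protein):
    """Return {pattern name: [(0-based start, matched segment), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in re.finditer(regex, protein)]
        for name, regex in HIS_BOX_PATTERNS.items()
    }

def orf_residues(orf_bp):
    """Residues encoded by an ORF whose length includes one stop codon (assumption)."""
    codons, remainder = divmod(orf_bp, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    return codons - 1

toy_protein = "AAHGGGGHAAHGGHHAA"  # invented sequence for illustration only
print(histidine_boxes(toy_protein))
print(orf_residues(972))
```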
Development of a bioassay to screen for chemicals mimicking the anti-aging effects of calorie restriction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chiba, Takuya, E-mail: takuya@nagasaki-u.ac.jp; Tsuchiya, Tomoshi; Komatsu, Toshimitsu

2010-10-15

Research highlights: We identified four sequence motifs lying upstream of putative pro-longevity genes. One of these motifs binds to HNF-4α. HNF-4α/PGC-1α could up-regulate the transcription of a reporter gene linked to this motif. The reporter system described here could be used to screen candidate anti-aging molecules. Abstract: Suppression of the growth hormone/insulin-like growth factor-I pathway in Ames dwarf (DF) mice, and caloric restriction (CR) in normal mice extends lifespan and delays the onset of age-related disorders. In combination, these interventions have an additive effect on lifespan in Ames DF mice. Therefore, common signaling pathways regulated by DF and CR could have additive effects on longevity. In this study, we tried to identify the signaling mechanism and develop a system to assess pro-longevity status in cells and mice. We previously identified genes up-regulated in the liver of DF and CR mice by DNA microarray analysis. Motif analysis of the upstream sequences of those genes revealed four major consensus sequence motifs, which have been named dwarfism and calorie restriction-responsive elements (DFCR-REs). One of the synthesized sequences bound to hepatocyte nuclear factor-4α (HNF-4α), an important transcription factor involved in liver metabolism. Furthermore, using this sequence information, we developed a highly sensitive bioassay to identify chemicals mimicking the anti-aging effects of CR. When the reporter construct, containing an element upstream of a secreted alkaline phosphatase (SEAP) gene, was co-transfected with HNF-4α and its regulator peroxisome proliferator-activated receptor (PPAR) γ coactivator-1α (PGC-1α), SEAP activity was increased compared with untransfected controls. Moreover, transient transgenic mice established using this construct showed increased SEAP activity in CR mice compared with ad libitum-fed mice. These data suggest that because of its rapidity, ease of use, and specificity, our bioassay will be more useful than the systems currently employed to screen for CR mimetics, which mimic the beneficial effects of CR. Our system will be particularly useful for high-throughput screening of natural and synthetic candidate molecules.

Sequence analyses of fimbriae subunit FimA proteins on Actinomyces naeslundii genospecies 1 and 2 and Actinomyces odontolyticus with variant carbohydrate binding specificities

PubMed Central

Drobni, Mirva; Hallberg, Kristina; Öhman, Ulla; Birve, Anna; Persson, Karina; Johansson, Ingegerd; Strömberg, Nicklas

2006-01-01

Background: Actinomyces naeslundii genospecies 1 and 2 express type-2 fimbriae (FimA subunit polymers) with variant Galβ binding specificities and Actinomyces odontolyticus a sialic acid specificity to colonize different oral surfaces. However, the fimbrial nature of the sialic acid binding property and sequence information about FimA proteins from multiple strains are lacking. Results: Here we have sequenced fimA genes from strains of A. naeslundii genospecies 1 (n = 4) and genospecies 2 (n = 4), both of which harboured variant Galβ-dependent hemagglutination (HA) types, and from A. odontolyticus PK984 with a sialic acid-dependent HA pattern. Three unique subtypes of FimA proteins with 63.8–66.4% sequence identity were present in strains of A. naeslundii genospecies 1 and 2 and A. odontolyticus. The generally high FimA sequence identity (>97.2%) within a genospecies revealed species specific sequences or segments that coincided with binding specificity. All three FimA protein variants contained a signal peptide, pilin motif, E box, proline-rich segment and an LPXTG sorting motif among other conserved segments for secretion, assembly and sorting of fimbrial proteins. The highly conserved pilin, E box and LPXTG motifs are present in fimbriae proteins from other Gram-positive bacteria. Moreover, only strains of genospecies 1 were agglutinated with type-2 fimbriae antisera derived from A. naeslundii genospecies 1 strain 12104, emphasizing that the overall folding of FimA may generate different functionalities. Western blot analyses with FimA antisera revealed monomers and oligomers of FimA in whole cell protein extracts and a purified recombinant FimA preparation, indicating a sortase-independent oligomerization of FimA. Conclusion: The genus Actinomyces involves a diversity of unique FimA proteins with conserved pilin, E box and LPXTG motifs, depending on subspecies and associated binding specificity. In addition, a sortase-independent oligomerization of FimA subunit proteins in solution was indicated. PMID:16686953
A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

PubMed

Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

2016-12-01

The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence 1417EENKVR1422 and the terminal 1478TRL1480 (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics. Copyright © 2016 the American Physiological Society.
A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction

PubMed Central

Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R.; Gottschalk, Laura B.; Lopez, Andrea P.; Pellicore, Matthew J.; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D.; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh

2016-01-01

The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence 1417EENKVR1422 and the terminal 1478TRL1480 (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics. PMID:27793802
The nonamer UUAUUUAUU is the key AU-rich sequence motif that mediates mRNA degradation.

PubMed Central

Zubiaga, A M; Belasco, J G; Greenberg, M E

1995-01-01

Labile mRNAs that encode cytokine and immediate-early gene products often contain AU-rich sequences within their 3' untranslated region (UTR). These AU-rich sequences appear to be key determinants of the short half-lives of these mRNAs, although the sequence features of these elements and the mechanism by which they target mRNAs for rapid decay have not been fully defined. We have examined the features of AU-rich elements (AREs) that are crucial for their function as determinants of mRNA instability in mammalian cells by testing the ability of various mutant c-fos AREs and synthetic AREs to direct rapid mRNA deadenylation and decay when inserted within the 3' UTR of the normally stable beta-globin mRNA. Evidence is presented that the pentamer AUUUA, which previously was suggested to be the minimal determinant of instability present in mammalian AREs, cannot direct rapid mRNA deadenylation and decay. Instead, the nonamer UUAUUUAUU is the elemental AU-rich sequence motif that destabilizes mRNA. Removal of one uridine residue from either end of the nonamer (UUAUUUAU or UAUUUAUU) results in a decrease of potency of the element, while removal of a uridine residue from both ends of the nonamer (UAUUUAU) eliminates detectable destabilizing activity. The inclusion of an additional uridine residue at both ends of the nonamer (UUUAUUUAUUU) does not further increase the efficacy of the element. Taken together, these findings suggest that the nonamer UUAUUUAUU is the minimal AU-rich motif that effectively destabilizes mRNA. Additional ARE potency is achieved by combining multiple copies of this nonamer in a single mRNA 3' UTR. Furthermore, analysis of poly(A) shortening rates for ARE-containing mRNAs reveals that the UUAUUUAUU sequence also accelerates mRNA deadenylation and suggests that the UUAUUUAUU motif targets mRNA for rapid deadenylation as an early step in the mRNA decay process. PMID:7891716
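The distinction drawn above between the AUUUA pentamer and the UUAUUUAUU nonamer can be illustrated with a small counting sketch in Python. It tallies overlapping occurrences of both motifs in a 3' UTR string, accepting either RNA or DNA letters. The function names and the toy sequences are assumptions for illustration, and the counts say nothing quantitative about decay rates.

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def are_summary(utr):
    """Count nonamer and pentamer copies in a 3' UTR given as RNA or DNA."""
    rna = utr.upper().replace("T", "U")
    return {
        "nonamer_UUAUUUAUU": count_overlapping(rna, "UUAUUUAUU"),
        "pentamer_AUUUA": count_overlapping(rna, "AUUUA"),
    }

print(are_summary("GCUUAUUUAUUUAUUGC"))  # toy UTR with two overlapping nonamers
print(are_summary("GGAUUUAGG"))          # pentamer only, no nonamer
```

Counting overlaps matters here because adjacent nonamers can share uridines, as in the first toy sequence.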
Interaction of Tsg101 with Marburg Virus VP40 Depends on the PPPY Motif, but Not the PT/SAP Motif as in the Case of Ebola Virus, and Tsg101 Plays a Critical Role in the Budding of Marburg Virus-Like Particles Induced by VP40, NP, and GP

PubMed Central

Urata, Shuzo; Noda, Takeshi; Kawaoka, Yoshihiro; Morikawa, Shigeru; Yokosawa, Hideyoshi; Yasuda, Jiro

2007-01-01

Marburg virus (MARV) VP40 is a matrix protein that can be released from mammalian cells in the form of virus-like particles (VLPs) and contains the PPPY sequence, which is an L-domain motif. Here, we demonstrate that the PPPY motif is important for VP40-induced VLP budding and that VLP production is significantly enhanced by coexpression of NP and GP. We show that Tsg101 interacts with VP40 depending on the presence of the PPPY motif, but not the PT/SAP motif as in the case of Ebola virus, and plays an important role in VLP budding. These findings provide new insights into the mechanism of MARV budding. PMID:17301151
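A short Python sketch of how a protein sequence might be screened for the two L-domain motifs named in this record, reading "PT/SAP" as P, then T or S, then A, then P. The sequences below are invented toy strings rather than VP40 sequences, and the helper name `l_domain_motifs` is hypothetical.

```python
import re

L_DOMAIN_PATTERNS = {
    "PPPY": r"(?=(PPPY))",
    "PT/SAP": r"(?=(P[TS]AP))",
}

def l_domain_motifs(protein):
    """Return {motif name: [(0-based start, matched residues), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in re.finditer(regex, protein)]
        for name, regex in L_DOMAIN_PATTERNS.items()
    }

print(l_domain_motifs("MASPPPYAAD"))  # toy sequence with PPPY only
print(l_domain_motifs("MPTAPPEYL"))   # toy sequence with PTAP only
```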
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

PubMed

Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

2016-01-01

Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

PubMed

Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

2018-01-10

Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing cancer cells. Copyright © 2017 Elsevier B.V. All rights reserved.
Discriminative motif discovery via simulated evolution and random under-sampling.

PubMed

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Model (HMM) training, a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and they recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
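The random under-sampling step mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `random_undersample`, the fixed seed, and the choice to shrink the larger set to exactly the size of the smaller one are all assumptions made for illustration.

```python
import random

def random_undersample(positives, negatives, seed=0):
    """Randomly shrink the larger set so both classes have equal size.

    Assumption for illustration: a 1:1 ratio is the target; the abstract
    does not state which ratio the authors used.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # Sample without replacement from whichever set is larger.
    pos = rng.sample(positives, n) if len(positives) > n else list(positives)
    neg = rng.sample(negatives, n) if len(negatives) > n else list(negatives)
    return pos, neg

# Hypothetical toy data: 3 positive and 10 negative sequence identifiers.
toy_pos = [f"p{i}" for i in range(3)]
toy_neg = [f"n{i}" for i in range(10)]
balanced_pos, balanced_neg = random_undersample(toy_pos, toy_neg)
print(len(balanced_pos), len(balanced_neg))
```

Under-sampling discards data from the majority class, so in practice it is often repeated with different seeds and the resulting models compared or combined.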
Anion induced conformational preference of CαNN motif residues in functional proteins.

PubMed

Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

2017-12-01

Among different ligand-binding motifs, the anion-binding CαNN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions, like sulphate or biphosphate, and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the CαNN motif present in different proteins. Instead of the total protein, a peptide fragment has been studied, keeping the CαNN motif flanked by other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in the absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.
ATtRACT-a database of RNA-binding proteins and associated motifs.

PubMed

Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

2016-01-01

RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and compare them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

PubMed

Allevato, Michael; Bolotin, Eugene; Grossman, Mark; Mane-Padros, Daniel; Sladek, Frances M; Martinez, Ernest

2017-01-01

The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
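The motif classes named in this abstract (the general E-box CANNTG, the canonical MYC E-box CACGTG, and the non-E-box site AACGTT) can be located in a sequence with a short Python sketch. The function name `find_sites`, the labels, and the test sequence are hypothetical, and exact string matching is a simplification: it says nothing about binding affinity or occupancy, which the study measured experimentally.

```python
import re

# Lookahead so that overlapping CANNTG matches are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
CME = "CACGTG"        # canonical MYC E-box, as given in the abstract
NON_EBOX = "AACGTT"   # low-affinity non-E-box site, as given in the abstract

def find_sites(seq):
    """Return sorted (0-based start, site, label) tuples for one strand."""
    seq = seq.upper()
    hits = []
    for m in EBOX.finditer(seq):
        site = m.group(1)
        label = "CME" if site == CME else "E-box variant"
        hits.append((m.start(), site, label))
    start = seq.find(NON_EBOX)
    while start != -1:
        hits.append((start, NON_EBOX, "non-E-box"))
        start = seq.find(NON_EBOX, start + 1)
    return sorted(hits)

# Hypothetical sequence made up for this example.
for hit in find_sites("TTCACGTGAAACGTTCAGCTGA"):
    print(hit)
```

Because CANNTG and AACGTT are identical to their own reverse complements as patterns, scanning one strand is enough to find sites on both strands in this particular case.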
Analysis of septins across kingdoms reveals orthology and new motifs.

PubMed

Pan, Fangfang; Malmberg, Russell L; Momany, Michelle

2007-07-01

Septins are cytoskeletal GTPase proteins first discovered in the fungus Saccharomyces cerevisiae where they organize the septum and link nuclear division with cell division. More recently septins have been found in animals where they are important in processes ranging from actin and microtubule organization to embryonic patterning and where defects in septins have been implicated in human disease. Previous studies suggested that many animal septins fell into independent evolutionary groups, confounding cross-kingdom comparison. In the current work, we identified 162 septins from fungi, microsporidia and animals and analyzed their phylogenetic relationships. There was support for five groups of septins with orthology between kingdoms. Group 1 (which includes S. cerevisiae Cdc10p and human Sept9) and Group 2 (which includes S. cerevisiae Cdc3p and human Sept7) contain sequences from fungi and animals. Group 3 (which includes S. cerevisiae Cdc11p) and Group 4 (which includes S. cerevisiae Cdc12p) contain sequences from fungi and microsporidia. Group 5 (which includes Aspergillus nidulans AspE) contains sequences from filamentous fungi. We suggest a modified nomenclature based on these phylogenetic relationships. Comparative sequence alignments revealed septin derivatives of already known G1, G3 and G4 GTPase motifs, four new motifs from two to twelve amino acids long and six conserved single amino acid positions. One of these new motifs is septin-specific and several are group specific. Our studies provide an evolutionary history for this important family of proteins and a framework and consistent nomenclature for comparison of septin orthologs across kingdoms.
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2012-09-01

his colleagues have successfully conjugated malachite green (MG) aptamer to RNA nanoparticles characterized by a three-way junction (3WJ) pRNA motif...nanoparticle harboring malachite green (MG) aptamer, survivin siRNA and folate-DNA/RNA sequence for targeting delivery, using 3WJ-pRNA as scaffolds. Figure
Stereochemical determinants of C-terminal specificity in PDZ peptide-binding domains: a novel contribution of the carboxylate-binding loop.

PubMed

Amacher, Jeanine F; Cushing, Patrick R; Bahl, Christopher D; Beck, Tobias; Madden, Dean R

2013-02-15

PDZ (PSD-95/Dlg/ZO-1) binding domains often serve as cellular traffic engineers, controlling the localization and activity of a wide variety of binding partners. As a result, they play important roles in both physiological and pathological processes. However, PDZ binding specificities overlap, allowing multiple PDZ proteins to mediate distinct effects on shared binding partners. For example, several PDZ domains bind the cystic fibrosis (CF) transmembrane conductance regulator (CFTR), an epithelial ion channel mutated in CF. Among these binding partners, the CFTR-associated ligand (CAL) facilitates post-maturational degradation of the channel and is thus a potential therapeutic target. Using iterative optimization, we previously developed a selective CAL inhibitor peptide (iCAL36). Here, we investigate the stereochemical basis of iCAL36 specificity. The crystal structure of iCAL36 in complex with the CAL PDZ domain reveals stereochemical interactions distributed along the peptide-binding cleft, despite the apparent degeneracy of the CAL binding motif. A critical selectivity determinant that distinguishes CAL from other CFTR-binding PDZ domains is the accommodation of an isoleucine residue at the C-terminal position (P(0)), a characteristic shared with the Tax-interacting protein-1. Comparison of the structures of these two PDZ domains in complex with ligands containing P(0) Leu or Ile residues reveals two distinct modes of accommodation for β-branched C-terminal side chains. Access to each mode is controlled by distinct residues in the carboxylate-binding loop. These studies provide new insights into the primary sequence determinants of binding motifs, which in turn control the scope and evolution of PDZ interactomes.
A Viral-Human Interactome Based on Structural Motif-Domain Interactions Captures the Human Infectome

PubMed Central

Guo, Xianwu; Rodríguez-Pérez, Mario A.

2013-01-01

Protein interactions between a pathogen and its host are fundamental in the establishment of the pathogen and underlie the infection mechanism. In the present work, we developed a single predictive model for building a host-viral interactome based on the identification of structural descriptors from motif-domain interactions of protein complexes deposited in the Protein Data Bank (PDB). The structural descriptors were used to search a database of protein sequences from human and five clinically important viruses; viral and human proteins sharing a descriptor were then predicted to be interacting proteins. The analysis of the host-viral interactome allowed us to identify a set of new interactions that further explain molecular mechanisms associated with viral infections and showed that it was able to capture human proteins already associated with viral infections (human infectome) and non-infectious diseases (human diseasome). The analysis of human proteins targeted by viral proteins in the context of a human interactome showed that their neighbors are enriched in proteins reported with differential expression under infection and disease conditions. It is expected that the findings of this work will contribute to the development of systems biology for infectious diseases, and help guide the rational identification and prioritization of novel drug targets. PMID:23951184
Substrate Specificity and Possible Heterologous Targets of Phytaspase, a Plant Cell Death Protease.

PubMed

Galiullina, Raisa A; Kasperkiewicz, Paulina; Chichkova, Nina V; Szalek, Aleksandra; Serebryakova, Marina V; Poreba, Marcin; Drag, Marcin; Vartapetian, Andrey B

2015-10-09

Plants lack aspartate-specific cell death proteases homologous to animal caspases. Instead, a subtilisin-like serine-dependent plant protease named phytaspase shown to be involved in the accomplishment of programmed death of plant cells is able to hydrolyze a number of peptide-based caspase substrates. Here, we determined the substrate specificity of rice (Oryza sativa) phytaspase by using the positional scanning substrate combinatorial library approach. Phytaspase was shown to display an absolute specificity of hydrolysis after an aspartic acid residue. The preceding amino acid residues, however, significantly influence the efficiency of hydrolysis. Efficient phytaspase substrates demonstrated a remarkable preference for an aromatic amino acid residue in the P3 position. The deduced optimum phytaspase recognition motif has the sequence IWLD and is strikingly hydrophobic. The established pattern was confirmed through synthesis and kinetic analysis of cleavage of a set of optimized peptide substrates. An amino acid motif similar to the phytaspase cleavage site is shared by the human gastrointestinal peptide hormones gastrin and cholecystokinin. In agreement with the established enzyme specificity, phytaspase was shown to hydrolyze gastrin-1 and cholecystokinin at the predicted sites in vitro, thus destroying the active moieties of the hormones. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
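As a rough illustration of how the deduced recognition motif could be used to predict cleavage sites, the Python sketch below looks for exact occurrences of IWLD and reports cleavage after the aspartic acid. The function names and the peptide are hypothetical. Exact matching is an assumption: the abstract reports absolute specificity only for Asp at the cleavage position, with the preceding residues modulating efficiency rather than acting as a strict requirement.

```python
def cleavage_positions(peptide, motif="IWLD"):
    """Return 1-based positions of the Asp after which cleavage is predicted."""
    peptide = peptide.upper()
    positions = []
    start = peptide.find(motif)
    while start != -1:
        # The last residue of the motif (D) sits at 1-based index start + len(motif).
        positions.append(start + len(motif))
        start = peptide.find(motif, start + 1)
    return positions

def cleave(peptide, motif="IWLD"):
    """Split the peptide after every predicted cleavage position."""
    peptide = peptide.upper()
    fragments, previous = [], 0
    for pos in cleavage_positions(peptide, motif):
        fragments.append(peptide[previous:pos])
        previous = pos
    fragments.append(peptide[previous:])
    return fragments

# Hypothetical peptide, not a real hormone sequence.
print(cleavage_positions("GGIWLDAAK"))
print(cleave("GGIWLDAAK"))
```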
Molecular cloning of a C-type lectin (LvLT) from the shrimp Litopenaeus vannamei: early gene down-regulation after WSSV infection.

PubMed

Ma, Tracy Hoi Tung; Tiu, Shirley Hiu Kwan; He, Jian-Guo; Chan, Siu-Ming

2007-08-01

C-type lectin is one of the pattern-recognition proteins of the non-self innate immune system in the invertebrates. In this study, a lectin-like cDNA (LvLT) of Litopenaeus vannamei was cloned and characterized. LvLT cDNA consists of 1035 nt encoding for a protein with 345 amino acid residues. The deduced LvLT consists of two putative carbohydrate-recognition domains (CRDs) as found in most C-type lectins. The first CRD consists of an amino acid motif (QPD) for the binding of galactose and the other CRDs consist of amino acid motifs (EPN) for the binding of mannose. Except for some conserved amino acid residues, the CRD of LvLT shared an overall low amino acid sequence identity with CRDs of other lectins. Unlike other shrimp lectins, LvLT is expressed only in the hepatopancreas but not in the hemocytes as revealed by RT-PCR. When juvenile shrimp were challenged with shrimp extracts containing white spot syndrome virus (WSSV), the expression levels of LvLT decreased initially in the first 2 h and then increased to a much higher level after 4 h. The results suggest that the initial reduction in LvLT transcript level may be related to the WSSV infection in shrimp.
SLiMSearch 2.0: biological context for short linear motifs in proteins

PubMed Central

Davey, Norman E.; Haslam, Niall J.; Shields, Denis C.

2011-01-01

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners are provided as indicators of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by Uniprot accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html. PMID:21622654
A flexible motif search technique based on generalized profiles.

PubMed

Bucher, P; Karplus, K; Moeri, N; Hofmann, K

1996-03-01

A flexible motif search technique is presented which has two major components: (1) a generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety of motif descriptors implemented in other methods, including regular expression-like patterns, weight matrices, previously used profiles, and certain types of hidden Markov models (HMMs). The relationship between generalized profiles and other biomolecular motif descriptors is analyzed in detail, with special attention to HMMs. Generalized profiles are shown to be equivalent to a particular class of HMMs, and conversion procedures in both directions are given. The conversion procedures provide an interpretation for local alignment in the framework of stochastic models, allowing for clear, simple significance tests. A mathematical statement of the motif search problem defines the new method exactly without linking it to a specific algorithmic solution. Part of the definition includes a new definition of disjointness of alignments.
Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

PubMed

Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

2006-03-01

Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

The Thiamine-Pyrophosphate-Motif

NASA Technical Reports Server (NTRS)

Ciszak, Ewa; Dominiak, Paulina

2004-01-01

Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.
Characterization and evolution of the mitochondrial DNA control region in hornbills (Bucerotiformes).

PubMed

Delport, Wayne; Ferguson, J Willem H; Bloomer, Paulette

2002-06-01

We determined the mitochondrial DNA control region sequences of six Bucerotiformes. Hornbills have the typical avian gene order and their control region is similar to other avian control regions in that it is partitioned into three domains: two variable domains that flank a central conserved domain. Two characteristics of the hornbill control region sequence differ from that of other birds. First, domain I is AT rich as opposed to AC rich, and second, the control region is approximately 500 bp longer than that of other birds. Both these deviations from typical avian control region sequence are explainable on the basis of repeat motifs in domain I of the hornbill control region. The repeat motifs probably originated from a duplication of CSB-1 as has been determined in chicken, quail, and snowgoose. Furthermore, the hornbill repeat motifs probably arose before the divergence of hornbills from each other but after the divergence of hornbills from other avian taxa. The mitochondrial control region of hornbills is suitable for both phylogenetic and population studies, with domains I and II probably more suited to population and phylogenetic analyses, respectively.
A dehydrin cognate protein from pea (Pisum sativum L.) with an atypical pattern of expression.

PubMed

Robertson, M; Chandler, P M

1994-11-01

Dehydrins are a family of proteins characterised by conserved amino acid motifs, and induced in plants by dehydration or treatment with ABA. An antiserum was raised against a synthetic oligopeptide based on the most highly conserved dehydrin amino acid motif, the lysine-rich (core sequence KIKEK-LPG). This antiserum detected a novel M(r) 40,000 polypeptide and enabled isolation of a corresponding cDNA clone, pPsB61 (B61). The deduced amino acid sequence contained two lysine-rich blocks; however, the remainder of the sequence differed markedly from other pea dehydrins. Surprisingly, the sequence contained a stretch of serine residues, a characteristic common to dehydrins from many plant species but which is missing in pea dehydrin. The expression patterns of B61 mRNA and polypeptide were distinctively different from those of the pea dehydrins during seed development, germination and in young seedlings exposed to dehydration stress or treated with ABA. In particular, dehydration stress led to slightly reduced levels of B61 RNA, and ABA application to young seedlings had no marked effect on its abundance. The M(r) 40,000 polypeptide is thus related to pea dehydrin by the presence of the most highly conserved amino acid sequence motifs, but lacks the characteristic expression pattern of dehydrin. By analogy with heat shock cognate proteins we refer to this protein as a dehydrin cognate.
Systematic and fully automated identification of protein sequence patterns.

PubMed

Hart, R K; Royyuru, A K; Stolovitzky, G; Califano, A

2000-01-01

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Third, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.
MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments

PubMed Central

Georgakopoulos-Soares, Ilias; Jain, Naman; Gray, Jesse M; Hemberg, Martin

2017-01-01

Motivation: With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging. Results: We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs. Availability and implementation: MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at www.genomegeek.com and www.sanger.ac.uk/science/tools/mpranator. The source code is available on www.github.com/hemberg-lab/MPRAnator/ under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs. Contact: igs@sanger.ac.uk or mh26@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27605100
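The kinds of negative-control transformations attributed to the Transmutation tool (scrambling, reversing, complementing, random mutation) can be sketched in a few lines of Python. This is an independent illustration, not MPRAnator's code or API: all function names are hypothetical, and details such as how mutation positions are chosen are assumptions.

```python
import random

# Translation table for complementing an uppercase DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse(seq):
    return seq[::-1]

def complement(seq):
    return seq.translate(COMPLEMENT)

def scramble(seq, rng):
    """Shuffle the bases, preserving base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def mutate(seq, n, rng):
    """Substitute n distinct, randomly chosen positions with a different base."""
    bases = list(seq)
    for i in rng.sample(range(len(bases)), n):
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)

rng = random.Random(0)  # fixed seed so the controls are reproducible
motif = "AACGT"         # hypothetical input motif
print(reverse(motif), complement(motif))
print(scramble(motif, rng), mutate(motif, 2, rng))
```

A scrambled control keeps base composition but can recreate the original motif by chance, so it is worth checking each control against the motif before use.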
MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments.

PubMed

Georgakopoulos-Soares, Ilias; Jain, Naman; Gray, Jesse M; Hemberg, Martin

2017-01-01

With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging. We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs. MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at www.genomegeek.com and www.sanger.ac.uk/science/tools/mpranator. The source code is available on www.github.com/hemberg-lab/MPRAnator/ under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs. Contact: igs@sanger.ac.uk or mh26@sanger.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Recognition of p63 by the E3 ligase ITCH: Effect of an ectodermal dysplasia mutant.

PubMed

Bellomaria, A; Barbato, Gaetano; Melino, G; Paci, M; Melino, Sonia

2010-09-15

The E3 ubiquitin ligase Itch mediates the degradation of the p63 protein. Itch contains four WW domains which are pivotal for the substrate recognition process. Indeed, this domain is implicated in several signalling complexes crucially involved in human diseases including Muscular Dystrophy, Alzheimer's Disease and Huntington Disease. WW domains are highly compact protein-protein binding modules that interact with short proline-rich sequences. The four WW domains present in Itch belong to the Group I type, which binds polypeptides with a PY motif characterized by a PPxY consensus sequence, where x can be any residue. Accordingly, the Itch-p63 interaction results from a direct binding of Itch-WW2 domain with the PY motif of p63. Here, we report a structural analysis of the Itch-p63 interaction by fluorescence, CD and NMR spectroscopy. Indeed, we studied the in vitro interaction between Itch-WW2 domain and p63(534-551), an 18-mer peptide encompassing a fragment of the p63 protein including the PY motif. In addition, we evaluated the conformation and the interaction with Itch-WW2 of a site specific mutant of p63, I549T, that has been reported in both Hay-Wells syndrome and Rapp-Hodgkin syndrome. Based on our results, we propose an extended PPxY motif for the Itch recognition motif (P-P-P-Y-x(4)-[ST]-[ILV]), which includes these C-terminal residues to the PPxY motif.
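The extended motif in this abstract is written in a PROSITE-like notation, which can be turned into a regular expression for scanning sequences. The Python sketch below handles only the small subset of that notation used here (single residues, `x`, bracketed alternatives, and fixed repeat counts in parentheses). The function name and the two peptides are hypothetical; the peptides are toy strings, not the p63(534-551) sequence.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern such as 'P-x(4)-[ST]' to a regex.

    Only residues, 'x', [..] alternatives and (n) repeats are supported.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        body, count = m.group(1), m.group(2)
        if body == "x":
            body = "."  # 'x' stands for any residue
        parts.append(body + (f"{{{count}}}" if count else ""))
    return "".join(parts)

regex = prosite_to_regex("P-P-P-Y-x(4)-[ST]-[ILV]")
print(regex)

# Toy peptides: the second replaces the final hydrophobic residue with Thr,
# loosely mimicking the effect of a substitution at the [ILV] position.
print(bool(re.search(regex, "AAPPPYAAAASIAA")))
print(bool(re.search(regex, "AAPPPYAAAASTAA")))
```

A pattern match only indicates that a sequence is compatible with the proposed motif; it does not by itself predict binding strength.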
Systematic analysis of phosphotyrosine antibodies recognizing single phosphorylated EPIYA-motifs in CagA of East Asian-type Helicobacter pylori strains.

PubMed

Lind, Judith; Backert, Steffen; Hoffmann, Rebecca; Eichler, Jutta; Yamaoka, Yoshio; Perez-Perez, Guillermo I; Torres, Javier; Sticht, Heinrich; Tegtmeyer, Nicole

2016-09-02

Highly virulent strains of the gastric pathogen Helicobacter pylori encode a type IV secretion system (T4SS) that delivers the effector protein CagA into gastric epithelial cells. Translocated CagA undergoes tyrosine phosphorylation by members of the oncogenic c-Src and c-Abl host kinases at EPIYA-sequence motifs A, B and D in East Asian-type strains. These phosphorylated EPIYA-motifs serve as recognition sites for various SH2-domains containing human proteins, mediating interactions of CagA with host signaling factors to manipulate signal transduction pathways. Recognition of phospho-CagA is mainly based on the use of commercial pan-phosphotyrosine antibodies that were originally designed to detect phosphotyrosines in mammalian proteins. Specific anti-phospho-EPIYA antibodies for each of the three sites in CagA are not forthcoming. This study was designed to systematically analyze the detection preferences of each phosphorylated East Asian CagA EPIYA-motif by pan-phosphotyrosine antibodies and to determine a minimal recognition sequence. We synthesized phospho- and non-phosphopeptides derived from each predominant EPIYA-site, and determined the recognition patterns by seven different pan-phosphotyrosine antibodies using Western blotting, and also investigated representative East Asian H. pylori isolates during infection. The results indicate that a total of only 9-11 amino acids containing the phosphorylated East Asian EPIYA-types are required and sufficient to detect the phosphopeptides with high specificity. However, the sequence recognition by the different antibodies was found to bear high variability. From the seven antibodies used, only four recognized all three phosphorylated EPIYA-motifs A, B and D similarly well. Two of the phosphotyrosine antibodies bound preferentially to the phosphorylated motifs A and D, while the seventh antibody failed to react with any of the phosphorylated EPIYA-motifs. Control experiments confirmed that none of the antibodies reacted with non-phospho-CagA peptides and in accordance were able to recognize phosphotyrosine proteins in human cells. The results of this study disclose the various binding preferences of commercial anti-phosphotyrosine antibodies for phospho-EPIYA-motifs, and are valuable in the application for further characterization of CagA phosphorylation events during infection with H. pylori and risk prediction for gastric disease development.
A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

PubMed Central

2011-01-01

Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, together with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR candidates. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. 
Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950
Unusual occurrence of a DAG motif in the Ipomovirus Cassava brown streak virus and implications for its vector transmission.

PubMed

Ateka, Elijah; Alicai, Titus; Ndunguru, Joseph; Tairo, Fred; Sseruwagi, Peter; Kiarie, Samuel; Makori, Timothy; Kehoe, Monica A; Boykin, Laura M

2017-01-01

Cassava is the main staple food for over 800 million people globally. Its production in eastern Africa is being constrained by two devastating Ipomoviruses that cause cassava brown streak disease (CBSD): Cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV), with up to 100% yield loss for smallholder farmers in the region. To date, vector studies have not resulted in reproducible and highly efficient transmission of CBSV and UCBSV. Most virus transmission studies have used Bemisia tabaci (whitefly), but a maximum of 41% U/CBSV transmission efficiency has been documented for this vector. With the advent of next generation sequencing, researchers are generating whole genome sequences for both CBSV and UCBSV from throughout eastern Africa. Our initial goal for this study was to characterize U/CBSV whole genomes from CBSD symptomatic cassava plants sampled in Kenya. We have generated 8 new whole genomes (3 CBSV and 5 UCBSV) from Kenya, and in the process of analyzing these genomes together with 26 previously published sequences, we uncovered the aphid transmission associated DAG motif within coat protein genes of all CBSV whole genomes at amino acid positions 52-54, but not in UCBSV. Upon further investigation, the DAG motif was also found at the same positions in two other Ipomoviruses: Squash vein yellowing virus (SqVYV) and Coccinia mottle virus (CocMoV). Until this study, the highly-conserved DAG motif, which is associated with aphid transmission, was only noticed once, in SqVYV, but discounted as being of minimal importance. This study represents the first comprehensive look at Ipomovirus genomes to determine the extent of DAG motif presence and significance for vector relations. The presence of this motif suggests that aphids could potentially be a vector of CBSV, SqVYV and CocMoV. Further transmission and ipomoviral protein evolutionary studies are needed to confirm this hypothesis.
Motivated Proteins: A web application for studying small three-dimensional protein motifs

PubMed Central

Leader, David P; Milner-White, E James

2009-01-01

Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785
Sequence characterization and immunogenicity of cystatins from the cattle tick Rhipicephalus (Boophilus) microplus.

PubMed

Parizi, Luís F; Githaka, Naftaly W; Acevedo, Carolina; Benavides, Uruguaysito; Seixas, Adriana; Logullo, Carlos; Konnai, Satoru; Ohashi, Kazuhiko; Masuda, Aoi; da Silva Vaz, Itabajara

2013-12-01

Various classes of endopeptidases and their inhibitors facilitate blood feeding and digestion in ticks. Cystatins, a family of tight-binding and reversible inhibitors of cysteine endopeptidases, have recently been found in several tick tissues. Moreover, vaccine trials using tick cystatins have been found to induce protective immune responses against tick infestation. However, the mode of action of tick cystatins is still poorly understood, limiting the elucidation of their physiological role. Against this background, we have investigated sequence characteristics and immunogenic properties of 5 putative cystatins from Rhipicephalus (Boophilus) microplus from Brazil and Uruguay. The similarity of the deduced amino acid sequences among cystatins from the Brazilian tick strain was 27-42%, all of which had a secretory signal peptide. The cystatin motif (QxVxG), a glycine in the N-terminal region, and the PW motif in the second hairpin loop in the C-terminal region are highly conserved in all 5 cystatins identified in this study. Four cysteine residues in the C terminus characteristic of type 2 cystatins are also present. qRT-PCR revealed differential expression patterns among the 5 cystatins identified, as well as variation in mRNA transcripts present in egg, larva, gut, salivary glands, ovary, and fat body tissues. One R. microplus cystatin showed 97-100% amino acid similarity between Brazilian and Uruguayan isolates. Furthermore, by in silico analysis, antigenic amino acid regions from R. microplus cystatins showed high degrees of homology (54-92%) among Rhipicephalus spp. cystatins. Three Brazilian R. microplus cystatins were expressed in Escherichia coli, and the immunogenicity of the recombinant proteins was determined by vaccinating mice. Western blotting using mouse sera indicated cross-reactivity between the cystatins, suggesting shared epitopes. The present characterization of Rhipicephalus spp. cystatins represents an empirical approach in an effort to evaluate the physiological role of cystatins in a larger context of targeting them for use in future tick control strategies. Copyright © 2013 Elsevier GmbH. All rights reserved.
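As an illustration of the QxVxG cystatin motif mentioned in the abstract above, the following minimal Python sketch scans a protein sequence for that pattern, reading "x" as any amino acid. The function name and the toy sequence are hypothetical and are not taken from the study; real cystatin sequences would come from a sequence database.

```python
import re

# QxVxG read as: Q, any residue, V, any residue, G (assumption: x = any amino acid).
CYSTATIN_MOTIF = re.compile(r"Q.V.G")


def find_qxvxg(protein):
    """Return (1-based start, matched pentapeptide) for each non-overlapping QxVxG hit."""
    seq = protein.upper()
    return [(m.start() + 1, m.group()) for m in CYSTATIN_MOTIF.finditer(seq)]


# Made-up toy sequence for demonstration only; it is not a real cystatin.
toy_sequence = "MKTAGLLQVVAGTNYPWRC"
print(find_qxvxg(toy_sequence))
```

A plain regular expression is enough here because the motif has a fixed length and only two wildcard positions.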
Genome-wide identification and characterization of WRKY gene family in Salix suchowensis.

PubMed

Bi, Changwei; Xu, Yiqing; Ye, Qiaolin; Yin, Tongming; Ye, Ning

2016-01-01

WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which can be found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (group I-III), with five subgroups (IIa-IIe) in group II. With the multiple sequence alignment and the manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon-intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. 
With the analyses of WRKY gene family in willow, it is not only beneficial to complete the functional and annotation information of WRKY genes family in woody plants, but also provide important references to investigate the expansion and evolution of this gene family in flowering plants. PMID:27651997
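The heptapeptide variants listed in the abstract above (WRKYGQK, WRKYGRK, WKKYGQK and WRKYGKK) lend themselves to a simple exact-match scan. The sketch below is only an illustration under that assumption; the function name and the toy sequence are hypothetical, and the study itself used multiple sequence alignment and manual search rather than this code.

```python
# The canonical heptapeptide and the three variants named in the abstract.
HEPTAPEPTIDES = {
    "WRKYGQK": "canonical",
    "WRKYGRK": "variant",
    "WKKYGQK": "variant",
    "WRKYGKK": "variant",
}


def classify_wrky_heptapeptides(protein):
    """Return (1-based start, heptapeptide, label) for every exact match."""
    seq = protein.upper()
    hits = []
    # Slide a 7-residue window along the sequence and look it up in the table.
    for i in range(len(seq) - 6):
        window = seq[i:i + 7]
        if window in HEPTAPEPTIDES:
            hits.append((i + 1, window, HEPTAPEPTIDES[window]))
    return hits


# Made-up toy sequence containing one canonical and one variant heptapeptide.
print(classify_wrky_heptapeptides("AAWRKYGQKAAWKKYGQKA"))
```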
ELM: the status of the 2010 eukaryotic linear motif resource

PubMed Central

Gould, Cathryn M.; Diella, Francesca; Via, Allegra; Puntervoll, Pål; Gemünd, Christine; Chabanis-Davidson, Sophie; Michael, Sushama; Sayadi, Ahmed; Bryne, Jan Christian; Chica, Claudia; Seiler, Markus; Davey, Norman E.; Haslam, Niall; Weatheritt, Robert J.; Budd, Aidan; Hughes, Tim; Paś, Jakub; Rychlewski, Leszek; Travé, Gilles; Aasland, Rein; Helmer-Citterich, Manuela; Linding, Rune; Gibson, Toby J.

2010-01-01

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation. PMID:19920119
Identification, molecular and functional characterization of calmodulin gene of Phytomonas serpens 15T that shares high similarity with its pathogenic counterparts Trypanosoma cruzi.

PubMed

de Souza, Tatiana de Arruda Campos Brasil; Graça-de Souza, Viviane Krominski; Lancheros, César Armando Contreras; Monteiro-Góes, Viviane; Krieger, Marco Aurélio; Goldenberg, Samuel; Yamauchi, Lucy Megumi; Yamada-Ogatta, Sueli Fumie

2011-03-01

In trypanosomatids, Ca²⁺-binding proteins can affect parasite growth, differentiation and invasion. Due to their importance for parasite maintenance, they become an attractive target for drug discovery and design. Phytomonas serpens 15T is a non-human pathogenic trypanosomatid that expresses important protein homologs of human pathogenic trypanosomatids. In this study, the coding sequence of calmodulin, a Ca²⁺-binding protein, of P. serpens 15T was cloned and characterized. The encoded polypeptide (CaMP) displayed high amino acid identity to homolog protein of Trypanosoma cruzi and four helix-loop-helix motifs were found. CaMP sequence analysis showed 20 amino acid substitutions compared to its mammalian counterparts. This gene is located on a chromosomal band with estimated size of 1,300 kb and two transcripts were detected by Northern blot analysis. A polyclonal antiserum raised against the recombinant protein recognized a polypeptide with an estimated size of 17 kDa in log-phase promastigote extracts. The recombinant CaMP retains its Ca²⁺-binding capacity.
Intraspecies and interspecies transmission of mink H9N2 influenza virus.

PubMed

Yong-Feng, Zhao; Fei-Fei, Diao; Jia-Yu, Yu; Feng-Xia, Zhang; Chang-Qing, Jiang; Jian-Li, Wang; Shou-Yu, Guo; Kai, Cui; Chuan-Yi, Liu; Xue-Hua, Wei; Jiang, Shi-Jin; Zhi-Jing, Xie

2017-08-07

H9N2 influenza A virus (IAV) causes low pathogenic respiratory disease and infects a wide range of hosts. In this study, six IAVs were isolated from mink and identified as H9N2 IAV. Sequence analysis revealed that the six isolates continued to evolve, and their PB2 genes shared high nucleotide sequence identity with H7N9 IAV. The six isolates contained an amino acid motif PSRSSR↓GL at the hemagglutinin cleavage site, which is a characteristic of low pathogenic influenza viruses. A serosurvey demonstrated that H9N2 IAV had spread widely in mink and was prevalent in foxes and raccoon dogs. Transmission experiments showed that close contact between H9N2-infected mink and naive mink, foxes and raccoon dogs resulted in spread of the virus to the contact animals. Furthermore, H9N2 challenge experiments in foxes and raccoon dogs showed that H9N2 IAV could infect these hosts. Virological and epidemiological surveillance of H9N2 IAV should be strengthened for the fur animal industry.
Generation and reactivation of T-cell receptor A joining region pseudogenes in primates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thiel, C.; Lanchbury, J.S.; Otting, N.

1996-06-01

Tandemly duplicated T-cell receptor (Tcr) AJ (Jα) segments contribute significantly to TCRA chain junctional region diversity in mammals. Since only limited data exists on TCRA diversity in nonhuman primates, we examined the TCRAJ regions of 37 chimpanzee and 71 rhesus macaque TCRA cDNA clones derived from inverse polymerase chain reaction on peripheral blood mononuclear cell cDNA of healthy animals. Twenty-five different TCRAJ regions were characterized in the chimpanzee and 36 in the rhesus macaque. Each bears a close structural relationship to an equivalent human TCRAJ region. Conserved amino acid motifs are shared between all three species. There are indications that differences between nonhuman primates and humans exist in the generation of TCRAJ pseudogenes. The nucleotide and amino acid sequences of the various characterized TCRAJ of each species are reported and we compare our results to the available information on human genomic sequences. Although we provide evidence of dynamic processes modifying TCRAJ segments during primate evolution, their repertoire and primary structure appears to be relatively conserved. 21 refs., 2 figs.
Structural Basis for Sequence-specific DNA Recognition by an Arabidopsis WRKY Transcription Factor*

PubMed Central

Yamasaki, Kazuhiko; Kigawa, Takanori; Watanabe, Satoru; Inoue, Makoto; Yamasaki, Tomoko; Seki, Motoaki; Shinozaki, Kazuo; Yokoyama, Shigeyuki

2012-01-01

The WRKY family transcription factors regulate plant-specific reactions that are mostly related to biotic and abiotic stresses. They share the WRKY domain, which recognizes a DNA element (TTGAC(C/T)) termed the W-box, in target genes. Here, we determined the solution structure of the C-terminal WRKY domain of Arabidopsis WRKY4 in complex with the W-box DNA by NMR. A four-stranded β-sheet enters the major groove of DNA in an atypical mode termed the β-wedge, where the sheet is nearly perpendicular to the DNA helical axis. Residues in the conserved WRKYGQK motif contact DNA bases mainly through extensive apolar contacts with thymine methyl groups. The importance of these contacts was verified by substituting the relevant T bases with U and by surface plasmon resonance analyses of DNA binding. PMID:22219184
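The W-box element given in the abstract above, TTGAC(C/T), can be located in a DNA sequence with a short pattern search. The following Python sketch is an assumption-laden illustration (function names and the toy promoter fragment are hypothetical); it checks both strands because a binding site may lie on either one.

```python
import re

# TTGAC(C/T) wrapped in a lookahead so that overlapping sites are also reported.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_w_boxes(dna):
    """Return (strand, 1-based leftmost forward-strand position, site) tuples."""
    seq = dna.upper()
    n = len(seq)
    hits = [("+", m.start() + 1, m.group(1)) for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(reverse_complement(seq)):
        # Convert the start on the reverse strand into the leftmost
        # forward-strand coordinate covered by the 6-bp site.
        hits.append(("-", n - (m.start() + 6) + 1, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up fragment with one site on each strand.
print(find_w_boxes("AATTGACCGGGGTCAAGG"))
```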
Analysis of SSR information in EST resources of sugarcane

USDA-ARS's Scientific Manuscript database

Expressed sequence tags (ESTs) offer the opportunity to exploit single, low-copy, conserved sequence motifs for the development of simple sequence repeats (SSRs). A total of 262,113 ESTs of sugarcane (Saccharum officinarum) in the NCBI database were downloaded and analyzed, which resulted in...
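The record above is truncated and does not say how SSRs were detected, so the following Python sketch only illustrates one common way to mine di- and trinucleotide repeats from a sequence with regular expressions. The minimum repeat counts, the function name and the toy sequence are all assumptions made for illustration and are not the thresholds used in the study.

```python
import re

# Assumed minimum repeat counts per unit length (illustrative thresholds only).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (1-based start, repeat unit, repeat count) for each SSR found."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        # A unit of fixed length followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # homopolymer runs are not counted as di/tri repeats
            hits.append((m.start() + 1, unit, len(m.group()) // unit_len))
    return sorted(hits)


# Made-up sequence: (AG)7 and (CAT)5 embedded in short flanks.
toy_est = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
print(find_ssrs(toy_est))
```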

Mismatch and G-Stack Modulated Probe Signals on SNP Microarrays

PubMed Central

Binder, Hans; Fasold, Mario; Glomb, Torsten

2009-01-01

Background Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes which facilitates complex formation of neighboring probes with at minimum three adjacent G's in their sequence. Conclusions The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested. PMID:19924253
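The abstract above reports that probes with at least three adjacent guanines are associated with a systematic signal bias. As a minimal sketch of how such probes might be flagged before calibration, the Python snippet below lists every run of three or more G's in a probe sequence. The function name and the example probe are hypothetical, and the snippet does not reproduce the authors' "triple-averaging" method.

```python
import re

# Runs of at least three adjacent guanines, the threshold named in the abstract.
G_RUN = re.compile(r"G{3,}")


def g_runs(probe):
    """Return (1-based start, run length) for each run of three or more G's."""
    return [(m.start() + 1, len(m.group())) for m in G_RUN.finditer(probe.upper())]


def has_g_stack(probe):
    """True if the probe contains at least one run of three or more G's."""
    return bool(g_runs(probe))


# Made-up probe fragment: a GGG run, a GG pair (ignored) and a GGGG run.
print(g_runs("ACGGGTTGGAGGGGC"))
```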
Design of character-based DNA barcode motif for species identification: A computational approach and its validation in fishes.

PubMed

Chakraborty, Mohua; Dhar, Bishal; Ghosh, Sankar Kumar

2017-11-01

DNA barcodes are generally interpreted using distance-based and character-based methods. The former uses clustering of comparable groups, based on the relative genetic distance, while the latter is based on the presence or absence of discrete nucleotide substitutions. The distance-based approach has a limitation in defining a universal species boundary across the taxa, as the rate of mtDNA evolution is not constant throughout the taxa. However, the character-based approach defines this boundary more accurately using a unique set of nucleotide characters. The character-based analysis of the full-length barcode has some inherent limitations, like sequencing of the full-length barcode, use of a sparse-data matrix and lack of a uniform diagnostic position for each group. A short continuous stretch of a fragment can be used to resolve these limitations. Here, we observe that a 154-bp fragment from the transversion-rich domain of 1367 COI barcode sequences can successfully delimit species in the three most diverse orders of freshwater fishes. This fragment is used to design species-specific barcode motifs for 109 species by the character-based method, which successfully identifies the correct species using a pattern-matching program. The motifs also correctly identify geographically isolated populations of the Cypriniformes species. Further, this region is validated as a species-specific mini-barcode for freshwater fishes by successful PCR amplification and sequencing of the motif (154 bp) using the designed primers. We anticipate that use of such motifs will enhance the diagnostic power of DNA barcode, and the mini-barcode approach will greatly benefit the field-based system of rapid species identification. © 2017 John Wiley & Sons Ltd.
RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants.

PubMed

Li, Pingchuan; Quan, Xiande; Jia, Gaofeng; Xiao, Jin; Cloutier, Sylvie; You, Frank M

2016-11-02

Resistance gene analogs (RGAs), such as NBS-encoding proteins, receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), are potential R-genes that contain specific conserved domains and motifs. Thus, RGAs can be predicted based on their conserved structural features using bioinformatics tools. Computer programs have been developed for the identification of individual domains and motifs from the protein sequences of RGAs but none offer a systematic assessment of the different types of RGAs. A user-friendly and efficient pipeline is needed for large-scale genome-wide RGA predictions of the growing number of sequenced plant genomes. An integrative pipeline, named RGAugury, was developed to automate RGA prediction. The pipeline first identifies RGA-related protein domains and motifs, namely nucleotide binding site (NB-ARC), leucine rich repeat (LRR), transmembrane (TM), serine/threonine and tyrosine kinase (STTK), lysin motif (LysM), coiled-coil (CC) and Toll/Interleukin-1 receptor (TIR). RGA candidates are identified and classified into four major families based on the presence of combinations of these RGA domains and motifs: NBS-encoding, TM-CC, and membrane associated RLP and RLK. All time-consuming analyses of the pipeline are parallelized to improve performance. The pipeline was evaluated using the well-annotated Arabidopsis genome. A total of 98.5, 85.2, and 100 % of the reported NBS-encoding genes, membrane associated RLPs and RLKs were validated, respectively. The pipeline was also successfully applied to predict RGAs for 50 sequenced plant genomes. A user-friendly web interface was implemented to ease command line operations, facilitate visualization and simplify result management for multiple datasets. RGAugury is an efficient, integrative bioinformatics tool for large scale genome-wide identification of RGAs. It is freely available at Bitbucket: https://bitbucket.org/yaanlpc/rgaugury.
An evolutionarily conserved motif in the TAB1 C-terminal region is necessary for interaction with and activation of TAK1 MAPKKK.

PubMed

Ono, K; Ohtomo, T; Sato, S; Sugamata, Y; Suzuki, M; Hisamoto, N; Ninomiya-Tsuji, J; Tsuchiya, M; Matsumoto, K

2001-06-29

TAK1, a member of the MAPKKK family, is involved in the intracellular signaling pathways mediated by transforming growth factor beta, interleukin 1, and Wnt. TAK1 kinase activity is specifically activated by the TAK1-binding protein TAB1. The C-terminal 68-amino acid sequence of TAB1 (TAB1-C68) is sufficient for TAK1 interaction and activation. Analysis of various truncated versions of TAB1-C68 defined a C-terminal 30-amino acid sequence (TAB1-C30) necessary for TAK1 binding and activation. NMR studies revealed that the TAB1-C30 region has a unique alpha-helical structure. We identified a conserved sequence motif, PYVDXA/TXF, in the C-terminal domain of mammalian TAB1, Xenopus TAB1, and its Caenorhabditis elegans homolog TAP-1, suggesting that this motif constitutes a specific TAK1 docking site. Alanine substitution mutagenesis showed that TAB1 Phe-484, located in the conserved motif, is crucial for TAK1 binding and activation. The C. elegans homolog of TAB1, TAP-1, was able to interact with and activate the C. elegans homolog of TAK1, MOM-4. However, the site in TAP-1 corresponding to Phe-484 of TAB1 is an alanine residue (Ala-364), and changing this residue to Phe abrogates the ability of TAP-1 to interact with and activate MOM-4. These results suggest that the Phe or Ala residue within the conserved motif of the TAB1-related proteins is important for interaction with and activation of specific TAK1 MAPKKK family members in vivo.
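The conserved motif quoted in the abstract above, PYVDXA/TXF, can be written as a regular expression if it is read as P-Y-V-D, any residue, A or T, any residue, F. That reading of the notation is an assumption, as are the function name and the toy sequence in the Python sketch below; no real TAB1 sequence is used.

```python
import re

# PYVDXA/TXF read as: P Y V D, any residue, A or T, any residue, F (assumed reading).
TAB1_MOTIF = re.compile(r"PYVD.[AT].F")


def find_tab1_motif(protein):
    """Return (1-based start, matched octapeptide) for each non-overlapping hit."""
    seq = protein.upper()
    return [(m.start() + 1, m.group()) for m in TAB1_MOTIF.finditer(seq)]


# Made-up toy sequences: one matches the pattern, one has the final F replaced.
print(find_tab1_motif("MSSPYVDQAEFLK"))
print(find_tab1_motif("MSSPYVDQAEALK"))
```

Replacing the final phenylalanine in the toy sequence removes the match, which shows how a strict pattern treats a single substitution at a fixed position.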
Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

PubMed

Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

2014-06-01

Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCGTHP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.
STEME: A Robust, Accurate Motif Finder for Large Data Sets

PubMed Central

Reid, John E.; Wernisch, Lorenz

2014-01-01

Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. PMID:24625410
Molecular dynamics analysis of stabilities of the telomeric Watson-Crick duplex and the associated i-motif as a function of pH and temperature.

PubMed

Panczyk, Tomasz; Wolski, Pawel

2018-06-01

This work presents a molecular dynamics analysis of the protonated and deprotonated states of the natural sequence d[(CCCTAA)3CCCT] of the telomeric DNA, either forming the intercalated i-motif or paired with the sequence d[(CCCTAA)3CCCT] and forming the Watson-Crick (WC) duplex. Using the AMBER force field for nucleic acids, we built the i-motif and the WC duplex either with native cytosines or with their protonated forms. We used molecular dynamics simulations to study the role of hydrogen bonds between cytosines or in cytosine-guanine pairs in the stabilization of both structures in the physiological fluid. We found that hydrogen bonds exist in the protonated i-motif and in the standard form of the WC duplex. They vanish, however, in the deprotonated i-motif and in the protonated form of the WC duplex. By determining potentials of mean force in the enforced unwrapping of these structures, we found that the protonated i-motif is thermodynamically the most stable. Its deprotonation leads to spontaneous unfolding of the i-motif to the hairpin structure at normal temperature, which was observed directly in the unbiased calculations. The WC duplex is stable in its standard form, and slight destabilization is observed at acidic pH. However, the protonated WC duplex unwraps very slowly at 310 K, and its decomposition was not observed in the unbiased calculations. At higher temperatures (ca. 400 K or more) the WC duplex unwraps spontaneously. Copyright © 2018. Published by Elsevier B.V.
Direct AUC optimization of regulatory motifs.

PubMed

Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

2017-07-15

The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large volume of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and can fail to fully utilize the information in the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
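The AUC criterion named in the abstract above can be illustrated with a small sketch. This is not the CDAUC algorithm; it only shows how a motif's AUC might be computed, assuming each sequence is scored by its best-matching window under a position weight matrix (PWM). The PWM values, the toy sequences, and the function names are all made up for illustration.

```python
from itertools import product

# Hypothetical log-odds PWM for a 3-bp motif (illustrative values only).
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

def best_window_score(seq, pwm):
    """Return the highest PWM score over all windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

positives = ["TTACGTT", "ACGAAAA", "GGGACGG"]  # toy "bound" sequences
negatives = ["TTTTTTT", "ACTACTA", "GGGGGGG"]  # toy "unbound" sequences

pos = [best_window_score(s, PWM) for s in positives]
neg = [best_window_score(s, PWM) for s in negatives]
motif_auc = auc(pos, neg)
```

Because this pairwise form of AUC depends only on how the scores are ordered, small changes to a single PWM entry leave it unchanged until some pair of sequences swaps rank, which is consistent with the piece-wise constant behaviour the abstract describes.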
Exploring the roles of DNA methylation in the metal-reducing bacterium Shewanella oneidensis MR-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bendall, Matthew L.; Luong, Khai; Wetmore, Kelly M.

2013-08-30

We performed whole genome analyses of DNA methylation in Shewanella oneidensis MR-1 to examine its possible role in regulating gene expression and other cellular processes. Single-Molecule Real Time (SMRT) sequencing revealed extensive methylation of adenine (N6mA) throughout the genome. These methylated bases were located in five sequence motifs, including three novel targets for Type I restriction/modification enzymes. The sequence motifs targeted by putative methyltransferases were determined via SMRT sequencing of gene knockout mutants. In addition, we found S. oneidensis MR-1 cultures grown under various culture conditions displayed different DNA methylation patterns. However, the small number of differentially methylated sites could not be directly linked to the much larger number of differentially expressed genes in these conditions, suggesting DNA methylation is not a major regulator of gene expression in S. oneidensis MR-1. The enrichment of methylated GATC motifs in the origin of replication indicates DNA methylation may regulate genome replication in a manner similar to that seen in Escherichia coli. Furthermore, comparative analyses suggest that many Gammaproteobacteria, including all members of the Shewanellaceae family, may also utilize DNA methylation to regulate genome replication.
Identification, Characterization, and Expression of a Novel P450 Gene Encoding CYP6AE25 from the Asian Corn Borer, Ostrinia furnacalis

PubMed Central

Zhang, Yu-liang; Kulye, Mahesh; Yang, Feng-shan; Xiao, Luo; Zhang, Yi-tong; Zeng, Hongmei; Wang, Jian-hua; Liu, Zhi-xin

2011-01-01

An allele of the cytochrome P450 gene CYP6AE14, named CYP6AE25 (GenBank accession no. EU807990), was isolated from the Asian corn borer, Ostrinia furnacalis (Guenée) (Lepidoptera: Pyralidae), by RT-PCR. The cDNA sequence of CYP6AE25 is 2315 bp in length and contains a 1569-nucleotide open reading frame encoding a putative protein of 523 amino acid residues with a predicted molecular weight of 59.95 kDa and a theoretical pI of 8.31. The putative protein contains the classic heme-binding sequence motif F××G×××C×G (residues 451–460) conserved among all P450 enzymes as well as other characteristic motifs of all cytochrome P450s. It shares 52% identity with the previously published sequence of CYP6AE14 (GenBank accession no. DQ986461) from Helicoverpa armigera. Phylogenetic analysis of amino acid sequences from members of various P450 families indicated that CYP6AE25 has a closer phylogenetic relationship with CYP6AE14 and CYP6B1, which are related to metabolism of plant allelochemicals, and with CYP6D1, which is related to pyrethroid resistance, and a more distant relationship to CYP302A1 and CYP307A1, which are related to synthesis of the insect molting hormones. Quantitative real-time PCR on the adults and immature stages of O. furnacalis revealed that CYP6AE25 was expressed in all life stages investigated. The mRNA expression level in 3rd instar larvae was 12.8- and 2.97-fold higher than those in pupae and adults, respectively. The tissue-specific expression level of CYP6AE25 was, from high to low, in the order midgut, Malpighian tubule and fat body, but expression was absent in ovary and brain. The analysis of the CYP6AE25 gene using bioinformatic software is discussed. PMID:21529257
Methods for sequencing GC-rich and CCT repeat DNA templates

DOEpatents

Robinson, Donna L.

2007-02-20

The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.
The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

Treesearch

A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

2000-01-01

Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

PubMed Central

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects currently underway are leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large amount of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA was the most frequent repeat among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function. PMID:22368382
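The abstract above does not say which tool or thresholds were used to extract SSRs. As a minimal sketch of the general idea, perfect mono- to hexanucleotide repeats can be located with regular-expression back-references. The minimum repeat counts below are assumed values chosen for illustration, and `find_ssrs` is a hypothetical helper name.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Find perfect SSRs with unit lengths 1-6 using regex back-references.

    Returns a sorted list of (0-based start, repeat unit, repeat count).
    """
    # Assumed minimum repeat counts per unit length (illustrative thresholds).
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter unit, e.g. "AAA" or "ATAT".
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

# Toy EST-like sequence: a (GAA)6 trinucleotide and an (AT)7 dinucleotide repeat.
est = "GC" + "GAA" * 6 + "CG" + "AT" * 7 + "G"
ssrs = find_ssrs(est)
```

Counting the returned units by length would give the kind of per-class frequency table (mono-, di-, trinucleotide, and so on) that the abstract reports.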
Novel arrangement and comparative analysis of hsp90 family genes in three thermotolerant species of Stratiomyidae (Diptera).

PubMed

Astakhova, L N; Zatsepina, O G; Przhiboro, A A; Evgen'ev, M B; Garbuz, D G

2013-06-01

The heat shock proteins belonging to the Hsp90 family (Hsp83 in Diptera) play a crucial role in the protection of cells due to their chaperoning functions. We sequenced hsp90 genes from three species of the family Stratiomyidae (Diptera) living in thermally different habitats and characterized by extraordinarily high thermotolerance. The sequence variation and structure of the hsp90 family genes were compared with previously described features of hsp70 copies isolated from the same species. Two functional hsp83 genes were found in the species studied, that are arranged in tandem orientation at least in one of them. This organization was not previously described. Stratiomyidae hsp83 genes share a high level of identity with hsp83 of Drosophila, and the deduced protein possesses five conserved amino acid sequence motifs characteristic of the Hsp90 family as well as the C-terminus MEEVD sequence characteristic of the cytosolic isoform. A comparison of the hsp83 promoters of two Stratiomyidae species from thermally contrasting habitats demonstrated that while both species contain canonical heat shock elements in the same position, only one of the species contains functional GAF-binding elements. Our data indicate that in the same species, hsp83 family genes show a higher evolution rate than the hsp70 family. © 2013 Royal Entomological Society.
Arabidopsis Polycomb Repressive Complex 2 binding sites contain putative GAGA factor binding motifs within coding regions of genes

PubMed Central

2013-01-01

Background: Polycomb Repressive Complex 2 (PRC2) is an essential regulator of gene expression that maintains genes in a repressed state by marking chromatin with trimethylated Histone H3 lysine 27 (H3K27me3). In Arabidopsis, loss of PRC2 function leads to pleiotropic effects on growth and development thought to be due to ectopic expression of seed and embryo-specific genes. While there is some understanding of the mechanisms by which specific genes are targeted by PRC2 in animal systems, it is still not clear how PRC2 is recruited to specific regions of plant genomes. Results: We used ChIP-seq to determine the genome-wide distribution of hemagglutinin (HA)-tagged FERTILIZATION INDEPENDENT ENDOSPERM (FIE-HA), the Extra Sex Combs homolog protein present in all Arabidopsis PRC2 complexes. We found that the FIE-HA binding sites co-locate with a subset of the H3K27me3 sites in the genome and that the associated genes were more likely to be de-repressed in mutants of PRC2 components. The FIE-HA binding sites are enriched for three sequence motifs including a putative GAGA factor binding site that is also found in Drosophila Polycomb Response Elements (PREs). Conclusions: Our results suggest that PRC2 binding sites in plant genomes share some sequence features with Drosophila PREs. However, unlike Drosophila PREs which are located in promoters and devoid of H3K27me3, Arabidopsis FIE binding sites tend to be in gene coding regions and co-localize with H3K27me3. PMID:24001316
Identification and Characterization of the Novel LysM Domain-Containing Surface Protein Sep from Lactobacillus fermentum BR11 and Its Use as a Peptide Fusion Partner in Lactobacillus and Lactococcus

PubMed Central

Turner, Mark S.; Hafner, Louise M.; Walsh, Terry; Giffard, Philip M.

2004-01-01

Examination of supernatant fractions from broth cultures of Lactobacillus fermentum BR11 revealed the presence of a number of proteins, including a 27-kDa protein termed Sep. The amino-terminal sequence of Sep was determined, and the gene encoding it was cloned and sequenced. Sep is a 205-amino-acid protein and contains a 30-amino-acid secretion signal and has overall homology (between 39 and 92% identity) with similarly sized proteins of Lactobacillus reuteri, Enterococcus faecium, Streptococcus pneumoniae, Streptococcus agalactiae, and Lactobacillus plantarum. The carboxy-terminal 81 amino acids of Sep also have strong homology (86% identity) to the carboxy termini of the aggregation-promoting factor (APF) surface proteins of Lactobacillus gasseri and Lactobacillus johnsonii. The mature amino terminus of Sep contains a putative peptidoglycan-binding LysM domain, thereby making it distinct from APF proteins. We have identified a common motif within LysM domains that is shared with carbohydrate binding YG motifs which are found in streptococcal glucan-binding proteins and glucosyltransferases. Sep was investigated as a heterologous peptide expression vector in L. fermentum, Lactobacillus rhamnosus GG and Lactococcus lactis MG1363. Modified Sep containing an amino-terminal six-histidine epitope was found associated with the cells but was largely present in the supernatant in the L. fermentum, L. rhamnosus, and L. lactis hosts. Sep as well as the previously described surface protein BspA were used to express and secrete in L. fermentum or L. rhamnosus a fragment of human E-cadherin, which contains the receptor region for Listeria monocytogenes. This study demonstrates that Sep has potential for heterologous protein expression and export in lactic acid bacteria. PMID:15184172
ZP Domain Proteins in the Abalone Egg Coat Include a Paralog of VERL under Positive Selection That Binds Lysin and 18-kDa Sperm Proteins

PubMed Central

Aagaard, Jan E.; Vacquier, Victor D.; MacCoss, Michael J.; Swanson, Willie J.

2010-01-01

Identifying fertilization molecules is key to our understanding of reproductive biology, yet only a few examples of interacting sperm and egg proteins are known. One of the best characterized comes from the invertebrate archeogastropod abalone (Haliotis spp.), where sperm lysin mediates passage through the protective egg vitelline envelope (VE) by binding to the VE protein vitelline envelope receptor for lysin (VERL). Rapid adaptive divergence of abalone lysin and VERL is an example of positive selection on interacting fertilization proteins contributing to reproductive isolation. Previously, we characterized a subset of the abalone VE proteins that share a structural feature, the zona pellucida (ZP) domain, which is common to VERL and the egg envelopes of vertebrates. Here, we use additional expressed sequence tag sequencing and shotgun proteomics to characterize this family of proteins in the abalone egg VE. We expand 3-fold the number of known ZP domain proteins present within the VE (now 30 in total) and identify a paralog of VERL (vitelline envelope zona pellucida domain protein [VEZP] 14) that contains a putative lysin-binding motif. We find that, like VERL, the divergence of VEZP14 among abalone species is driven by positive selection on the lysin-binding motif alone and that these paralogous egg VE proteins bind a similar set of sperm proteins including a rapidly evolving 18-kDa paralog of lysin, which may mediate sperm–egg fusion. This work identifies an egg coat paralog of VERL under positive selection and the candidate sperm proteins with which it may interact during abalone fertilization. PMID:19767347
Sel1-like repeat proteins in signal transduction.

PubMed

Mittl, Peer R E; Schneider-Brachert, Wulf

2007-01-01

Solenoid proteins, which are distinguished from general globular proteins by their modular architectures, are frequently involved in signal transduction pathways. Proteins from the tetratricopeptide repeat (TPR) and Sel1-like repeat (SLR) families share similar alpha-helical conformations but different consensus sequence lengths and superhelical topologies. Both families are characterized by low sequence similarity levels, rendering the identification of functional homologues difficult. Therefore current knowledge of the molecular and cellular functions of the SLR proteins Sel1, Hrd3, Chs4, Nif1, PodJ, ExoR, AlgK, HcpA, Hsp12, EnhC, LpnE, MotX, and MerG has been reviewed. Although SLR proteins possess different cellular functions, they all seem to serve as adaptor proteins for the assembly of macromolecular complexes. Sel1, Hrd3, Hsp12 and LpnE are activated under cellular stress. The eukaryotic Sel1 and Hrd3 proteins are involved in ER-associated protein degradation, whereas the bacterial LpnE, EnhC, HcpA, ExoR, and AlgK proteins mediate the interactions between bacterial and eukaryotic host cells. LpnE and EnhC are responsible for the entry of L. pneumophila into epithelial cells and macrophages. ExoR from the symbiotic microorganism S. meliloti and AlgK from the pathogen P. aeruginosa regulate exopolysaccharide synthesis. Nif1 and Chs4 from yeast are responsible for the regulation of mitosis and septum formation during cell division, respectively, and PodJ guides the cellular differentiation during the cell cycle of the bacterium C. crescentus. Taken together, the SLR motif establishes a link between signal transduction pathways from eukaryotes and bacteria. The SLR motif is so far absent from archaea. Therefore the SLR could have developed in the last common ancestor between eukaryotes and bacteria.
Direct enzyme assay evidence confirms aldehyde reductase function of Ydr541cp and Ygl039wp from Saccharomyces cerevisiae.

PubMed

Moon, Jaewoong; Liu, Z Lewis

2015-04-01

The aldehyde reductase gene ARI1 is a recently characterized member of an intermediate subfamily within the short-chain dehydrogenase/reductase (SDR) superfamily that clarified mechanisms of in situ detoxification of 2-furaldehyde and 5-hydroxymethyl-2-furaldehyde by Saccharomyces cerevisiae. Uncharacterized open reading frames (ORFs) are common among tolerant candidate genes identified for lignocellulose-to-advanced biofuels conversion. This study presents partially purified proteins of two ORFs, YDR541C and YGL039W, and direct enzyme assay evidence against aldehyde-inhibitory compounds commonly encountered during lignocellulosic biomass fermentation processes. Each of the partially purified proteins encoded by these ORFs showed a molecular mass of approximately 38 kDa, similar to Ari1p, a protein encoded by aldehyde reductase gene. Both proteins demonstrated strong aldehyde reduction activities toward 14 aldehyde substrates, with high levels of reduction activity for Ydr541cp toward both aromatic and aliphatic aldehydes. While Ydr541cp was observed to have a significantly higher specific enzyme activity at 20 U/mg using co-factor NADPH, Ygl039wp displayed a NADH preference at 25 U/mg in reduction of butylaldehyde. Amino acid sequence analysis identified a characteristic catalytic triad, Ser, Tyr and Lys; a conserved catalytic motif of Tyr-X-X-X-Lys; and a cofactor-binding sequence motif, Gly-X-X-Gly-X-X-Ala, near the N-terminus that are shared by Ydr541cp, Ygl039wp, Yol151wp/GRE2 and Ari1p. Findings of aldehyde reductase genes contribute to the yeast gene annotation and aid development of the next-generation biocatalyst for advanced biofuels production. Copyright © 2015 John Wiley & Sons, Ltd.
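The two sequence motifs named in the abstract above, Tyr-X-X-X-Lys and Gly-X-X-Gly-X-X-Ala, translate directly into regular expressions over one-letter amino acid codes. The sketch below shows one way such a scan might look; the toy sequence is made up and is not one of the yeast proteins, and `scan_motifs` is a hypothetical helper name.

```python
import re

# Motifs named in the abstract, written as regular expressions ("." = any residue).
MOTIFS = {
    "catalytic Tyr-X-X-X-Lys": r"Y.{3}K",
    "cofactor-binding Gly-X-X-Gly-X-X-Ala": r"G.{2}G.{2}A",
}

def scan_motifs(protein):
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    results = {}
    for name, pattern in MOTIFS.items():
        # A lookahead with a capture group also reports overlapping occurrences.
        finder = re.compile(r"(?=(%s))" % pattern)
        results[name] = [(m.start() + 1, m.group(1))
                         for m in finder.finditer(protein)]
    return results

# Made-up toy sequence for illustration only.
toy = "MSGATGFIALHVLYAASKVAG"
hits = scan_motifs(toy)
```

A pattern match of this kind only flags candidate sites; it says nothing about whether the residues are positioned to act as a catalytic or cofactor-binding site in the folded protein.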
The Encapsidated Genome of Microplitis demolitor Bracovirus Integrates into the Host Pseudoplusia includens

PubMed Central

Beck, Markus H.; Zhang, Shu; Bitra, Kavita; Burke, Gaelen R.; Strand, Michael R.

2011-01-01

Polydnaviruses (PDVs) are symbionts of parasitoid wasps that function as gene delivery vehicles in the insects (hosts) that the wasps parasitize. PDVs persist in wasps as integrated proviruses but are packaged as circularized and segmented double-stranded DNAs into the virions that wasps inject into hosts. In contrast, little is known about how PDV genomic DNAs persist in host cells. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV) and parasitizes the host Pseudoplusia includens. MdBV infects primarily host hemocytes and also infects a hemocyte-derived cell line from P. includens called CiE1 cells. Here we report that all 15 genomic segments of the MdBV encapsidated genome exhibited long-term persistence in CiE1 cells. Most MdBV genes expressed in hemocytes were persistently expressed in CiE1 cells, including members of the glc gene family whose products transformed CiE1 cells into a suspension culture. PCR-based integration assays combined with cloning and sequencing of host-virus junctions confirmed that genomic segments J and C persisted in CiE1 cells by integration. These genomic DNAs also rapidly integrated into parasitized P. includens. Sequence analysis of wasp-viral junction clones showed that the integration of proviral segments in M. demolitor was associated with a wasp excision/integration motif (WIM) known from other bracoviruses. However, integration into host cells occurred in association with a previously unknown domain that we named the host integration motif (HIM). The presence of HIMs in most MdBV genomic DNAs suggests that the integration of each genomic segment into host cells occurs through a shared mechanism. PMID:21880747

Massive GGAAs in genomic repetitive sequences serve as a nuclear reservoir of NF-κB.

PubMed

Wu, Jian; Wang, Qiao; Dai, Wei; Wang, Wei; Yue, Ming; Wang, Jinke

2018-04-13

Nuclear factor κB (NF-κB) is a DNA-binding transcription factor. Characterizing its genomic binding sites is crucial for understanding its gene regulatory function and mechanism in cells. This study characterized the binding sites of NF-κB RelA/p65 in tumor necrosis factor-α (TNFα)-stimulated HeLa cells by precise chromatin immunoprecipitation-sequencing (ChIP-seq). The results revealed that NF-κB binds nontraditional motifs (nt-motifs) containing a conserved GGAA quadruplet. Moreover, nt-motifs are mainly distributed in peaks near centromeres that contain a larger number of repetitive elements such as satellites, simple repeats and short interspersed nuclear elements (SINEs). This intracellular binding pattern was then confirmed by in vitro detection, indicating that NF-κB dimers can bind the nontraditional κB (nt-κB) sites with low affinity. However, this binding hardly activates transcription. This study thus deduced that NF-κB binding to nt-motifs may serve functions other than the gene regulation performed by NF-κB binding to traditional motifs (t-motifs). To test this deduction, ChIP-seq data from many other cell lines were then analyzed. The results indicate that NF-κB binding to nt-motifs is also widely present in other cells. The ChIP-seq data analysis also revealed that nt-motifs are more widely distributed in peaks with low-fold enrichment. Importantly, it was also found that NF-κB binding to nt-motifs is mainly present in resting cells, whereas NF-κB binding to t-motifs is mainly present in stimulated cells. Notably, no known function was enriched in the gene annotation of nt-motif peaks. Based on these results, this study proposed that the nt-κB sites, which are extensively distributed in large numbers of repeat elements, function as a nuclear reservoir of NF-κB. The nuclear NF-κB proteins stored at nt-κB sites in resting cells may be recruited to the t-κB sites for regulating target genes upon stimulation.
Copyright © 2018 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Evidence for a Structural Motif in Toxins and Interleukin-2 That May Be Responsible for Binding to Endothelial Cells and Initiating Vascular Leak Syndrome

NASA Astrophysics Data System (ADS)

Baluna, Roxana; Rizo, Josep; Gordon, Brian E.; Ghetie, Victor; Vitetta, Ellen S.

1999-03-01

The dose-limiting toxicity of interleukin-2 (IL-2) and immunotoxin (IT) therapy in humans is vascular leak syndrome (VLS). VLS has a complex etiology involving damage to vascular endothelial cells (ECs), extravasation of fluids and proteins, interstitial edema, and organ failure. IL-2 and ITs prepared with the catalytic A chain of the plant toxin, ricin (RTA), and other toxins, damage human ECs in vitro and in vivo. Damage to ECs may initiate VLS; if this damage could be avoided without losing the efficacy of ITs or IL-2, larger doses could be administered. In this paper, we provide evidence that a three amino acid sequence motif, (x)D(y), in toxins and IL-2 damages ECs. Thus, when peptides from RTA or IL-2 containing this sequence motif are coupled to mouse IgG, they bind to and damage ECs both in vitro and, in the case of RTA, in vivo. In contrast, the same peptides with a deleted or mutated sequence do not. Furthermore, the peptide from RTA attached to mouse IgG can block the binding of intact RTA to ECs in vitro and vice versa. In addition, RTA, a fragment of Pseudomonas exotoxin A (PE38-lys), and fibronectin also block the binding of the mouse IgG-RTA peptide to ECs, suggesting that an (x)D(y) motif is exposed on all three molecules. Our results suggest that deletions or mutations in this sequence or the use of nondamaging blocking peptides may increase the therapeutic index of both IL-2, as well as ITs prepared with a variety of plant or bacterial toxins.
Degenerate RNA packaging signals in the genome of Satellite Tobacco Necrosis Virus: implications for the assembly of a T=1 capsid.

PubMed

Bunka, David H J; Lane, Stephen W; Lane, Claire L; Dykeman, Eric C; Ford, Robert J; Barker, Amy M; Twarock, Reidun; Phillips, Simon E V; Stockley, Peter G

2011-10-14

Using a recombinant, T=1 Satellite Tobacco Necrosis Virus (STNV)-like particle expressed in Escherichia coli, we have established conditions for in vitro disassembly and reassembly of the viral capsid. In vivo assembly is dependent on the presence of the coat protein (CP) N-terminal region, and in vitro assembly requires RNA. Using immobilised CP monomers under reassembly conditions with "free" CP subunits, we have prepared a range of partially assembled CP species for RNA aptamer selection. SELEX directed against the RNA-binding face of the STNV CP resulted in the isolation of several clones, one of which (B3) matches the STNV-1 genome in 16 out of 25 nucleotide positions, including across a statistically significant 10/10 stretch. This 10-base region folds into a stem-loop displaying the motif ACAA and has been shown to bind to STNV CP. Analysis of the other aptamer sequences reveals that the majority can be folded into stem-loops displaying versions of this motif. Using a sequence and secondary structure search motif to analyse the genomic sequence of STNV-1, we identified 30 stem-loops displaying the sequence motif AxxA. The implication is that there are many stem-loops in the genome carrying essential recognition features for binding STNV CP. Secondary structure predictions of the genomic RNA using Mfold showed that only 8 out of 30 of these stem-loops would be formed in the lowest-energy structure. These results are consistent with an assembly mechanism based on kinetically driven folding of the RNA. Copyright © 2011 Elsevier Ltd. All rights reserved.
Regulation of the Osem gene by abscisic acid and the transcriptional activator VP1: analysis of cis-acting promoter elements required for regulation by abscisic acid and VP1.

PubMed

Hattori, T; Terada, T; Hamasuna, S

1995-06-01

Osem, a rice gene homologous to the wheat Em gene, which encodes one of the late-embryogenesis abundant proteins, was isolated. The gene was characterized with respect to control of transcription by abscisic acid (ABA) and the transcriptional activator VP1, which is involved in the ABA-regulated gene expression during late embryogenesis. A fusion gene (Osem-GUS) consisting of the Osem promoter and the bacterial beta-glucuronidase (GUS) gene was constructed and tested in a transient expression system, using protoplasts derived from a suspension-cultured line of rice cells, for activation by ABA and by co-transfection with an expression vector (35S-Osvp1) for the rice VP1 (OSVP1) cDNA. The expression of Osem-GUS was strongly (40- to 150-fold) activated by externally applied ABA and by over-expression of (OS)VP1. The Osem promoter has three ACGTG-containing sequences, motif A, motif B and motif A', which resemble the abscisic acid-responsive element (ABRE) that was previously identified in the wheat Em and the rice Rab16. There is also a CATGCATG sequence, which is known as the Sph box and is shown to be essential for the regulation by VP1 of the maize anthocyanin regulatory gene C1. Focusing on these sequence elements, various mutant derivatives of the Osem promoter in the transient expression system were assayed. The analysis revealed that motif A functions not only as an ABRE but also as a sequence element required for the regulation by (OS)VP1.
Complexity in the cattle CD94/NKG2 gene families.

PubMed

Birch, James; Ellis, Shirley A

2007-04-01

Natural killer cell responses are controlled to a large extent by the interaction of an array of inhibitory and activating receptors with their ligands. The mostly nonpolymorphic CD94/NKG2 receptors in both humans and mice were shown to recognize a single nonclassical MHC class I molecule in each case. In this paper, we describe the CD94/NKG2 gene family in cattle. NKG2 and CD94 sequences were amplified from cDNA derived from four animals. Four CD94 sequences, ten NKG2A, and three NKG2C sequences were identified in total. In contrast to humans, we show that cattle have multiple distinct NKG2A genes, some of which show minor allelic variation. All of the sequences designated NKG2A have two tyrosine-based inhibitory motifs in the cytoplasmic domain and one putative gene has, in addition, a charged residue in the transmembrane domain. NKG2C appears to be essentially monomorphic in cattle. All of the NKG2A sequences are similar apart from NKG2A-01, which, in contrast, shares the majority of its carbohydrate recognition domain with NKG2C. Most of the genes appear to generate multiple alternatively spliced forms. These findings suggest that the CD94/NKG2A heterodimers in cattle, in contrast to other species, are binding several different ligands. Because NKG2C is not polymorphic, this raises questions as to the combined functional capacity of the CD94/NKG2 gene families in cattle.
Finding Hidden Location Patterns of Two Competitive Supermarkets in Thailand

NASA Astrophysics Data System (ADS)

Khumsri, Jinattaporn; Fujihara, Akihiro

There are two major supermarket chains in Thailand: Big C and Lotus. They are highly competitive, hold the largest market share through frequent promotions, and bundle convenience services such as banking and restaurants. In recent years they have gradually expanded their store networks, and they take a similar strategy in deciding where to locate a store. Store allocation matters to both chains for acquiring new customers efficiently. To study it, we collected the geographical locations of these supermarkets from Twitter using the Twitter API, gathering tweets that contained the supermarket names and geotags over seven months. To extract hidden location patterns from the gathered data, we introduce the location motif, a directed subgraph whose edges link each node to its shortest-distance opponent node. We investigate every possible configuration of location motifs with a small number of nodes and find that the number of configurations increases exponentially. We also visualize the location motifs generated from the gathered data on a map of Thailand and count the frequency of the observed location motifs. As a result, we find that although the number of possible location motifs increases exponentially as the number of nodes grows, only a limited set of location motifs is actually observed. Using location motifs, we find evidence of biased store allocation in practice.
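One possible reading of the location motif construction can be sketched in Python as follows. The sketch assumes that every store gets a directed edge to its nearest store of the opposing brand and that each weakly connected group of stores forms one motif. The coordinates, store names, and the use of planar Euclidean distance are illustrative assumptions, not details taken from the abstract.

```python
from math import hypot

def nearest_opponent_edges(stores):
    """stores maps name -> (brand, x, y). Returns directed edges
    (store, nearest store of the other brand)."""
    edges = []
    for name, (brand, x, y) in stores.items():
        rivals = [(hypot(x - ox, y - oy), other)
                  for other, (obrand, ox, oy) in stores.items()
                  if obrand != brand]
        if rivals:
            edges.append((name, min(rivals)[1]))
    return edges

def weak_components(nodes, edges):
    """Group nodes into weakly connected components with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical stores on a line; brands "A" and "B" stand in for the two chains.
stores = {
    "A1": ("A", 0, 0), "B1": ("B", 1, 0), "A2": ("A", 3, 0),
    "B2": ("B", 10, 0), "A3": ("A", 11, 0),
}
edges = nearest_opponent_edges(stores)
print(edges)
print(weak_components(stores, edges))
```

With real geotags, a great-circle distance would be a more natural choice than the planar distance used here.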
The MARVEL transmembrane motif of occludin mediates oligomerization and targeting to the basolateral surface in epithelia.

PubMed

Yaffe, Yakey; Shepshelovitch, Jeanne; Nevo-Yassaf, Inbar; Yeheskel, Adva; Shmerling, Hedva; Kwiatek, Joanna M; Gaus, Katharina; Pasmanik-Chor, Metsada; Hirschberg, Koret

2012-08-01

Occludin (Ocln), a MARVEL-motif-containing protein, is found in all tight junctions. MARVEL motifs are comprised of four transmembrane helices associated with the localization to or formation of diverse membrane subdomains by interacting with the proximal lipid environment. The functions of the Ocln MARVEL motif are unknown. Bioinformatics sequence- and structure-based analyses demonstrated that the MARVEL domain of Ocln family proteins has distinct evolutionarily conserved sequence features that are consistent with its basolateral membrane localization. Live-cell microscopy, fluorescence resonance energy transfer (FRET) and bimolecular fluorescence complementation (BiFC) were used to analyze the intracellular distribution and self-association of fluorescent-protein-tagged full-length human Ocln or the Ocln MARVEL motif excluding the cytosolic C- and N-termini (amino acids 60-269, FP-MARVEL-Ocln). FP-MARVEL-Ocln efficiently arrived at the plasma membrane (PM) and was sorted to the basolateral PM in filter-grown polarized MDCK cells. A series of conserved aromatic amino acids within the MARVEL domain were found to be associated with Ocln dimerization using BiFC. FP-MARVEL-Ocln inhibited membrane pore growth during Triton-X-100-induced solubilization and was shown to increase the membrane-ordered state using Laurdan, a lipid dye. These data demonstrate that the Ocln MARVEL domain mediates self-association and correct sorting to the basolateral membrane.
Identification of sequence–structure RNA binding motifs for SELEX-derived aptamers

PubMed Central

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E.; Przytycka, Teresa M.

2012-01-01

Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap we developed Aptamotif, a computational method for the identification of sequence–structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process. Contact: przytyck@ncbi.nlm.nih.gov, Zuben.Sauna@fda.hhs.gov PMID:22689764
Placing a Disrupted Degradation Motif at the C Terminus of Proteasome Substrates Attenuates Degradation without Impairing Ubiquitylation

PubMed Central

Alfassy, Omri S.; Cohen, Itamar; Reiss, Yuval; Tirosh, Boaz; Ravid, Tommer

2013-01-01

Protein elimination by the ubiquitin-proteasome system requires the presence of a cis-acting degradation signal. Efforts to discern degradation signals of misfolded proteasome substrates thus far revealed a general mechanism whereby the exposure of cryptic hydrophobic motifs provides a degradation determinant. We have previously characterized such a determinant, employing the yeast kinetochore protein Ndc10 as a model substrate. Ndc10 is essentially a stable protein that is rapidly degraded upon exposure of a hydrophobic motif located at the C-terminal region. The degradation motif comprises two distinct and essential elements: DegA, encompassing two amphipathic helices, and DegB, a hydrophobic sequence within the loosely structured C-terminal tail of Ndc10. Here we show that the hydrophobic nature of DegB is irrelevant for the ubiquitylation of substrates containing the Ndc10 degradation motif, but is essential for proteasomal degradation. Mutant DegB, in which the hydrophobic sequence was disrupted, acted as a dominant degradation inhibitory element when expressed at the C-terminal regions of ubiquitin-dependent and -independent substrates of the 26S proteasome. This mutant stabilized substrates in both yeast and mammalian cells, indicative of a modular recognition moiety. The dominant function of the mutant DegB provides a powerful experimental tool for evaluating the physiological implications of stabilization of specific proteasome substrates in intact cells and for studying the associated pathological effects. PMID:23519465
A molecule in teleost fish, related with human MHC-encoded G6F, has a cytoplasmic tail with ITAM and marks the surface of thrombocytes and in some fishes also of erythrocytes.

PubMed

Ohashi, Ken; Takizawa, Fumio; Tokumaru, Norihiro; Nakayasu, Chihaya; Toda, Hideaki; Fischer, Uwe; Moritomo, Tadaaki; Hashimoto, Keiichiro; Nakanishi, Teruyuki; Dijkstra, Johannes Martinus

2010-08-01

In teleost fish, a novel gene G6F-like was identified, encoding a type I transmembrane molecule with four extracellular Ig-like domains and a cytoplasmic tail with putative tyrosine phosphorylation motifs including YxN and an immunoreceptor tyrosine-based activation motif (ITAM). G6F-like maps to a teleost genomic region where stretches corresponding to human chromosomes 6p (with the MHC), 12p (with CD4 and LAG-3), and 19q are tightly linked. This genomic organization resembles the ancestral "Ur-MHC" proposed for the jawed vertebrate ancestor. The deduced G6F-like molecule shows sequence similarity with members of the CD4/LAG-3 family and with the human major histocompatibility complex-encoded thrombocyte marker G6F. Despite some differences in molecular organization, teleost G6F-like and tetrapod G6F seem orthologous as they map to similar genomic location, share typical motifs in transmembrane and cytoplasmic regions, and are both expressed by thrombocytes/platelets. In the crucian carps goldfish (Carassius auratus auratus) and ginbuna (Carassius auratus langsdorfii), G6F-like was found expressed not only by thrombocytes but also by erythrocytes, supporting that erythroid and thromboid cells in teleost fish form a hematopoietic lineage like they do in mammals. The ITAM-bearing of G6F-like suggests that the molecule plays an important role in cell activation, and G6F-like expression by erythrocytes suggests that these cells have functional overlap potential with thrombocytes.
Characterization of a unique motif in LIM mineralization protein-1 that interacts with jun activation-domain-binding protein 1.

PubMed

Sangadala, Sreedhara; Yoshioka, Katsuhito; Enyo, Yoshio; Liu, Yunshan; Titus, Louisa; Boden, Scott D

2014-01-01

Development and repair of the skeletal system and other organs are highly dependent on precise regulation of the bone morphogenetic protein (BMP) pathway. The use of BMPs clinically to induce bone formation has been limited in part by the requirement of much higher doses of recombinant proteins in primates than were needed in cell culture or rodents. Therefore, increasing cellular responsiveness to BMPs has become our focus. We determined that an osteogenic LIM mineralization protein, LMP-1 interacts with Smurf1 (Smad ubiquitin regulatory factor 1) and prevents ubiquitination of Smads resulting in potentiation of BMP activity. In the region of LMP-1 responsible for bone formation, there is a motif that directly interacts with the Smurf1 WW2 domain and thus effectively competes for binding with Smad1 and Smad5, key signaling proteins in the BMP pathway. Here we show that the same region also contains a motif that interacts with Jun activation-domain-binding protein 1 (Jab1) which targets a common Smad, Smad4, shared by both the BMP and transforming growth factor-β (TGF-β) pathways, for proteasomal degradation. Jab1 was first identified as a coactivator of the transcription factor c-Jun. Jab1 binds to Smad4, Smad5, and Smad7, key intracellular signaling molecules of the TGF-β superfamily, and causes ubiquitination and/or degradation of these Smads. We confirmed a direct interaction of Jab1 with LMP-1 using recombinantly expressed wild-type and mutant proteins in slot-blot-binding assays. We hypothesized that LMP-1 binding to Jab1 prevents the binding and subsequent degradation of these Smads causing increased accumulation of osteogenic Smads in cells. We identified a sequence motif in LMP-1 that was predicted to interact with Jab1 based on the MAME/MAST sequence analysis of several cellular signaling molecules that are known to interact with Jab-1. 
We further mutated the potential key interacting residues in LMP-1 and showed loss of binding to Jab1 in binding assays in vitro. The activities of various wild-type and mutant LMP-1 proteins were evaluated using a BMP-responsive luciferase reporter and alkaline phosphatase assay in mouse myoblastic cells that were differentiated toward the osteoblastic phenotype. Finally, to strengthen physiological relevance of LMP-1 and Jab1 interaction, we showed that overexpression of LMP-1 caused nuclear accumulation of Smad4 upon BMP treatment which is reflective of increased Smad signaling in cells.
Multiple Copies of a Simple MYB-Binding Site Confers Trans-regulation by Specific Flavonoid-Related R2R3 MYBs in Diverse Species.

PubMed

Brendolise, Cyril; Espley, Richard V; Lin-Wang, Kui; Laing, William; Peng, Yongyan; McGhie, Tony; Dejnoprat, Supinya; Tomes, Sumathi; Hellens, Roger P; Allan, Andrew C

2017-01-01

In apple, the MYB transcription factor MYB10 controls the accumulation of anthocyanins. MYB10 is able to auto-activate its expression by binding its own promoter at a specific motif, the R1 motif. In some apple accessions a natural mutation, termed R6, has more copies of this motif within the MYB10 promoter, resulting in stronger auto-activation and elevated anthocyanins. Here we show that other anthocyanin-related MYBs selected from apple, pear, strawberry, petunia, kiwifruit and Arabidopsis are able to activate promoters containing the R6 motif. To examine the specificity of this motif, members of the R2R3 MYB family were screened against a promoter harboring the R6 mutation. Only MYBs from subgroups 5 and 6 activate expression by binding the R6 motif, with these MYBs sharing conserved residues in their R2R3 DNA binding domains. Insertion of the apple R6 motif into orthologous promoters of MYB10 in pear (PcMYB10) and Arabidopsis (AtMYB75) elevated anthocyanin levels. Introduction of the R6 motif into the promoter region of an anthocyanin biosynthetic enzyme, F3'5'H of kiwifruit, imparts regulation by MYB10. This results in elevated levels of delphinidin in both tobacco and kiwifruit. Finally, an R6 motif inserted into the promoter of the vitamin C biosynthesis gene GDP-L-Gal phosphorylase increases vitamin C content in a MYB10-dependent manner. This motif therefore provides a tool to re-engineer novel MYB-regulated responses in plants.
Identification and biochemical characterization of a GDSL-motif carboxylester hydrolase from Carica papaya latex.

PubMed

Abdelkafi, Slim; Ogata, Hiroyuki; Barouh, Nathalie; Fouquet, Benjamin; Lebrun, Régine; Pina, Michel; Scheirlinckx, Frantz; Villeneuve, Pierre; Carrière, Frédéric

2009-11-01

An esterase (CpEst) showing high specific activities on tributyrin and short chain vinyl esters was obtained from Carica papaya latex after an extraction step with zwitterionic detergent and sonication, followed by gel filtration chromatography. Although the protein could not be purified to complete homogeneity due to its presence in high molecular mass aggregates, a major protein band with an apparent molecular mass of 41 kDa was obtained by SDS-PAGE. This material was digested with trypsin and the amino acid sequences of the tryptic peptides were determined by LC/ESI/MS/MS. These sequences were used to identify a partial cDNA (679 bp) from expressed sequence tags (ESTs) of C. papaya. Based upon EST sequences, a full-length gene was identified in the genome of C. papaya, with an open reading frame of 1029 bp encoding a protein of 343 amino acid residues, with a theoretical molecular mass of 38 kDa. From sequence analysis, CpEst was identified as a GDSL-motif carboxylester hydrolase belonging to the SGNH protein family and four potential N-glycosylation sites were identified. The putative catalytic triad was localised (Ser(35)-Asp(307)-His(310)) with the nucleophile serine being part of the GDSL-motif. A 3D-model of CpEst was built from known X-ray structures and sequence alignments and the catalytic triad was found to be exposed at the surface of the molecule, thus confirming the results of CpEst inhibition by tetrahydrolipstatin suggesting a direct accessibility of the inhibitor to the active site.
Sequence and conformational preferences at termini of α-helices in membrane proteins: role of the helix environment.

PubMed

Shelar, Ashish; Bansal, Manju

2014-12-01

α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.
MotifMark: Finding regulatory motifs in DNA sequences.

PubMed

Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

2017-07-01

The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.
Deletion of transcription factor binding motifs using the CRISPR/spCas9 system in the β-globin LCR.

PubMed

Kim, Yea Woon; Kim, AeRi

2017-07-20

Transcription factors play roles in gene transcription through direct binding to their motifs in the genome, and inhibiting this binding provides an effective strategy for studying their roles. Here we applied the CRISPR/spCas9 system to mutate the binding motifs of transcription factors. Binding motifs for erythroid-specific transcription factors were mutated in the locus control region hypersensitive sites of the human β-globin locus. Guide RNAs targeting binding motifs were cloned into a lentiviral CRISPR vector containing the spCas9 gene, and transduced into MEL/ch11 cells carrying a human chromosome 11. DNA mutations in clonal cells were initially screened by quantitative PCR in genomic DNA and then clarified by sequencing. Mutations in binding motifs reduced occupancy by transcription factors in a chromatin environment. Characterization of mutations revealed that the CRISPR/spCas9 system mainly induced deletions in short regions of <20 bp and preferentially deleted nucleotides around the fifth nucleotide upstream of protospacer adjacent motifs. These results indicate that the CRISPR/Cas9 system is suitable for mutating the binding motifs of transcription factors and, consequently, would help to elucidate the direct roles of transcription factors. ©2017 The Author(s).
An experimental and computational evolution-based method to study a mode of co-evolution of overlapping open reading frames in the AAV2 viral genome.

PubMed

Kawano, Yasuhiro; Neeley, Shane; Adachi, Kei; Nakai, Hiroyuki

2013-01-01

Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in making the VP and AAP heptapeptide sequences into the present shape. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.
Use of eluted peptide sequence data to identify the binding characteristics of peptides to the insulin-dependent diabetes susceptibility allele HLA-DQ8 (DQ 3.2).

PubMed

Godkin, A; Friede, T; Davenport, M; Stevanovic, S; Willis, A; Jewell, D; Hill, A; Rammensee, H G

1997-06-01

HLA-DQ8 (A1*0301, B1*0302) and -DQ2 (A1*0501, B1*0201) are both associated with diseases such as insulin-dependent diabetes mellitus and coeliac disease. We used the technique of pool sequencing to look at the requirements of peptides binding to HLA-DQ8, and combined these data with naturally sequenced ligands and in vitro binding assays to describe a novel motif for HLA-DQ8. The motif, which has the same basic format as many HLA-DR molecules, consists of four or five anchor regions, in the positions from the N-terminus of the binding core of n, n + 3, n + 5/6 and n + 8, i.e. P1, P4, P6/7 and P9. P1 and P9 require negative or polar residues, with mainly aliphatic residues at P4 and P6/7. The features of the HLA-DQ8 motif were then compared to a pool sequence of peptides eluted from HLA-DQ2. A consensus motif for the binding of a common peptide which may be involved in disease pathogenesis is described. Neither of the disease-associated alleles HLA-DQ2 and -DQ8 have Asp at position 57 of the beta-chain. This Asp, if present, may form a salt bridge with an Arg at position 79 of the alpha-chain and so alter the binding specificity of P9. HLA-DQ2 and -DQ8 both appear to prefer negatively charged amino acids at P9. In contrast, HLA-DQ7 (A1*0301, B1*0301), which is not associated with diabetes, has Asp at beta 57, allowing positively charged amino acids at P9. This analysis of the sequence features of DQ-binding peptides suggests molecular characteristics which may be useful to predict epitopes involved in disease pathogenesis.
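The anchor layout described in this abstract (P1, P4, P6/7 and P9 of a nine-residue binding core) can be turned into a simple scan. The Python sketch below slides a 9-residue window along a peptide and keeps windows whose anchors fit the stated preferences. The residue classes are assumptions made for the example (D/E as negative, S/T/N/Q as polar, A/V/L/I as aliphatic), the peptide is invented, and a pattern match of this kind is only a rough filter, not a binding predictor.

```python
# Assumed residue classes; the abstract does not list specific residues.
NEG_OR_POLAR = set("DE") | set("STNQ")
ALIPHATIC = set("AVLI")

def dq8_candidate_cores(peptide):
    """Return (start, core) for each 9-mer whose anchors fit the
    described pattern: P1 and P9 negative or polar, P4 aliphatic,
    and P6 or P7 aliphatic. Start positions are 0-based."""
    peptide = peptide.upper()
    hits = []
    for n in range(len(peptide) - 8):
        core = peptide[n:n + 9]
        p1, p4, p6, p7, p9 = core[0], core[3], core[5], core[6], core[8]
        if (p1 in NEG_OR_POLAR and p9 in NEG_OR_POLAR
                and p4 in ALIPHATIC
                and (p6 in ALIPHATIC or p7 in ALIPHATIC)):
            hits.append((n, core))
    return hits

# Invented peptide for demonstration only.
print(dq8_candidate_cores("KKEAALGVAGDKK"))
```

Treating P6 and P7 as alternatives mirrors the "P6/7" notation in the abstract; a stricter or position-weighted scheme would need the underlying pool-sequencing data.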
Sequence analyses reveal that a TPR–DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR–DP domains and prokaryotic GerD proteins

PubMed Central

Papandreou, Nikolaos; Chomilier, Jacques

2008-01-01

The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR–DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR–DP domains. Electronic supplementary material: The online version of this article (doi:10.1007/s12192-008-0083-8) contains supplementary material, which is available to authorized users. PMID:18987995
G-Boxes, Bigfoot Genes, and Environmental Response: Characterization of Intragenomic Conserved Noncoding Sequences in Arabidopsis

PubMed Central

Freeling, Michael; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Thomas, Brian C.

2007-01-01

A tetraploidy left Arabidopsis thaliana with 6358 pairs of homoeologs that, when aligned, generated 14,944 intragenomic conserved noncoding sequences (CNSs). Our previous work assembled these phylogenetic footprints into a database. We show that known transcription factor (TF) binding motifs, including the G-box, are overrepresented in these CNSs. A total of 254 genes spanning long lengths of CNS-rich chromosomes (Bigfoot) dominate this database. Therefore, we made subdatabases: one containing Bigfoot genes and the other containing genes with three to five CNSs (Smallfoot). Bigfoot genes are generally TFs that respond to signals, with their modal CNS positioned 3.1 kb 5′ from the ATG. Smallfoot genes encode components of signal transduction machinery, the cytoskeleton, or involve transcription. We queried each subdatabase with each possible 7-nucleotide sequence. Among hundreds of hits, most were purified from CNSs, and almost all of those significantly enriched in CNSs had no experimental history. The 7-mers in CNSs are not 5′- to 3′-oriented in Bigfoot genes but are often oriented in Smallfoot genes. CNSs with one G-box tend to have two G-boxes. CNSs were shared with the homoeolog only and with no other gene, suggesting that binding site turnover impedes detection. Bigfoot genes may function in adaptation to environmental change. PMID:17496117
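Querying a sequence collection with every possible 7-nucleotide sequence, as this abstract describes, means testing 4^7 = 16,384 words. The Python sketch below enumerates them, counts their occurrences in a set of sequences, and computes a simple pseudocount-smoothed frequency ratio against a background set. The ratio is an assumption chosen for illustration; the abstract does not say which enrichment statistic the authors used, and the input sequences here are toy strings.

```python
from collections import Counter
from itertools import product

def all_kmers(k=7):
    """Every DNA word of length k (4**k of them)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def count_kmers(seqs, k=7):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def enrichment(fg_seqs, bg_seqs, k=7, pseudo=1.0):
    """Ratio of smoothed k-mer frequencies, foreground over background
    (a hypothetical statistic, used here only as an example)."""
    fg, bg = count_kmers(fg_seqs, k), count_kmers(bg_seqs, k)
    fg_total = sum(fg.values()) + pseudo * 4 ** k
    bg_total = sum(bg.values()) + pseudo * 4 ** k
    return {kmer: ((fg[kmer] + pseudo) / fg_total) / ((bg[kmer] + pseudo) / bg_total)
            for kmer in fg}

print(len(all_kmers(7)))
print(count_kmers(["ACGTACGTA"], k=4).most_common(2))
```

In practice the foreground would be the CNS sequences and the background a matched set of non-conserved noncoding sequence, with a significance test applied on top of the raw ratio.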
