Sample records for protein sequence embedding

  1. Embedding strategies for effective use of information from multiple sequence alignments.

    PubMed Central

    Henikoff, S.; Henikoff, J. G.

    1997-01-01

    We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
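
    The consensus-embedding idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the majority-vote consensus rule, and the toy sequences are all assumptions made for the example, and the PSSM variant is not covered.

```python
from collections import Counter

def consensus(column):
    """Return the most frequent residue in one alignment column."""
    return Counter(column).most_common(1)[0][0]

def embed_consensus(representative, block_start, block_rows):
    """Overwrite one conserved block of a representative sequence
    with the consensus of the aligned family members (hypothetical helper).

    representative: the single family sequence used as the query
    block_start:    0-based offset of the conserved block in that sequence
    block_rows:     ungapped, equal-width block segments from family members
    """
    width = len(block_rows[0])
    if any(len(row) != width for row in block_rows):
        raise ValueError("all block rows must have the same width")
    cons = "".join(consensus(col) for col in zip(*block_rows))
    # Residues outside the block are kept as single-sequence information.
    return representative[:block_start] + cons + representative[block_start + width:]

# Toy data (invented): a 3-residue block starting at offset 3.
query = embed_consensus("MKTAYIAKQR", 3, ["AYL", "AFL", "GYL"])
print(query)
```

    The resulting string could then be passed to an ordinary single-sequence search program, which is the use case the abstract describes for consensus embedding.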

  2. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.

  3. TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

    PubMed

    Sonar, Krushna; Kabra, Ritika; Singh, Shailza

    2018-03-12

    TryTransDB is a web-based resource that stores transport protein data which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for researchers working with transport proteins of the Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a comprehensive web-based resource that can run a BLAST search against most of the transport protein sequences (protein and nucleotide) from Trypanosomatidae family organisms. The resource also allows users to compute a phylogenetic tree by performing multiple sequence alignment (MSA) with the embedded CLUSTALW suite. In addition, cross-links to other databases help gather more information about a given transport protein from a single website.

  4. HMG-D is an architecture-specific protein that preferentially binds to DNA containing the dinucleotide TG.

    PubMed Central

    Churchill, M E; Jones, D N; Glaser, T; Hefner, H; Searles, M A; Travers, A A

    1995-01-01

    The high mobility group (HMG) protein HMG-D from Drosophila melanogaster is a highly abundant chromosomal protein that is closely related to the vertebrate HMG domain proteins HMG1 and HMG2. In general, chromosomal HMG domain proteins lack sequence specificity. However, using both NMR spectroscopy and standard biochemical techniques we show that binding of HMG-D to a single DNA site is sequence selective. The preferred duplex DNA binding site comprises at least 5 bp and contains the deformable dinucleotide TG embedded in A/T-rich sequences. The TG motif constitutes a common core element in the binding sites of the well-characterized sequence-specific HMG domain proteins. We show that a conserved aromatic residue in helix 1 of the HMG domain may be involved in recognition of this core sequence. In common with other HMG domain proteins, HMG-D binds preferentially to DNA sites that are stably bent and underwound; therefore HMG-D can be considered an architecture-specific protein. Finally, we show that HMG-D bends DNA and may confer a superhelical DNA conformation at a natural DNA binding site in the Drosophila fushi tarazu scaffold-associated region. PMID:7720717

  5. Patterns and plasticity in RNA-protein interactions enable recruitment of multiple proteins through a single site

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Valley, Cary T.; Porter, Douglas F.; Qiu, Chen

    2012-06-28

    mRNA control hinges on the specificity and affinity of proteins for their RNA binding sites. Regulatory proteins must bind their own sites and reject even closely related noncognate sites. In the PUF [Pumilio and fem-3 binding factor (FBF)] family of RNA binding proteins, individual proteins discriminate differences in the length and sequence of binding sites, allowing each PUF to bind a distinct battery of mRNAs. Here, we show that despite these differences, the pattern of RNA interactions is conserved among PUF proteins: the two ends of the PUF protein make critical contacts with the two ends of the RNA sites. Despite this conserved 'two-handed' pattern of recognition, the RNA sequence is flexible. Among the binding sites of yeast Puf4p, RNA sequence dictates the pattern in which RNA bases are flipped away from the binding surface of the protein. Small differences in RNA sequence allow new modes of control, recruiting Puf5p in addition to Puf4p to a single site. This embedded information adds a new layer of biological meaning to the connections between RNA targets and PUF proteins.

  6. Generation of the novel monoclonal antibody against TLS/EWS-CHOP chimeric oncoproteins that is applicable to one of the most sensitive assays for myxoid and round cell liposarcomas.

    PubMed

    Oikawa, Kosuke; Ishida, Tsuyoshi; Imamura, Tetsuo; Yoshida, Keiichi; Takanashi, Masakatsu; Hattori, Hiroyuki; Ishikawa, Akio; Fujita, Koji; Yamamoto, Kengo; Matsubayashi, Jun; Kuroda, Masahiko; Mukai, Kiyoshi

    2006-03-01

    The fusion oncoproteins, TLS-CHOP and EWS-CHOP, are characteristic markers for myxoid and round cell liposarcomas (MLS/RCLS). In particular, the peptide sequence of 26 amino acids corresponding to the normally untranslated CHOP exon 2 and parts of exon 3 (5'-UTR) is a unique structure for these chimeric proteins. In this report, we have generated monoclonal antibodies against the unique peptide sequence of TLS/EWS-CHOP oncoproteins. These antibodies reacted with TLS-CHOP fusion protein but did not react with normal TLS and CHOP proteins in Western blot analysis. In addition, one of the antibodies also recognized the chimeric oncoprotein in archival paraffin-embedded tissue samples of MLS/RCLS. The oncoprotein was detectable by the antibody even in paraffin-embedded tissue samples whose mRNAs were too degraded to be detected by a nested reverse transcription-polymerase chain reaction-based assay. Thus, the molecular assay using the novel antibody is expected to be one of the most sensitive diagnostic assays for MLS/RCLS.

  7. Defining functional distance using manifold embeddings of gene ontology annotations

    PubMed Central

    Lerman, Gilad; Shakhnovich, Boris E.

    2007-01-01

    Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure–function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules. PMID:17595300

  8. Cloning and characterization of a novel human STAR domain containing cDNA KHDRBS2.

    PubMed

    Wang, Liu; Xu, Jian; Zeng, Li; Ye, Xin; Wu, Qihan; Dai, Jianfeng; Ji, Chaoneng; Gu, Shaohua; Zhao, Chunhua; Xie, Yi; Mao, Yumin

    2002-12-01

    KHDRBS2 (KH domain containing, RNA binding, signal transduction associated 2) is an RNA-binding protein that is tyrosine phosphorylated by Src during mitosis. It contains a KH domain, which is embedded in a larger conserved domain called the STAR domain. This protein has 99% sequence identity with rat SLM-1 (the Sam68-like mammalian protein 1) and 98% sequence identity with mouse SLM-1 in its STAR domain. KHDRBS2 has the characteristic Sam68 SH2 and SH3 domain binding sites. RT-PCR analysis showed that its transcript is ubiquitously expressed. The characterization of KHDRBS2 indicates it may link tyrosine kinase signaling cascades with some aspect of RNA metabolism.

  9. Codon Optimizing for Increased Membrane Protein Production: A Minimalist Approach.

    PubMed

    Mirzadeh, Kiavash; Toddo, Stephen; Nørholm, Morten H H; Daley, Daniel O

    2016-01-01

    Reengineering a gene with synonymous codons is a popular approach for increasing production levels of recombinant proteins. Here we present a minimalist alternative to this method, which samples synonymous codons only at the second and third positions rather than the entire coding sequence. As demonstrated with two membrane-embedded transporters in Escherichia coli, the method was more effective than optimizing the entire coding sequence. The method we present is PCR based and requires three simple steps: (1) the design of two PCR primers, one of which is degenerate; (2) the amplification of a mini-library by PCR; and (3) screening for high-expressing clones.
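
    The size and content of such a mini-library can be estimated in silico. The sketch below is illustrative only and assumes that "second and third positions" means the second and third codons of the coding sequence; it uses the standard genetic code, and the function names and the toy coding sequences are invented for the example. It does not model the degenerate primer itself.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def synonymous(codon):
    """All codons (including the input) that encode the same amino acid."""
    aa = CODON_TABLE[codon]
    return [c for c, x in CODON_TABLE.items() if x == aa]

def mini_library(cds, codon_positions=(2, 3)):
    """Enumerate coding sequences that differ from cds only by synonymous
    codons at the given 1-based codon positions (hypothetical helper)."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    choices = [synonymous(c) if n in codon_positions else [c]
               for n, c in enumerate(codons, start=1)]
    return ["".join(combo) for combo in product(*choices)]

# Toy coding sequence (invented): Met-Leu-Ala-stop.
library = mini_library("ATGCTGGCGTAA")
print(len(library))  # 6 leucine codons x 4 alanine codons
```

    Every variant encodes the same protein, so in this sketch any difference in expression between clones would come from the nucleotide sequence near the start codon rather than from the amino acid sequence.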

  10. Peptide library synthesis on spectrally encoded beads for multiplexed protein/peptide bioassays

    NASA Astrophysics Data System (ADS)

    Nguyen, Huy Q.; Brower, Kara; Harink, Björn; Baxter, Brian; Thorn, Kurt S.; Fordyce, Polly M.

    2017-02-01

    Protein-peptide interactions are essential for cellular responses. Despite their importance, these interactions remain largely uncharacterized due to experimental challenges associated with their measurement. Current techniques (e.g. surface plasmon resonance, fluorescence polarization, and isothermal calorimetry) either require large amounts of purified material or direct fluorescent labeling, making high-throughput measurements laborious and expensive. In this report, we present a new technology for measuring antibody-peptide interactions in vitro that leverages spectrally encoded beads for biological multiplexing. Specific peptide sequences are synthesized directly on encoded beads with a 1:1 relationship between peptide sequence and embedded code, thereby making it possible to track many peptide sequences throughout the course of an experiment within a single small volume. We demonstrate the potential of these bead-bound peptide libraries by: (1) creating a set of 46 peptides composed of 3 commonly used epitope tags (myc, FLAG, and HA) and single amino-acid scanning mutants; (2) incubating with a mixture of fluorescently-labeled anti-myc, anti-FLAG, and anti-HA antibodies; and (3) imaging these bead-bound libraries to simultaneously identify the embedded spectral code (and thus the sequence of the associated peptide) and quantify the amount of each antibody bound. To our knowledge, these data demonstrate the first customized peptide library synthesized directly on spectrally encoded beads. While the implementation of the technology provided here is a high-affinity antibody/protein interaction with a small code space, we believe this platform can be broadly applicable to any range of peptide screening applications, with the capability to multiplex into libraries of hundreds to thousands of peptides in a single assay.

  11. A discontinuous hammerhead ribozyme embedded in a mammalian messenger RNA

    PubMed Central

    Martick, Monika; Horan, Lucas H.; Noller, Harry F.; Scott, William G.

    2008-01-01

    Structured RNAs embedded in the untranslated regions (UTRs) of messenger RNAs can regulate gene expression. In bacteria, control of a metabolite gene is mediated by the self-cleaving activity of a ribozyme embedded in its 5′ UTR. This discovery has raised the question of whether gene-regulating ribozymes also exist in eukaryotic mRNAs. Here we show that highly active hammerhead ribozymes are present in the 3′ UTRs of rodent C-type lectin type II (Clec2) genes. Using a hammerhead RNA motif search with relaxed delimitation of the non-conserved regions, we detected ribozyme sequences in which the invariant regions, in contrast to the previously identified continuous hammerheads, occur as two fragments separated by hundreds of nucleotides. Notably, a fragment pair can assemble to form an active hammerhead ribozyme structure between the translation termination and the poly-adenylation signals within the 3′ UTR. We demonstrate that this hammerhead structure can self-cleave both in vitro and in vivo, and is able to reduce protein expression in mouse cells. These results indicate that an unrecognized mechanism of post-transcriptional gene regulation involving association of discontinuous ribozyme sequences within an mRNA may be modulating the expression of several CLEC2 proteins that function in bone remodelling and the immune response of several mammals. PMID:18615019

  12. High-Throughput Sequencing and Copy Number Variation Detection Using Formalin Fixed Embedded Tissue in Metastatic Gastric Cancer

    PubMed Central

    Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

    2014-01-01

    In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287

  13. Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins

    NASA Astrophysics Data System (ADS)

    Basu, Sankar; Söderquist, Fredrik; Wallner, Björn

    2017-05-01

    The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues, generally found in short contiguous stretches, and the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents 'Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible 'disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.

  14. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    PubMed

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. BIPAD: A web server for modeling bipartite sequence elements

    PubMed Central

    Bi, Chengpeng; Rogan, Peter K

    2006-01-01

    Background: Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites. Results: We introduce the Bipad Server [1], a web interface to predict sequence elements embedded within unaligned sequences. Either a bipartite model, consisting of a pair of one-block position weight matrices (PWMs) with a gap distribution, or a single PWM for contiguous single-block motifs may be produced. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information contents among a set of potential models with varying half-site and gap lengths. Conclusion: The web service generates information positional weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA binding proteins. PMID:16503993
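
    The quantity being optimized here, the information content of an aligned block, is straightforward to compute once sites are aligned. The sketch below is not the Bipad algorithm (it performs no alignment search and handles no gap distribution); it only shows the per-column information measure for one pre-aligned DNA block, assuming a uniform background and no small-sample correction. The function name and the example sites are invented.

```python
import math
from collections import Counter

def column_information(sites):
    """Information content in bits for each column of pre-aligned DNA sites.

    Uses I = 2 - H, where H is the Shannon entropy of the observed base
    frequencies in the column (uniform background assumed, no
    small-sample correction).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be pre-aligned to equal length")
    info = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(2.0 - entropy)
    return info

# Invented half-site examples: columns 1, 2 and 4 are fully conserved.
half_sites = ["TGACA", "TGACA", "TGTCA", "TGACG"]
for position, bits in enumerate(column_information(half_sites), start=1):
    print(position, round(bits, 3))
```

    A bipartite model in the sense of the abstract would hold two such blocks plus a distribution over the gap lengths between them; summing the column values gives the total information of one block.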

  16. A third genotype of the human parvovirus PARV4 in sub-Saharan Africa.

    PubMed

    Simmonds, Peter; Douglas, Jill; Bestetti, Giovanna; Longhi, Erika; Antinori, Spinello; Parravicini, Carlo; Corbellino, Mario

    2008-09-01

    PARV4 is a recently discovered human parvovirus widely distributed in injecting drug users in the USA and Europe, particularly in those co-infected with human immunodeficiency virus (HIV). Like parvovirus B19, PARV4 persists in previously exposed individuals. In bone marrow and lymphoid tissue, PARV4 sequences were detected in two sub-Saharan African study subjects with AIDS but without a reported history of parenteral exposure and who were uninfected with hepatitis C virus. PARV4 variants infecting these subjects were phylogenetically distinct from genotypes 1 and 2 (formerly PARV5) that were reported previously. Analysis of near-complete genome sequences demonstrated that they should be classified as a third (equidistant) PARV4 genotype. The availability of a further near-complete genome sequence of this novel genotype facilitated identification of conserved novel open reading frames embedded in the ORF2 coding sequence; one encoded a putative protein with identifiable homology to SAT proteins of members of the genus Parvovirus.

  17. Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

    PubMed Central

    Pal Choudhury, Pabitra

    2017-01-01

    Periplasmic c7-type cytochrome A (PpcA) is found in Geobacter sulfurreducens along with four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), while its homologs do not. The reason for this has yet to be established with certainty from primary protein sequence information. This study is based primarily on primary protein sequence analysis through the chemical properties of the embedded amino acids. First, we look for the chemical-group-specific score of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This methodology is applied to the cytochrome c7 family members to pinpoint how a particular sequence differs from the others. Second, we build a graph-theoretic model from amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted. Third, we search for unique patterns as subsequences that are common among the group or specific to an individual member. In all cases, we are able to show some distinct features of PpcA that set it apart from its homologs and may account for its binding to deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD compared to the other homologs are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must share some significant common features, which are also enumerated in this study. PMID:28362850

  18. Adaptor protein 2–mediated endocytosis of the β-secretase BACE1 is dispensable for amyloid precursor protein processing

    PubMed Central

    Prabhu, Yogikala; Burgos, Patricia V.; Schindler, Christina; Farías, Ginny G.; Magadár, Javier G.; Bonifacino, Juan S.

    2012-01-01

    The β-site amyloid precursor protein (APP)–cleaving enzyme 1 (BACE1) is a transmembrane aspartyl protease that catalyzes the proteolytic processing of APP and other plasma membrane protein precursors. BACE1 cycles between the trans-Golgi network (TGN), the plasma membrane, and endosomes by virtue of signals contained within its cytosolic C-terminal domain. One of these signals is the DXXLL-motif sequence DISLL, which controls transport between the TGN and endosomes via interaction with GGA proteins. Here we show that the DISLL sequence is embedded within a longer [DE]XXXL[LI]-motif sequence, DDISLL, which mediates internalization from the plasma membrane by interaction with the clathrin-associated, heterotetrameric adaptor protein 2 (AP-2) complex. Mutation of this signal or knockdown of either AP-2 or clathrin decreases endosomal localization and increases plasma membrane localization of BACE1. Remarkably, internalization-defective BACE1 is able to cleave an APP mutant that itself cannot be delivered to endosomes. The drug brefeldin A reversibly prevents BACE1-catalyzed APP cleavage, ruling out that this reaction occurs in the endoplasmic reticulum (ER) or ER–Golgi intermediate compartment. Taken together, these observations support the notion that BACE1 is capable of cleaving APP in late compartments of the secretory pathway. PMID:22553349
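
    The nesting of the two sorting signals can be checked directly with regular expressions. This is a small illustration, assuming that X in the motif notation stands for any single residue; the alanine-substituted peptide is a hypothetical mutant chosen for the example, not one taken from the study.

```python
import re

# Motif notation translated to regular expressions (X = any residue).
DXXLL = re.compile(r"D..LL")               # GGA-binding motif
DE_XXXL_LI = re.compile(r"[DE]...L[LI]")   # AP-2-binding motif

def find_motifs(peptide):
    """Return (start, matched_text) for each motif type, or None if absent."""
    result = {}
    for name, pattern in (("DXXLL", DXXLL), ("[DE]XXXL[LI]", DE_XXXL_LI)):
        match = pattern.search(peptide)
        result[name] = (match.start(), match.group()) if match else None
    return result

print(find_motifs("DDISLL"))   # sequence named in the abstract
print(find_motifs("DDISAA"))   # hypothetical dileucine-to-alanine mutant
```

    In DDISLL the shorter motif matches from the second residue (DISLL) and the longer one from the first (DDISLL), which is the embedding the abstract describes; replacing both leucines removes both matches.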

  19. Development of a Tightly Controlled Off Switch for Saccharomyces cerevisiae Regulated by Camphor, a Low-Cost Natural Product

    PubMed Central

    Ikushima, Shigehito; Zhao, Yu; Boeke, Jef D.

    2015-01-01

    Here we describe the engineering of a distant homolog of the Tet repressor, CamR, isolated from Pseudomonas putida, that is regulated by camphor, a very inexpensive small molecule (at micromolar concentrations) for use in Saccharomyces cerevisiae. The repressor was engineered by expression from a constitutive yeast promoter, fusion to a viral activator protein cassette, and codon optimization. A suitable promoter responsive to the CamR fusion protein was engineered by embedding a P. putida operator binding sequence within an upstream activating sequence (UAS)-less CYC1 promoter from S. cerevisiae. The switch, named the Camphor-Off switch, activates expression of a reporter gene in camphor-free media and represses it with micromolar concentrations of camphor. PMID:26206350

  20. Production of Supra-regular Spatial Sequences by Macaque Monkeys.

    PubMed

    Jiang, Xinjian; Long, Tenghai; Cao, Weicong; Li, Junru; Dehaene, Stanislas; Wang, Liping

    2018-06-18

    Understanding and producing embedded sequences in language, music, or mathematics, is a central characteristic of our species. These domains are hypothesized to involve a human-specific competence for supra-regular grammars, which can generate embedded sequences that go beyond the regular sequences engendered by finite-state automata. However, is this capacity truly unique to humans? Using a production task, we show that macaque monkeys can be trained to produce time-symmetrical embedded spatial sequences whose formal description requires supra-regular grammars or, equivalently, a push-down stack automaton. Monkeys spontaneously generalized the learned grammar to novel sequences, including longer ones, and could generate hierarchical sequences formed by an embedding of two levels of abstract rules. Compared to monkeys, however, preschool children learned the grammars much faster using a chunking strategy. While supra-regular grammars are accessible to nonhuman primates through extensive training, human uniqueness may lie in the speed and learning strategy with which they are acquired. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. The light-harvesting antenna of Chlorobium tepidum: interactions between the FMO protein and the major chlorosome protein CsmA studied by surface plasmon resonance.

    PubMed

    Pedersen, Marie Østergaard; Borch, Jonas; Højrup, Peter; Cox, Raymond P; Miller, Mette

    2006-09-01

    Green sulfur bacteria possess two external light-harvesting antenna systems, the chlorosome and the FMO protein, which participate in a sequential energy transfer to the reaction centers embedded in the cytoplasmic membrane. However, little is known about the physical interaction between these two antenna systems. We have studied the interaction between the major chlorosome protein, CsmA, and the FMO protein in Chlorobium tepidum using surface plasmon resonance (SPR). Our results show an interaction between the FMO protein and an immobilized synthetic peptide corresponding to 17 amino acids at the C terminal of CsmA. This interaction is dependent on the presence of a motif comprising six amino acids that are highly conserved in all the currently available CsmA protein sequences.

  2. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methanogenesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R-sq outcome in the bioinformatics simulation.

  3. ExoLocator--an online view into genetic makeup of vertebrate proteins.

    PubMed

    Khoo, Aik Aun; Ogrizek-Tomas, Mario; Bulovic, Ana; Korpar, Matija; Gürler, Ece; Slijepcevic, Ivan; Šikic, Mile; Mihalek, Ivana

    2014-01-01

    ExoLocator (http://exolocator.eopsf.org) collects in a single place information needed for comparative analysis of protein-coding exons from vertebrate species. The main source of data--the genomic sequences, and the existing exon and homology annotation--is the ENSEMBL database of completed vertebrate genomes. To these, ExoLocator adds the search for ostensibly missing exons in orthologous protein pairs across species, using an extensive computational pipeline to narrow down the search region for the candidate exons and find a suitable template in the other species, as well as state-of-the-art implementations of pairwise alignment algorithms. The resulting complements of exons are organized in a way currently unique to ExoLocator: multiple sequence alignments, both on the nucleotide and on the peptide levels, clearly indicating the exon boundaries. The alignments can be inspected in the web-embedded viewer, downloaded or used on the spot to produce an estimate of conservation within orthologous sets, or functional divergence across paralogues.

  4. Non-B-Form DNA Is Enriched at Centromeres

    PubMed Central

    Henikoff, Steven

    2018-01-01

    Animal and plant centromeres are embedded in repetitive “satellite” DNA, but are thought to be epigenetically specified. To define genetic characteristics of centromeres, we surveyed satellite DNA from diverse eukaryotes and identified variation in <10-bp dyad symmetries predicted to adopt non-B-form conformations. Organisms lacking centromeric dyad symmetries had binding sites for sequence-specific DNA-binding proteins with DNA-bending activity. For example, human and mouse centromeres are depleted for dyad symmetries, but are enriched for non-B-form DNA and are associated with binding sites for the conserved DNA-binding protein CENP-B, which is required for artificial centromere function but is paradoxically nonessential. We also detected dyad symmetries and predicted non-B-form DNA structures at neocentromeres, which form at ectopic loci. We propose that centromeres form at non-B-form DNA because of dyad symmetries or are strengthened by sequence-specific DNA binding proteins. This may resolve the CENP-B paradox and provide a general basis for centromere specification. PMID:29365169
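
    The short dyad symmetries mentioned in this abstract can be illustrated with a minimal sketch. It assumes a dyad symmetry means a short arm followed, after an optional spacer, by its own reverse complement. The function names, the arm length and the spacer limit are hypothetical choices for illustration and are not taken from the paper.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_dyad_symmetries(seq: str, arm: int = 4, max_spacer: int = 3):
    """Return a list of (start, left_arm, spacer_length) tuples.

    Illustrative definition (an assumption, not the paper's method):
    a left arm of `arm` bases followed, after a spacer of 0..max_spacer
    bases, by the reverse complement of that arm.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        if set(left) - set("ACGT"):
            continue  # skip arms containing ambiguous bases such as N
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            if seq[right_start:right_start + arm] == target:
                hits.append((start, left, spacer))
    return hits
```

    Calling `find_dyad_symmetries("GAATTC", arm=3, max_spacer=0)` reports the single perfect palindrome in that hexamer. Whether such a site actually adopts a non-B-form conformation is a separate structural prediction that this sketch does not attempt.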

  5. Transmembrane peptides as sensors of the membrane physical state

    NASA Astrophysics Data System (ADS)

    Piotto, Stefano; Di Biasi, Luigi; Sessa, Lucia; Concilio, Simona

    2018-05-01

    Cell membranes are commonly considered fundamental structures having multiple roles such as confinement, storage of lipids, and the support and control of membrane proteins. In spite of their importance, many aspects remain unclear. The number of lipid types is orders of magnitude larger than the number of amino acids, and this compositional complexity is not clearly embedded in any membrane model. A widespread hypothesis is that the large lipid palette makes it possible to recruit and organize specific proteins controlling the formation of specialized lipid domains and the lateral pressure profile of the bilayer. Unfortunately, satisfactory knowledge of lipid abundance remains out of reach because of the technical difficulties in isolating definite membrane regions. More importantly, a theoretical framework into which the lipidomic data can be fitted is still missing. In this work, we use the amino acid sequence and frequency of the membrane proteins as bioinformatics sensors of cell bilayers. The use of an alignment-free method to find a correlation between the sequences of the transmembrane portions of membrane proteins and the membrane physical state suggested a new approach for the discovery of antimicrobial peptides.

  6. The mechanochemistry of copper reports on the directionality of unfolding in model cupredoxin proteins

    NASA Astrophysics Data System (ADS)

    Beedle, Amy E. M.; Lezamiz, Ainhoa; Stirnemann, Guillaume; Garcia-Manyes, Sergi

    2015-08-01

    Understanding the directionality and sequence of protein unfolding is crucial to elucidate the underlying folding free energy landscape. An extra layer of complexity is added in metalloproteins, where a metal cofactor participates in the correct, functional fold of the protein. However, the precise mechanisms by which organometallic interactions are dynamically broken and reformed on (un)folding are largely unknown. Here we use single molecule force spectroscopy AFM combined with protein engineering and MD simulations to study the individual unfolding pathways of the blue-copper proteins azurin and plastocyanin. Using the nanomechanical properties of the native copper centre as a structurally embedded molecular reporter, we demonstrate that both proteins unfold via two independent, competing pathways. Our results provide experimental evidence of a novel kinetic partitioning scenario whereby the protein can stochastically unfold through two distinct main transition states placed at the N and C termini that dictate the direction in which unfolding occurs.

  7. Frameshifting in alphaviruses: a diversity of 3' stimulatory structures.

    PubMed

    Chung, Betty Y-W; Firth, Andrew E; Atkins, John F

    2010-03-26

    Programmed ribosomal frameshifting allows the synthesis of alternative, N-terminally coincident, C-terminally distinct proteins from the same RNA. Many viruses utilize frameshifting to optimize the coding potential of compact genomes, to circumvent the host cell's canonical rule of one functional protein per mRNA, or to express alternative proteins in a fixed ratio. Programmed frameshifting is also used in the decoding of a small number of cellular genes. Recently, specific ribosomal -1 frameshifting was discovered at a conserved U_UUU_UUA motif within the sequence encoding the alphavirus 6K protein. In this case, frameshifting results in the synthesis of an additional protein, termed TF (TransFrame). This new case of frameshifting is unusual in that the -1 frame ORF is very short and completely embedded within the sequence encoding the overlapping polyprotein. The present work shows that there is remarkable diversity in the 3' sequences that are functionally important for efficient frameshifting at the U_UUU_UUA motif. While many alphavirus species utilize a 3' RNA structure such as a hairpin or pseudoknot, some species (such as Semliki Forest virus) apparently lack any intra-mRNA stimulatory structure, yet just 20 nt 3'-adjacent to the shift site stimulates up to 10% frameshifting. The analysis, both experimental and bioinformatic, significantly expands the known repertoire of -1 frameshifting stimulators in mammalian and insect systems.
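
    The reading-frame bookkeeping behind -1 frameshifting at a U_UUU_UUA site can be sketched as follows. This is a minimal illustration under the usual textbook assumption that, after the slip, decoding resumes one nucleotide upstream so that the final A of the site is read again. The function names are hypothetical, and the sketch models neither frameshifting efficiency nor the 3' stimulatory structures the paper studies.

```python
SLIPPERY = "UUUUUUA"  # U_UUU_UUA with the frame separators removed


def split_codons(rna: str, start: int):
    """Split rna into complete codons, beginning at index `start`."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def read_with_minus_one_shift(rna: str, orf_start: int = 0):
    """Return (zero_frame_codons, trans_frame_codons).

    trans_frame_codons is None when no in-frame slippery site is found.
    "In-frame" means the zero-frame codons across the site are UUU and UUA,
    i.e. the first U of the motif is the last base of the previous codon.
    """
    rna = rna.upper().replace("T", "U")
    site = rna.find(SLIPPERY, orf_start)
    while site != -1 and (site + 1 - orf_start) % 3 != 0:
        site = rna.find(SLIPPERY, site + 1)
    zero_frame = split_codons(rna, orf_start)
    if site == -1:
        return zero_frame, None
    shift_point = site + len(SLIPPERY)  # first base after the UUA codon
    upstream = [rna[i:i + 3] for i in range(orf_start, shift_point, 3)]
    # After the slip, reading continues one nucleotide back (-1 frame).
    trans_frame = upstream + split_codons(rna, shift_point - 1)
    return zero_frame, trans_frame
```

    For the toy message `AUGGCUUUUUUAGCAUGA`, the zero frame ends at the UGA stop codon, while the shifted reading continues past it with AGC and AUG, which mirrors how a short -1 frame ORF can sit entirely inside the longer coding sequence.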

  8. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

    PubMed

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-11-15

    An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  9. The glucose transporter 1 -GLUT1- from the white shrimp Litopenaeus vannamei is up-regulated during hypoxia.

    PubMed

    Martínez-Quintana, José A; Peregrino-Uriarte, Alma B; Gollas-Galván, Teresa; Gómez-Jiménez, Silvia; Yepiz-Plascencia, Gloria

    2014-12-01

    During hypoxia the shrimp Litopenaeus vannamei accelerates anaerobic glycolysis to obtain energy; therefore, a correct supply of glucose to the cells is needed. Facilitated glucose transport across the cells is mediated by a group of membrane-embedded integral proteins called GLUT, GLUT1 being the most ubiquitous form. In this work, we report the first cDNA nucleotide and deduced amino acid sequences of a glucose transporter 1 from L. vannamei. A 1619 bp sequence was obtained by RT-PCR and RACE approaches. The 5´ UTR is 161 bp and the poly A tail is exactly after the stop codon in the mRNA. The ORF is 1485 bp and codes for 485 amino acids. The deduced protein sequence has high identity to GLUT1 proteins from several species and contains all the main features of glucose transporter proteins, including twelve transmembrane domains and the conserved motifs and amino acids involved in transport activity, ligand binding and membrane anchoring. Therefore, we decided to name this sequence glucose transporter 1 of L. vannamei (LvGLUT1). A partial gene sequence of 8.87 Kbp was also obtained; it contains the complete coding sequence divided into 10 exons. LvGlut1 expression was detected in hemocytes, hepatopancreas, intestine, gills, muscle and pleopods. The highest relative expression was found in gills and the lowest in hemocytes. This indicates that LvGlut1 is ubiquitously expressed but its levels are tissue-specific; upon short-term hypoxia, the GLUT1 transcripts increase 3.7-fold in hepatopancreas and gills. To our knowledge, this is the first evidence of expression of GLUT1 in crustaceans.

  10. Perception and the Temporal Properties of Speech.

    DTIC Science & Technology

    1993-01-11

    conditions. In the embedded condition, phoneme sequences equivalent to these words formed the second syllable of a two-syllable word. In the unembedded ... unembedded in the sequence "warm lips". These priming sequences were based on the sequences used in Experiment 2. Each combinable priming sequence in...unrelated, to the embedded or unembedded prime word. The probes used in this experiment were identical to the ones used in Experiment 2. Subjects were tested

  11. Identification of tissue-embedded ascarid larvae by ribosomal DNA sequencing.

    PubMed

    Ishiwata, Kenji; Shinohara, Akio; Yagi, Kinpei; Horii, Yoichiro; Tsuchiya, Kimiyuki; Nawa, Yukifumi

    2004-01-01

    Polymerase chain reaction (PCR) was applied to identify tissue-embedded ascarid nematode larvae. Two sequences of the internal transcribed spacer (ITS) regions of ribosomal DNA (rDNA), ITS1 and ITS2, of the ascarid parasites were amplified and compared with those of ascarid-nematodes registered in a DNA database (GenBank). The ITS sequences of the PCR products obtained from the ascarid parasite specimen in our laboratory were compatible with those of registered adult Ascaris and Toxocara parasites. PCR amplification of the ITS regions was sensitive enough to detect a single larva of Ascaris suum mixed with porcine liver tissue. Using this method, ascarid larvae embedded in the liver of a naturally infected turkey were identified as Toxocara canis. These results suggest that even a single larva embedded in tissues from patients with larva migrans could be identified by sequencing the ITS regions.

  12. Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices.

    PubMed

    Liu, Yong; Munteanu, Cristian R; Fernández Blanco, Enrique; Tan, Zhiliang; Santos Del Riego, Antonino; Pazos, Alejandro

    2015-11-01

    The nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model that is able to predict new nucleotide binding peptides, using only the amino acid sequence. Thus, the methodology uses a Star graph molecular descriptor of the peptide sequences and the Machine Learning technique for the best classifier. The best model represents a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent, considering similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Yin, Xi; Yang, Jing; Xiao, Feng; Yang, Yang; Shen, Hong-Bin

    2018-03-01

    Membrane proteins are an important kind of protein embedded in the membranes of cells and play crucial roles in living organisms, for example as ion channels, transporters and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% for A_TMH, 87.1% for A_P, 3.2 ± 3.0 for N-score and 3.1 ± 2.8 for C-score. MemBrain-Contact obtains 62%/64.1% prediction accuracy on the training and independent datasets, respectively, for top L/5 contact prediction. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.

  14. Embedding and Chemical Reactivation of Green Fluorescent Protein in the Whole Mouse Brain for Optical Micro-Imaging

    PubMed Central

    Gang, Yadong; Zhou, Hongfu; Jia, Yao; Liu, Ling; Liu, Xiuli; Rao, Gong; Li, Longhui; Wang, Xiaojun; Lv, Xiaohua; Xiong, Hanqing; Yang, Zhongqin; Luo, Qingming; Gong, Hui; Zeng, Shaoqun

    2017-01-01

    Resin embedding has been widely applied to fixing biological tissues for sectioning and imaging, but has long been regarded as incompatible with green fluorescent protein (GFP) labeled samples because it reduces fluorescence. Recently, it has been reported that resin-embedded GFP-labeled brain tissue can be imaged with high resolution. Here, we describe an optimized protocol for resin embedding and chemical reactivation of fluorescent protein labeled mouse brain. We used mice as the experimental model, but the protocol should be applicable to other species. This method involves whole brain embedding and chemical reactivation of the fluorescent signal in resin-embedded tissue. The whole brain embedding process takes a total of 7 days. The duration of chemical reactivation is ~2 min for penetrating 4 μm below the surface in the resin-embedded brain. This protocol provides an efficient way to prepare fluorescent protein labeled samples for high-resolution optical imaging. This kind of sample was shown to be compatible with various optical micro-imaging methods. Fine structures labeled with GFP across a whole brain can be detected. PMID:28352214

  15. Evolutionary conservation and regulation of particular alternative splicing events in plant SR proteins

    PubMed Central

    Kalyna, Maria; Lopato, Sergiy; Voronin, Viktor; Barta, Andrea

    2006-01-01

    Alternative splicing is an important mechanism for fine tuning of gene expression at the post-transcriptional level. SR proteins govern splice site selection and spliceosome assembly. The Arabidopsis genome encodes 19 SR proteins, several of which have no orthologues in metazoans. Three of the plant-specific subfamilies are characterized by the presence of a relatively long alternatively spliced intron located in their first RNA recognition motif, which potentially results in an extremely truncated protein. In atRSZ33, a member of the RS2Z subfamily, this alternative splicing event was shown to be autoregulated. Here we show that atRSp31, a member of the RS subfamily, does not autoregulate alternative splicing of its similarly positioned intron. Interestingly, this alternative splicing event is regulated by atRSZ33. We demonstrate that the positions of these long introns and their capability for alternative splicing are conserved from green algae to flowering plants. Moreover, in particular alternative splicing events the splicing signals are embedded into highly conserved sequences. In different taxa, these conserved sequences occur in at least one gene within a subfamily. The evolutionary preservation of alternative splice forms together with highly conserved intron features argues for additional functions hidden in the genes of these plant-specific SR proteins. PMID:16936312

  16. Collagen-Like Proteins in Pathogenic E. coli Strains

    PubMed Central

    Ghosh, Neelanjana; McKillop, Thomas J.; Jowitt, Thomas A.; Howard, Marjorie; Davies, Heather; Holmes, David F.; Roberts, Ian S.; Bella, Jordi

    2012-01-01

    The genome sequences of enterohaemorrhagic E. coli O157:H7 strains show multiple open-reading frames with collagen-like sequences that are absent from the common laboratory strain K-12. These putative collagens are included in prophages embedded in O157:H7 genomes. These prophages carry numerous genes related to strain virulence and have been shown to be inducible and capable of disseminating virulence factors by horizontal gene transfer. We have cloned two collagen-like proteins from E. coli O157:H7 into a laboratory strain and analysed the structure and conformation of the recombinant proteins and several of their constituting domains by a variety of spectroscopic, biophysical, and electron microscopy techniques. We show that these molecules exhibit many of the characteristics of vertebrate collagens, including trimer formation and the presence of a collagen triple helical domain. They also contain a C-terminal trimerization domain, and a trimeric α-helical coiled-coil domain with an unusual amino acid sequence almost completely lacking leucine, valine or isoleucine residues. Intriguingly, these molecules show high thermal stability, with the collagen domain being more stable than those of vertebrate fibrillar collagens, which are much longer and post-translationally modified. Under the electron microscope, collagen-like proteins from E. coli O157:H7 show a dumbbell shape, with two globular domains joined by a hinged stalk. This morphology is consistent with their likely role as trimeric phage side-tail proteins that participate in the attachment of phage particles to E. coli target cells, either directly or through assembly with other phage tail proteins. Thus, collagen-like proteins in enterohaemorrhagic E. coli genomes may have a direct role in the dissemination of virulence-related genes through infection of harmless strains by induced bacteriophages. PMID:22701585

  17. Operating characteristics of the implicit learning system supporting serial interception sequence learning.

    PubMed

    Sanchez, Daniel J; Reber, Paul J

    2012-04-01

    The memory system that supports implicit perceptual-motor sequence learning relies on brain regions that operate separately from the explicit, medial temporal lobe memory system. The implicit learning system therefore likely has distinct operating characteristics and information processing constraints. To attempt to identify the limits of the implicit sequence learning mechanism, participants performed the serial interception sequence learning (SISL) task with covertly embedded repeating sequences that were much longer than most previous studies: ranging from 30 to 60 (Experiment 1) and 60 to 90 (Experiment 2) items in length. Robust sequence-specific learning was observed for sequences up to 80 items in length, extending the known capacity of implicit sequence learning. In Experiment 3, 12-item repeating sequences were embedded among increasing amounts of irrelevant nonrepeating sequences (from 20 to 80% of training trials). Despite high levels of irrelevant trials, learning occurred across conditions. A comparison of learning rates across all three experiments found a surprising degree of constancy in the rate of learning regardless of sequence length or embedded noise. Sequence learning appears to be constant with the logarithm of the number of sequence repetitions practiced during training. The consistency in learning rate across experiments and conditions implies that the mechanisms supporting implicit sequence learning are not capacity-constrained by very long sequences nor adversely affected by high rates of irrelevant sequences during training.

  18. Prediction of enhancer-promoter interactions via natural language processing.

    PubMed

    Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

    2018-05-09

    Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We demonstrate the robustness of sequence embedding features by carrying out sensitivity analysis. In addition, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
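
    The idea of turning variable-length DNA sequences into fixed-length vectors can be made concrete with a small sketch. It treats overlapping k-mers as "words" and uses a normalized k-mer frequency vector as a simple stand-in for the learned embedding. This is explicitly not the EP2vec method: the function names and the choice of k are assumptions for illustration only.

```python
from itertools import product


def kmer_words(seq: str, k: int = 3, stride: int = 1):
    """Split a DNA sequence into overlapping k-mer 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_frequency_vector(seq: str, k: int = 3):
    """Map a sequence of any length to a vector of length 4**k.

    Each entry is the relative frequency of one k-mer. A learned embedding
    would replace this counting step; the fixed output length is the point.
    """
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    words = [w for w in kmer_words(seq, k) if w in index]  # drop k-mers with N etc.
    for word in words:
        vector[index[word]] += 1.0
    if words:
        vector = [count / len(words) for count in vector]
    return vector
```

    Two sequences of different lengths, for example a 200-bp promoter and a 3-kb enhancer, both map to 64 numbers when k = 3, so they can be concatenated and passed to any standard classifier.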

  19. The Disordered C-Terminus of Yeast Hsf1 Contains a Cryptic Low-Complexity Amyloidogenic Region.

    PubMed

    Pujols, Jordi; Santos, Jaime; Pallarès, Irantzu; Ventura, Salvador

    2018-05-06

    Response mechanisms to external stress rely on networks of proteins able to activate specific signaling pathways to ensure the maintenance of cell proteostasis. Many of the proteins mediating this kind of response contain intrinsically disordered regions, which lack a defined structure, but still are able to interact with a wide range of clients that modulate the protein function. Some of these interactions are mediated by specific short sequences embedded in the longer disordered regions. Because the physicochemical properties that promote functional and abnormal interactions are similar, it has been shown that, in globular proteins, aggregation-prone and binding regions tend to overlap. It could be that the same principle applies for disordered protein regions. In this context, we show here that a predicted low-complexity interacting region in the disordered C-terminus of the stress response master regulator heat shock factor 1 (Hsf1) protein corresponds to a cryptic amyloid region able to self-assemble into fibrillary structures resembling those found in neurodegenerative disorders.

  20. Evaluating Quality of Aged Archival Formalin-Fixed Paraffin-Embedded Samples for RNA-Sequencing

    EPA Science Inventory

    Archival formalin-fixed paraffin-embedded (FFPE) samples offer a vast, untapped source of genomic data for biomarker discovery. However, the quality of FFPE samples is often highly variable, and conventional methods to assess RNA quality for RNA-sequencing (RNA-seq) are not infor...

  1. The complete mitochondrial genome of lesser long-tailed Hamster Cricetulus longicaudatus (Milne-Edwards, 1867) and phylogenetic implications.

    PubMed

    Zhang, Ziqi; Sun, Tong; Kang, Chunlan; Liu, Yang; Liu, Shaoying; Yue, Bisong; Zeng, Tao

    2016-01-01

    The complete mitochondrial genome sequence of Cricetulus longicaudatus (Rodentia Cricetidae: Cricetinae) was determined and was deposited in GenBank (GenBank accession no. KM067270). The mitochondrial genome of C. longicaudatus was 16,302 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and one control region, with an identical order to that of other rodents' mitochondrial genomes. The phylogenetic analysis was performed with Bayesian inference based on the concatenated nucleotide sequence of 12 protein-coding genes on the heavy strand. The result showed that the species from Cricetidae and its two subfamilies (Cricetinae and Arvicolinae) each formed a solid monophyletic group. Cricetulus had a close phylogenetic relationship with Tscherskia among three genera (Cricetulus, Cricetulus and Mesocricetus). Neodon irene and Myodes regulus were embedded in Microtus and Eothenomys, respectively. The unusual phylogenetic positions of Neodon irene and Myodes regulus require further study.

  2. Membrane Topology and Insertion of Membrane Proteins: Search for Topogenic Signals

    PubMed Central

    van Geest, Marleen; Lolkema, Juke S.

    2000-01-01

    Integral membrane proteins are found in all cellular membranes and carry out many of the functions that are essential to life. The membrane-embedded domains of integral membrane proteins are structurally quite simple, allowing the use of various prediction methods and biochemical methods to obtain structural information about membrane proteins. A critical step in the biosynthetic pathway leading to the folded protein in the membrane is its insertion into the lipid bilayer. Understanding of the fundamentals of the insertion and folding processes will significantly improve the methods used to predict the three-dimensional membrane protein structure from the amino acid sequence. In the first part of this review, biochemical approaches to elucidate membrane protein topology are reviewed and evaluated, and in the second part, the use of similar techniques to study membrane protein insertion is discussed. The latter studies search for signals in the polypeptide chain that direct the insertion process. Knowledge of the topogenic signals in the nascent chain of a membrane protein is essential for the evaluation of membrane topology studies. PMID:10704472

  3. Literature classification for semi-automated updating of biological knowledgebases

    PubMed Central

    2013-01-01

    Background: As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results: We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases. PMID:24564403
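
    The classification setup described here (a k-NN classifier evaluated with five-fold cross-validation) can be sketched in a few lines. The bag-of-words features, cosine similarity, value of k and function names below are assumptions chosen for illustration. The abstract does not state which features or distance measure the authors used.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case, whitespace-tokenized word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text: str, k: int = 3) -> str:
    """Label `text` by majority vote among its k most similar training abstracts."""
    query = bag_of_words(text)
    ranked = sorted(train, key=lambda item: cosine(bag_of_words(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(data, folds: int = 5, k: int = 3) -> float:
    """Return accuracy over `folds` folds; item i is held out in fold i % folds."""
    correct = 0
    for fold in range(folds):
        held_out = data[fold::folds]
        train = [item for i, item in enumerate(data) if i % folds != fold]
        correct += sum(knn_predict(train, text, k) == label for text, label in held_out)
    return correct / len(data)
```

    Here `data` is a list of `(abstract_text, label)` pairs with labels such as "relevant" and "irrelevant". On real abstracts one would normally shuffle before splitting and use weighted term features. Both are omitted to keep the sketch short.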

  4. Discovery of an unusual biosynthetic origin for circular proteins in legumes

    PubMed Central

    Poth, Aaron G.; Colgrave, Michelle L.; Lyons, Russell E.; Daly, Norelle L.; Craik, David J.

    2011-01-01

    Cyclotides are plant-derived proteins that have a unique cyclic cystine knot topology and are remarkably stable. Their natural function is host defense, but they have a diverse range of pharmaceutically important activities, including uterotonic activity and anti-HIV activity, and have also attracted recent interest as templates in drug design. Here we report an unusual biosynthetic origin of a precursor protein of a cyclotide from the butterfly pea, Clitoria ternatea, a representative member of the Fabaceae plant family. Unlike all previously reported cyclotides, the domain corresponding to the mature cyclotide from this Fabaceae plant is embedded within an albumin precursor protein. We confirmed the expression and correct processing of the cyclotide encoded by the Cter M precursor gene transcript following extraction from C. ternatea leaf and sequencing by tandem mass spectrometry. The sequence was verified by direct chemical synthesis and the peptide was found to adopt a classic knotted cyclotide fold as determined by NMR spectroscopy. Seven additional cyclotide sequences were also identified from C. ternatea leaf and flower, five of which were unique. Cter M displayed insecticidal activity against the cotton budworm Helicoverpa armigera and bound to phospholipid membranes, suggesting its activity is modulated by membrane disruption. The Fabaceae is the third largest family of flowering plants and many Fabaceous plants are of huge significance for human nutrition. Knowledge of Fabaceae cyclotide gene transcripts should enable the production of modified cyclotides in crop plants for a variety of agricultural or pharmaceutical applications, including plant-produced designer peptide drugs. PMID:21593408

  5. No evidence for a role of modified live virus vaccines in the emergence of canine parvovirus.

    PubMed

    Truyen, U; Geissler, K; Parrish, C R; Hermanns, W; Siegl, G

    1998-05-01

    In this study the early evolution and potential origins of canine parvovirus (CPV) were examined. We cloned and sequenced the VP2 capsid protein genes of three German CPV strains isolated in 1979-1980, as well as two feline panleukopenia virus (FPV) vaccine viruses that were previously shown to have some restriction enzyme cleavage sites in common with CPV. Other partial VP2 gene sequences were obtained by amplifying CPV DNA from paraffin-embedded tissues of dogs which were early parvovirus disease cases in Germany in 1978-1979. Sequences were analysed with respect to their evolutionary relationships to other CPV and FPV isolates. Those analyses did not support the hypothesis that CPV emerged as a variant of an FPV vaccine virus. Neither did they reveal ancestral sequences among the very early CPV isolates examined. Other possible sources for the origin of CPV are examined, including the involvement of viruses from wild carnivores.

  6. Development of a neutral embedding resin for optical imaging of fluorescently labeled biological tissue

    NASA Astrophysics Data System (ADS)

    Zhou, Hongfu; Gang, Yadong; Chen, Shenghua; Wang, Yu; Xiong, Yumiao; Li, Longhui; Yin, Fangfang; Liu, Yue; Liu, Xiuli; Zeng, Shaoqun

    2017-10-01

    Plastic embedding is widely applied in light microscopy analyses. Previous studies have shown that embedding agents and related techniques can greatly affect the quality of biological tissue embedding and fluorescent imaging. Specifically, it is difficult to preserve endogenous fluorescence using currently available acidic commercial embedding resins and related embedding techniques directly. Here, we developed a neutral embedding resin that improved the green fluorescent protein (GFP), yellow fluorescent protein (YFP), and DsRed fluorescent intensity without adjusting the pH value of monomers or reactivating fluorescence in lye. The embedding resin had a high degree of polymerization, and its fluorescence preservation ratios for GFP, YFP, and DsRed were 126.5%, 155.8%, and 218.4%, respectively.

  7. Chemical reactivation of quenched fluorescent protein molecules enables resin-embedded fluorescence microimaging

    PubMed Central

    Xiong, Hanqing; Zhou, Zhenqiao; Zhu, Mingqiang; Lv, Xiaohua; Li, Anan; Li, Shiwei; Li, Longhui; Yang, Tao; Wang, Siming; Yang, Zhongqin; Xu, Tonghui; Luo, Qingming; Gong, Hui; Zeng, Shaoqun

    2014-01-01

    Resin embedding is a well-established technique to prepare biological specimens for microscopic imaging. However, it is not compatible with modern green-fluorescent protein (GFP) fluorescent-labelling technique because it significantly quenches the fluorescence of GFP and its variants. Previous empirical optimization efforts are good for thin tissue but not successful on macroscopic tissue blocks as the quenching mechanism remains uncertain. Here we show most of the quenched GFP molecules are structurally preserved and not denatured after routine embedding in resin, and can be chemically reactivated to a fluorescent state by alkaline buffer during imaging. We observe up to 98% preservation in yellow-fluorescent protein case, and improve the fluorescence intensity 11.8-fold compared with unprocessed samples. We demonstrate fluorescence microimaging of resin-embedded EGFP/EYFP-labelled tissue block without noticeable loss of labelled structures. This work provides a turning point for the imaging of fluorescent protein-labelled specimens after resin embedding. PMID:24886825

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraitiene, Asta; US Department of Agriculture, Agricultural Research Service, Molecular Plant Pathology Laboratory, Room 214 Building 004 BARC-West, 10300 Baltimore Avenue, Beltsville, MD 20705; Zhao Yan

    Transient expression of engineered reporter RNAs encoding an intron-containing green fluorescent protein (GFP) from a Potato virus X-based expression vector previously demonstrated the nuclear targeting capability of the 359 nucleotide Potato spindle tuber viroid (PSTVd) RNA genome. To further delimit the putative nuclear-targeting signal, PSTVd subgenomic fragments were embedded within the intron, and recombinant reporter RNAs were inoculated onto Nicotiana benthamiana plants. Appearance of green fluorescence in leaf tissue inoculated with PSTVd-fragment-containing constructs indicated shuttling of the RNA into the nucleus by fragments as short as 80 nucleotides in length. Plant-to-plant variation in the timing of intron removal and subsequent GFP fluorescence was observed; however, earliest and most abundant GFP expression was obtained with constructs containing the conserved hairpin I palindrome structure and embedded upper central conserved region. Our results suggest that this conserved sequence and/or the stem-loop structure it forms is sufficient for import of PSTVd into the nucleus.

  9. Embedding siRNA sequences targeting Apolipoprotein B100 in shRNA and miRNA scaffolds results in differential processing and in vivo efficacy

    PubMed Central

    Maczuga, Piotr; Lubelski, Jacek; van Logtenstein, Richard; Borel, Florie; Blits, Bas; Fakkert, Erwin; Costessi, Adalberto; Butler, Derek; van Deventer, Sander; Petry, Harald; Koornneef, Annemart; Konstantinova, Pavlina

    2013-01-01

    Overexpression of short hairpin RNA (shRNA) often causes cytotoxicity and using microRNA (miRNA) scaffolds can circumvent this problem. In this study, identically predicted small interfering RNA (siRNA) sequences targeting apolipoprotein B100 (siApoB) were embedded in shRNA (shApoB) or miRNA (miApoB) scaffolds and a direct comparison of the processing and long-term in vivo efficacy was performed. Next generation sequencing of small RNAs originating from shApoB- or miApoB-transfected cells revealed substantial differences in processing, resulting in different siApoB length, 5′ and 3′ cleavage sites and abundance of the guide or passenger strands. Murine liver transduction with adeno-associated virus (AAV) vectors expressing shApoB or miApoB resulted in high levels of siApoB expression associated with strong decrease of plasma ApoB protein and cholesterol. Expression of miApoB from the liver-specific LP1 promoter was restricted to the liver, while the H1 promoter-expressed shApoB was ectopically present. Delivery of 1 × 10^11 genome copies AAV-shApoB or AAV-miApoB led to a gradual loss of ApoB and plasma cholesterol inhibition, which was circumvented by delivering a 20-fold lower vector dose. In conclusion, incorporating identical siRNA sequences in shRNA or miRNA scaffolds results in differential processing patterns and in vivo efficacy that may have serious consequences for future RNAi-based therapeutics. PMID:23089734

  10. Val-->Ala mutations selectively alter helix-helix packing in the transmembrane segment of phage M13 coat protein.

    PubMed Central

    Deber, C M; Khan, A R; Li, Z; Joensson, C; Glibowicka, M; Wang, J

    1993-01-01

    Val-->Ala mutations within the effective transmembrane segment of a model single-spanning membrane protein, the 50-residue major coat (gene VIII) protein of bacteriophage M13, are shown to have sequence-dependent impacts on stabilization of membrane-embedded helical dimeric structures. Randomized mutagenesis performed on the coat protein hydrophobic segment 21-39 (YIGYAWAMVVVIVGATIGI) produced a library of viable mutants which included those in which each of the four valine residues was replaced by an alanine residue. Significant variations found among these Val-->Ala mutants in the relative populations and thermal stabilities of monomeric and dimeric helical species observed on SDS/PAGE, and in the range of their alpha-helix-->beta-sheet transition temperatures confirmed that intramembranous valine residues are not simply universal contributors to membrane anchoring. Additional analyses of (i) nonmutatable sites in the mutant protein library, (ii) the properties of the double mutant V29A-V31A obtained by recycling mutant V31A DNA through mutagenesis procedures, and (iii) energy-minimized helical dimer structures of wild-type and mutant V31A transmembrane regions indicated that the transmembrane hydrophobic core helix of the M13 coat protein can be partitioned into alternating pairs of potential protein-interactive residues (V30, V31; G34, A35; G38, I39) and membrane-interactive residues (M28, V29; I32, V33; T36, I37). The overall results constitute an experimental approach to categorizing the distinctive contributions to structure of the residues comprising a protein-protein packing interface vs. those facing lipid and confirm the sequence-dependent capacity of specific residues within the transmembrane domain to modulate protein-protein interactions which underlie regulatory events in membrane proteins. PMID:8265602

  11. Val-->Ala mutations selectively alter helix-helix packing in the transmembrane segment of phage M13 coat protein.

    PubMed

    Deber, C M; Khan, A R; Li, Z; Joensson, C; Glibowicka, M; Wang, J

    1993-12-15

    Val-->Ala mutations within the effective transmembrane segment of a model single-spanning membrane protein, the 50-residue major coat (gene VIII) protein of bacteriophage M13, are shown to have sequence-dependent impacts on stabilization of membrane-embedded helical dimeric structures. Randomized mutagenesis performed on the coat protein hydrophobic segment 21-39 (YIGYAWAMVVVIVGATIGI) produced a library of viable mutants which included those in which each of the four valine residues was replaced by an alanine residue. Significant variations found among these Val-->Ala mutants in the relative populations and thermal stabilities of monomeric and dimeric helical species observed on SDS/PAGE, and in the range of their alpha-helix-->beta-sheet transition temperatures confirmed that intramembranous valine residues are not simply universal contributors to membrane anchoring. Additional analyses of (i) nonmutatable sites in the mutant protein library, (ii) the properties of the double mutant V29A-V31A obtained by recycling mutant V31A DNA through mutagenesis procedures, and (iii) energy-minimized helical dimer structures of wild-type and mutant V31A transmembrane regions indicated that the transmembrane hydrophobic core helix of the M13 coat protein can be partitioned into alternating pairs of potential protein-interactive residues (V30, V31; G34, A35; G38, I39) and membrane-interactive residues (M28, V29; I32, V33; T36, I37). The overall results constitute an experimental approach to categorizing the distinctive contributions to structure of the residues comprising a protein-protein packing interface vs. those facing lipid and confirm the sequence-dependent capacity of specific residues within the transmembrane domain to modulate protein-protein interactions which underlie regulatory events in membrane proteins.

  12. A microRNA embedded AAV alpha-synuclein gene silencing vector for dopaminergic neurons

    PubMed Central

    Han, Ye; Khodr, Christina E.; Sapru, Mohan K.; Pedapati, Jyothi; Bohn, Martha C.

    2011-01-01

    Alpha-synuclein (SNCA), an abundantly expressed presynaptic protein, is implicated in Parkinson disease (PD). Since over-expression of human SNCA (hSNCA) leads to death of dopaminergic (DA) neurons in human, rodent and fly brain, hSNCA gene silencing may reduce levels of toxic forms of SNCA and ameliorate degeneration of DA neurons in PD. To begin to develop a gene therapy for PD based on hSNCA gene silencing, two AAV gene silencing vectors were designed, and tested for efficiency and specificity of silencing, as well as toxicity in vitro. The same hSNCA silencing sequence (shRNA) was used in both vectors, but in one vector, the shRNA was embedded in a microRNA backbone and driven by a pol II promoter, and in the other the shRNA was not embedded in a microRNA and was driven by a pol III promoter. Both vectors silenced hSNCA to the same extent in 293T cells transfected with hSNCA. In DA PC12 cells, neither vector decreased expression of rat SNCA, tyrosine hydroxylase (TH), dopamine transporter (DAT) or the vesicular monoamine transporter (VMAT). However, the mir30 embedded vector was significantly less toxic to both PC12 and SH-SY5Y cells. Our in vitro data suggest that this miRNA-embedded silencing vector may be ideal for chronic in vivo SNCA gene silencing in DA neurons. PMID:21338582

  13. Detection of a putative novel adenovirus by PCR amplification, sequencing and phylogenetic characterisation of two gene fragments from formalin-fixed paraffin-embedded tissues of a cat diagnosed with disseminated adenovirus disease.

    PubMed

    Lakatos, Béla; Hornyák, Ákos; Demeter, Zoltán; Forgách, Petra; Kennedy, Frances; Rusvai, Miklós

    2017-12-01

    Adenoviral nucleic acid was detected by polymerase chain reaction (PCR) in formalin-fixed paraffin-embedded tissue samples of a cat that had suffered from disseminated adenovirus infection. The identity of the amplified products from the hexon and DNA-dependent DNA polymerase genes was confirmed by DNA sequencing. The sequences were clearly distinguishable from corresponding hexon and polymerase sequences of other mastadenoviruses, including human adenoviruses. These results suggest the possible existence of a distinct feline adenovirus.

  14. Both coding exons of the c-myc gene contribute to its posttranscriptional regulation in the quiescent liver and regenerating liver and after protein synthesis inhibition.

    PubMed Central

    Lavenu, A; Pistoi, S; Pournin, S; Babinet, C; Morello, D

    1995-01-01

    In vivo, the steady-state level of c-myc mRNA is mainly controlled by posttranscriptional mechanisms. Using a panel of transgenic mice in which various versions of the human c-myc proto-oncogene were under the control of major histocompatibility complex H-2Kb class I regulatory sequences, we have shown that the 5' and the 3' noncoding sequences are dispensable for obtaining a regulated expression of the transgene in adult quiescent tissues, at the start of liver regeneration, and after inhibition of protein synthesis. These results indicated that the coding sequences were sufficient to ensure a regulated c-myc expression. In the present study, we have pursued this analysis with transgenes containing one or the other of the two c-myc coding exons either alone or in association with the c-myc 3' untranslated region. We demonstrate that each of the exons contains determinants which control c-myc mRNA expression. Moreover, we show that in the liver, c-myc exon 2 sequences are able to down-regulate an otherwise stable H-2K mRNA when embedded within it and to induce its transient accumulation after cycloheximide treatment and soon after liver ablation. Finally, the use of transgenes with different coding capacities has allowed us to postulate that the primary mRNA sequence itself and not c-Myc peptides is an important component of c-myc posttranscriptional regulation. PMID:7623834

  15. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

    PubMed Central

    Kumari, Pooja; Sampath, Karuna

    2015-01-01

    For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036

  16. Quantitative Proteomic Analysis of Optimal Cutting Temperature (OCT) Embedded Core-Needle Biopsy of Lung Cancer

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaozheng; Huffman, Kenneth E.; Fujimoto, Junya; Canales, Jamie Rodriguez; Girard, Luc; Nie, Guangjun; Heymach, John V.; Wistuba, Ignacio I.; Minna, John D.; Yu, Yonghao

    2017-10-01

    With recent advances in understanding the genomic underpinnings and oncogenic drivers of pathogenesis in different subtypes, it is increasingly clear that proper pretreatment diagnostics are essential for the choice of appropriate treatment options for non-small cell lung cancer (NSCLC). Tumor tissue preservation in optimal cutting temperature (OCT) compound is commonly used in the surgical suite. However, proteins recovered from OCT-embedded specimens pose a challenge for LC-MS/MS experiments, due to the large amounts of polymers present in OCT. Here we present a simple workflow for whole proteome analysis of OCT-embedded NSCLC tissue samples, which involves a simple trichloroacetic acid precipitation step. Comparisons of protein recovery between frozen versus OCT-embedded tissue showed excellent consistency with more than 9200 proteins identified. Using an isobaric labeling strategy, we quantified more than 5400 proteins in tumor versus normal OCT-embedded core needle biopsy samples. Gene ontology analysis indicated that a number of proliferative as well as squamous cell carcinoma (SqCC) marker proteins were overexpressed in the tumor, consistent with the patient's pathology based diagnosis of "poorly differentiated SqCC". Among the most downregulated proteins in the tumor sample, we noted a number of proteins with potential immunomodulatory functions. Finally, interrogation of the aberrantly expressed proteins using a candidate approach and cross-referencing with publicly available databases led to the identification of potential druggable targets in DNA replication and DNA damage repair pathways. We conclude that our approach allows LC-MS/MS proteomic analyses on OCT-embedded lung cancer specimens, opening the way to bring powerful proteomics into the clinic.

  17. Accuracy of Protein Embedding Potentials: An Analysis in Terms of Electrostatic Potentials.

    PubMed

    Olsen, Jógvan Magnus Haugaard; List, Nanna Holmgaard; Kristensen, Kasper; Kongsted, Jacob

    2015-04-14

    Quantum-mechanical embedding methods have in recent years gained significant interest and may now be applied to predict a wide range of molecular properties calculated at different levels of theory. To reach a high level of accuracy in embedding methods, both the electronic structure model of the active region and the embedding potential need to be of sufficiently high quality. In fact, failures in quantum mechanics/molecular mechanics (QM/MM)-based embedding methods have often been associated with the QM/MM methodology itself; however, in many cases the reason for such failures is due to the use of an inaccurate embedding potential. In this paper, we investigate in detail the quality of the electronic component of embedding potentials designed for calculations on protein biostructures. We show that very accurate explicitly polarizable embedding potentials may be efficiently designed using fragmentation strategies combined with single-fragment ab initio calculations. In fact, due to the self-interaction error in Kohn-Sham density functional theory (KS-DFT), use of large full-structure quantum-mechanical calculations based on conventional (hybrid) functionals leads to less accurate embedding potentials than fragment-based approaches. We also find that standard protein force fields yield poor embedding potentials, and it is therefore not advisable to use such force fields in general QM/MM-type calculations of molecular properties other than energies and structures.
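
    As a rough illustration of comparing embedding potentials through the electrostatic potential they generate, the sketch below evaluates the potential of two sets of fixed point charges at a few probe points and reports the root-mean-square difference. It is a minimal sketch under stated assumptions: point charges only (no induced dipoles or higher multipoles, which a polarizable embedding potential would include), atomic units, and hypothetical function names `esp` and `rms_esp_difference`. The abstract does not state the error measure the authors used.

```python
from math import dist, sqrt


def esp(point, charges):
    """Electrostatic potential at `point` (atomic units) from point charges.

    `charges` is a list of (q, (x, y, z)) pairs; this is the plain
    Coulomb sum  V(r) = sum_i q_i / |r - r_i|.
    """
    return sum(q / dist(point, position) for q, position in charges)


def rms_esp_difference(points, charges_a, charges_b):
    """Root-mean-square difference between two potentials over probe points."""
    diffs = [esp(p, charges_a) - esp(p, charges_b) for p in points]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical toy input: one site described by two different charge models.
probe_points = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
model_a = [(1.0, (0.0, 0.0, 0.0))]
model_b = [(0.5, (0.0, 0.0, 0.0))]
print(rms_esp_difference(probe_points, model_a, model_b))
```

    A smaller value means the two charge models produce more similar potentials at the chosen probe points; which points to probe is itself a modelling choice.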

  18. Subjunctive and Sequence of Tense in Three Varieties of Spanish: Corpus and Experimental Studies of Change in Progress

    ERIC Educational Resources Information Center

    Guajardo, Gustavo

    2017-01-01

    Spanish generally shows a Sequence of Tense (SOT) phenomenon in subjunctive clauses: the tense of the embedded clause (present or past) must agree with the tense of the matrix clause. It has been reported, however, that one kind of violation sometimes occurs, in which a present tense subjunctive clause is embedded under a past tense matrix clause…

  19. Chromatin Heterogeneity and Distribution of Regulatory Elements in the Late-Replicating Intercalary Heterochromatin Domains of Drosophila melanogaster Chromosomes

    PubMed Central

    Khoroshko, Varvara A.; Levitsky, Viktor G.; Zykova, Tatyana Yu.; Antonenko, Oksana V.; Belyaeva, Elena S.; Zhimulev, Igor F.

    2016-01-01

    Late-replicating domains (intercalary heterochromatin) in the Drosophila genome display a number of features suggesting their organization is quite unique. Typically, they are quite large and encompass clusters of functionally unrelated tissue-specific genes. They correspond to the topologically associating domains and conserved microsynteny blocks. Our study aims at exploring further details of molecular organization of intercalary heterochromatin and has uncovered surprising heterogeneity of chromatin composition in these regions. Using the 4HMM model developed in our group earlier, intercalary heterochromatin regions were found to host chromatin fragments with a particular epigenetic profile. Aquamarine chromatin fragments (spanning 0.67% of late-replicating regions) are characterized as a class of sequences that appear heterogeneous in terms of their decompactization. These fragments are enriched with enhancer sequences and binding sites for insulator proteins. They likely mark the chromatin state that is related to the binding of cis-regulatory proteins. Malachite chromatin fragments (11% of late-replicating regions) appear to function as universal transitional regions between two contrasting chromatin states. Namely, they invariably delimit intercalary heterochromatin regions from the adjacent active chromatin of interbands. Malachite fragments also flank aquamarine fragments embedded in the repressed chromatin of late-replicating regions. Significant enrichment of insulator proteins CP190, SU(HW), and MOD2.2 was observed in malachite chromatin. Neither aquamarine nor malachite chromatin types appear to correlate with the positions of highly conserved non-coding elements (HCNE) that are typically replete in intercalary heterochromatin. Malachite chromatin found on the flanks of intercalary heterochromatin regions tends to replicate earlier than the malachite chromatin embedded in intercalary heterochromatin. 
    In other words, there exists a gradient of replication progressing from the flanks of intercalary heterochromatin regions center-wise. The peculiar organization and features of replication in large late-replicating regions are discussed as possible factors shaping the evolutionary stability of intercalary heterochromatin. PMID:27300486

  20. Real-time detection of BRAF V600E mutation from archival hairy cell leukemia FFPE tissue by nanopore sequencing.

    PubMed

    Vacca, Davide; Cancila, Valeria; Gulino, Alessandro; Lo Bosco, Giosuè; Belmonte, Beatrice; Di Napoli, Arianna; Florena, Ada Maria; Tripodo, Claudio; Arancio, Walter

    2018-02-01

    The MinION is a miniaturized high-throughput next generation sequencing platform of novel conception. The use of nucleic acids derived from formalin-fixed paraffin-embedded samples is highly desirable, but their adoption for molecular assays is hindered by the high degree of fragmentation and by the chemically induced mutations stemming from the fixation protocols. In order to investigate the suitability of MinION sequencing on formalin-fixed paraffin-embedded samples, the presence and frequency of the BRAF c.1799T > A mutation was investigated in two archival tissue specimens of Hairy cell leukemia and Hairy cell leukemia Variant. Despite the poor quality of the starting DNA, the BRAF mutation was successfully detected in the Hairy cell leukemia sample with around 50% of the reads obtained within 2 h of the sequencing start. Notably, the mutational burden of the Hairy cell leukemia sample as derived from nanopore sequencing proved to be comparable to that obtained with a sensitive method for the detection of point mutations, namely Digital PCR, using a validated assay. Nanopore sequencing can be adopted for targeted sequencing of genetic lesions on critical DNA samples such as those extracted from archival routine formalin-fixed paraffin-embedded samples. This result suggests that nanopore sequencing could be reliably adopted for the real-time targeted sequencing of genetic lesions. Our report opens the window for the adoption of nanopore sequencing in molecular pathology for research and diagnostics.

  1. Germline PMS2 mutation screened by mismatch repair protein immunohistochemistry of colorectal cancer in Japan.

    PubMed

    Sugano, Kokichi; Nakajima, Takeshi; Sekine, Shigeki; Taniguchi, Hirokazu; Saito, Shinya; Takahashi, Masahiro; Ushiama, Mineko; Sakamoto, Hiromi; Yoshida, Teruhiko

    2016-11-01

    Germline PMS2 gene mutations were detected by RT-PCR/direct sequencing of total RNA extracted from puromycin-treated peripheral blood lymphocytes (PBL) and by multiplex ligation-dependent probe amplification (MLPA) analyses in Japanese patients with colorectal cancer (CRC) who either fulfilled the revised Bethesda Guidelines or had an age at disease onset younger than 70 years, and who were screened by mismatch repair protein immunohistochemistry of formalin-fixed paraffin-embedded sections. Of the 501 subjects examined, 7 (1.40%) showed the downregulated expression of the PMS2 protein alone and were referred to the genetic counseling clinic. Germline PMS2 mutations were detected in 6 (85.7%), including 3 nonsense and 1 frameshift mutations by RT-PCR/direct sequencing and 2 genomic deletions by MLPA. No mutations were identified in the other MMR genes (i.e. MSH2, MLH1 and MSH6). The prevalence of the downregulated expression of the PMS2 protein alone was 1.40% among the subjects examined and IHC results predicted the presence of PMS2 germline mutations. RT-PCR from puromycin-treated PBL and MLPA may be employed as the first screening step to detect PMS2 mutations without pseudogene interference, followed by the long-range PCR/nested PCR validation using genomic DNA. © 2016 The Authors. Cancer Science published by John Wiley & Sons Australia, Ltd on behalf of Japanese Cancer Association.

  2. LSB-based Steganography Using Reflected Gray Code for Color Quantum Images

    NASA Astrophysics Data System (ADS)

    Li, Panchi; Lu, Aiping

    2018-02-01

    At present, the classical least-significant-bit (LSB) based image steganography has been extended to quantum image processing. For the existing LSB-based quantum image steganography schemes, the embedding capacity is no more than 3 bits per pixel. Therefore, it is meaningful to study how to improve the embedding capacity of quantum image steganography. This work presents a novel LSB-based steganography scheme using reflected Gray code for color quantum images, with an embedding capacity of up to 4 bits per pixel. In the proposed scheme, the secret qubit sequence is treated as a sequence of 4-bit segments. For the four bits in each segment, the first bit is embedded in the second LSB of the B channel of the cover image, and the remaining three bits are embedded simultaneously in the LSBs of the RGB channels of each color pixel, using reflected Gray code to determine the embedded bit from the secret information. Following the transforming rule, the LSBs of the stego-image are not always the same as the secret bits, and the differences are up to almost 50%. Experimental results confirm that the proposed scheme performs well and outperforms previous schemes in the literature in terms of embedding capacity.
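
    The bit layout described in this abstract can be sketched on an ordinary (classical) RGB pixel. The code below is only an analogue under stated assumptions: the paper operates on quantum images, and the abstract does not give its exact transforming rule, so the mapping used here (Gray-coding the last three secret bits as one 3-bit value) is a guess made for illustration. The function names `to_gray`, `from_gray`, `embed_segment` and `extract_segment` are hypothetical.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def embed_segment(pixel, bits):
    """Hide one 4-bit segment in an (R, G, B) pixel of 8-bit channels.

    bits[0] goes into the second LSB of B. bits[1:4] are read as a 3-bit
    value, Gray-coded (assumed rule), and written to the LSBs of R, G, B.
    """
    r, g, b = pixel
    value = (bits[1] << 2) | (bits[2] << 1) | bits[3]
    code = to_gray(value)
    r = (r & ~1) | ((code >> 2) & 1)
    g = (g & ~1) | ((code >> 1) & 1)
    b = (b & ~0b11) | (bits[0] << 1) | (code & 1)
    return (r, g, b)


def extract_segment(pixel):
    """Recover the 4-bit segment written by embed_segment."""
    r, g, b = pixel
    code = ((r & 1) << 2) | ((g & 1) << 1) | (b & 1)
    value = from_gray(code)
    return [(b >> 1) & 1, (value >> 2) & 1, (value >> 1) & 1, value & 1]


# Hypothetical cover pixel and secret segment.
stego = embed_segment((200, 117, 54), [1, 0, 1, 1])
print(stego, extract_segment(stego))
```

    Because the stored LSBs hold the Gray-coded value rather than the secret bits themselves, they need not equal the secret bits, which is the property the abstract highlights.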

  3. Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding

    PubMed Central

    Cannistraci, Carlo Vittorio; Alanis-Lobato, Gregorio; Ravasi, Timothy

    2013-01-01

    Motivation: Most functions within the cell emerge thanks to protein–protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI-network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable. Methods: Network embedding emphasizes the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE), and then adopted shortest path (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomized procedures for evaluating predictions. Results: We compared our method against several unsupervised and supervisedly tuned embedding approaches and node neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods in topological link prediction. Conclusion: Minimum curvilinearity is a valuable non-linear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the non-linear patterns hidden behind the protein network topology, and can be exploited for predicting new protein links. 
    The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules. Availability: https://sites.google.com/site/carlovittoriocannistraci/home Contact: kalokagathos.agon@gmail.com or timothy.ravasi@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23812985
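
    The general idea of scoring candidate interactions by proximity in an embedding space can be sketched as follows. This is a simplified illustration, not the MCE or ncMCE-SP method: it assumes the low-dimensional coordinates are already available and ranks protein pairs by plain Euclidean distance, whereas the abstract describes shortest-path scoring in the reduced space. The function name `rank_candidate_links`, the protein labels and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def rank_candidate_links(coords, known_edges):
    """Rank protein pairs that are not yet linked, closest pairs first.

    coords:      dict mapping protein name -> tuple of embedding coordinates
    known_edges: iterable of (protein, protein) pairs already in the network
    Returns a sorted list of (distance, protein_a, protein_b).
    """
    known = {frozenset(edge) for edge in known_edges}
    candidates = []
    for a, b in combinations(sorted(coords), 2):
        if frozenset((a, b)) not in known:
            candidates.append((dist(coords[a], coords[b]), a, b))
    return sorted(candidates)


# Hypothetical 2-D embedding of a four-protein network.
coords = {
    "P1": (0.0, 0.0),
    "P2": (1.0, 0.0),
    "P3": (0.0, 1.0),
    "P4": (5.0, 5.0),
}
known_edges = [("P1", "P2"), ("P1", "P3")]

for distance, a, b in rank_candidate_links(coords, known_edges):
    print(f"{a}-{b}: {distance:.2f}")
```

    Pairs near the top of the list would be the first candidates to test; how the coordinates are obtained (for example by MCE) is what determines the quality of the ranking.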

  4. KISS for STRAP: user extensions for a protein alignment editor.

    PubMed

    Gille, Christoph; Lorenzen, Stephan; Michalsky, Elke; Frömmel, Cornelius

    2003-12-12

    The Structural Alignment Program STRAP is a comfortable comprehensive editor and analyzing tool for protein alignments. A wide range of functions related to protein sequences and protein structures are accessible with an intuitive graphical interface. Recent features include mapping of mutations and polymorphisms onto structures and production of high quality figures for publication. Here we address the general problem of multi-purpose program packages to keep up with the rapid development of bioinformatical methods and the demand for specific program functions. STRAP was remade implementing a novel design which aims at Keeping Interfaces in STRAP Simple (KISS). KISS renders STRAP extendable to bio-scientists as well as to bio-informaticians. Scientists with basic computer skills are capable of implementing statistical methods or embedding existing bioinformatical tools in STRAP themselves. For bio-informaticians STRAP may serve as an environment for rapid prototyping and testing of complex algorithms such as automatic alignment algorithms or phylogenetic methods. Further, STRAP can be applied as an interactive web applet to present data related to a particular protein family and as a teaching tool. JAVA-1.4 or higher. http://www.charite.de/bioinf/strap/

  5. SLLE for predicting membrane protein types.

    PubMed

    Wang, Meng; Yang, Jie; Xu, Zhi-Jie; Chou, Kuo-Chen

    2005-01-07

    Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuous effort along such a line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence by combining these two approaches, high success rates have been observed during the tests of self-consistency, jackknife and independent data set, respectively, by using the simplest nearest neighbour classifier. The current approach represents a new strategy to deal with the problems of protein attribute prediction, and hence may become a useful vehicle in the area of bioinformatics and proteomics.
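
    The evaluation protocol named in the abstract, a jackknife (leave-one-out) test with a nearest neighbour classifier, can be sketched as follows. This is a minimal sketch only: it does not implement SLLE or pseudo amino acid composition, and the two-dimensional feature vectors and class labels are hypothetical stand-ins for reduced feature vectors.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training vector closest to query (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for vector, label in train:
        d = math.dist(vector, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife_accuracy(samples):
    """Leave each sample out in turn, classify it from the rest, report the success rate."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_neighbour(train, vector) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical low-dimensional feature vectors with made-up class labels.
SAMPLES = [
    ((0.10, 0.20), "type I"),
    ((0.15, 0.25), "type I"),
    ((0.20, 0.10), "type I"),
    ((0.80, 0.90), "multipass"),
    ((0.85, 0.80), "multipass"),
    ((0.90, 0.95), "multipass"),
]
accuracy = jackknife_accuracy(SAMPLES)
print(f"jackknife success rate: {accuracy:.2f}")
```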

  6. Imperfect duplicate insertions type of mutations in plasmepsin V modulates binding properties of PEXEL motifs of export proteins in Indian Plasmodium vivax.

    PubMed

    Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

    2013-01-01

    Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus whose binding and antigenic processing at the PEXEL motifs for export of about 200-300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Cloning and sequence analysis for genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246-249 AA and SLSE from 266-269 AA) were identified among all Indian isolates in comparison to P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites.
Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL sequences leading to altered binding and substrate accessibility of the enzyme makes it a plausible target to investigate export mechanisms for in silico virtual screening and novel pharmacophore designing.

  7. Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax

    PubMed Central

    Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

    2013-01-01

    Introduction: Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus whose binding and antigenic processing at the PEXEL motifs for export of about 200–300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. Method: We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Results: Cloning and sequence analysis for genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246–249 AA and SLSE from 266–269 AA) were identified among all Indian isolates in comparison to P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites.
Conclusion/Significance: Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL sequences leading to altered binding and substrate accessibility of the enzyme makes it a plausible target to investigate export mechanisms for in silico virtual screening and novel pharmacophore designing. PMID:23555891

  8. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information.

    PubMed

    An, Ji-Yong; Zhang, Lei; Zhou, Yong; Zhao, Yu-Jun; Wang, Da-Fu

    2017-08-18

    Self-interacting proteins (SIPs) are important for biological activity owing to the inherent interactions among their secondary structures or domains. However, because experimental detection of self-interactions is limited, one major challenge in SIP prediction is how to exploit computational approaches that detect SIPs from the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs from protein sequence. The major improvement of our method lies in an effective feature extraction method that represents candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position-specific scoring matrix (PSSM), followed by a reliable and robust WELM classifier to carry out classification. In addition, Principal Component Analysis (PCA) is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94% and 96.74% on yeast and human datasets, respectively. We also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.

  9. Protein extraction from methanol fixed paraffin embedded tissue blocks: A new possibility using cell blocks

    PubMed Central

    Kokkat, Theresa J.; McGarvey, Diane; Patel, Miral S.; Tieniber, Andrew D.; LiVolsi, Virginia A.; Baloch, Zubair W.

    2013-01-01

    Background: Methanol fixed and paraffin embedded (MFPE) cellblocks are an essential cytology preparation. However, MFPE cellblocks often contain limited material and their relatively small size has caused them to be overlooked in biomarker discovery. Advances in the field of molecular biotechnology have made it possible to extract proteins from formalin fixed and paraffin embedded (FFPE) tissue blocks. In contrast, there are no established methods for extracting proteins from MFPE cellblocks. We investigated commonly available CHAPS (3-[(3-cholamidopropyl) dimethylammonio]-1-propanesulfonate) buffer, as well as two commercially available Qiagen® kits and compared their effectiveness on MFPE tissue for protein yields. Materials and Methods: MFPE blocks were made by Cellient™ automated system using human tissue specimens from normal and malignant specimens collected in ThinPrep™ Vials. Protein was extracted from Cellient-methanol fixed and paraffin embedded blocks with CHAPS buffer method as well as FFPE and Mammalian Qiagen® kits. Results: Comparison of protein yields demonstrated the effectiveness of various protein extraction methods on MFPE cellblocks. Conclusion: In the current era of minimally invasive techniques to obtain minimal amount of tissue for diagnostic and prognostic purposes, the use of commercial and lab made buffer on low weight MFPE scrapings obtained by Cellient® processor opens new possibilities for protein biomarker research. PMID:24403950

  10. Russell body inducing threshold depends on the variable domain sequences of individual human IgG clones and the cellular protein homeostasis.

    PubMed

    Stoops, Janelle; Byrd, Samantha; Hasegawa, Haruki

    2012-10-01

    Russell bodies are intracellular aggregates of immunoglobulins. Although the mechanism of Russell body biogenesis has been extensively studied by using truncated mutant heavy chains, the importance of the variable domain sequences in this process and in immunoglobulin biosynthesis remains largely unknown. Using a panel of structurally and functionally normal human immunoglobulin Gs, we show that individual immunoglobulin G clones possess distinctive Russell body inducing propensities that can surface differently under normal and abnormal cellular conditions. Russell body inducing predisposition unique to each immunoglobulin G clone was corroborated by the intrinsic physicochemical properties encoded in the heavy chain variable domain/light chain variable domain sequence combinations that define each immunoglobulin G clone. While the sequence based intrinsic factors predispose certain immunoglobulin G clones to be more prone to induce Russell bodies, extrinsic factors such as stressful cell culture conditions also play roles in unmasking Russell body propensity from immunoglobulin G clones that are normally refractory to developing Russell bodies. By taking advantage of heterologous expression systems, we dissected the roles of individual subunit chains in Russell body formation and examined the effect of non-cognate subunit chain pair co-expression on Russell body forming propensity. The results suggest that the properties embedded in the variable domain of individual light chain clones and their compatibility with the partnering heavy chain variable domain sequences underscore the efficiency of immunoglobulin G biosynthesis, the threshold for Russell body induction, and the level of immunoglobulin G secretion. 
We propose that an interplay between the unique properties encoded in variable domain sequences and the state of protein homeostasis determines whether an immunoglobulin G expressing cell will develop the Russell body phenotype in a dynamic cellular setting. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

    Cancer.gov

    The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins called chromatin that compacts the DNA in the nucleus, strongly restricting access to DNA sequences. As a result, regulatory factors only interact with a small subset of their potential binding elements in a given cell to regulate genes. How factors recognize and select sites in chromatin across the genome is not well understood -- but several discoveries in CCR’s Laboratory of Receptor Biology and Gene Expression (LRBGE) have shed light on the mechanisms that direct factors to DNA.

  12. The Quality of the Embedding Potential Is Decisive for Minimal Quantum Region Size in Embedding Calculations: The Case of the Green Fluorescent Protein.

    PubMed

    Nåbo, Lina J; Olsen, Jógvan Magnus Haugaard; Martínez, Todd J; Kongsted, Jacob

    2017-12-12

    The calculation of spectral properties for photoactive proteins is challenging because of the large cost of electronic structure calculations on large systems. Mixed quantum mechanical (QM) and molecular mechanical (MM) methods are typically employed to make such calculations computationally tractable. This study addresses the connection between the minimal QM region size and the method used to model the MM region in the calculation of absorption properties, here exemplified for calculations on the green fluorescent protein. We find that polarizable embedding is necessary for a qualitatively correct description of the MM region, and that this enables the use of much smaller QM regions compared to fixed charge electrostatic embedding. Furthermore, absorption intensities converge very slowly with system size and inclusion of effective external field effects in the MM region through polarizabilities is therefore very important. Thus, this embedding scheme enables accurate prediction of intensities for systems that are too large to be treated fully quantum mechanically.

  13. Multistep modeling of protein structure: application to bungarotoxin

    NASA Technical Reports Server (NTRS)

    Srinivasan, S.; Shibata, M.; Rein, R.

    1986-01-01

    Modelling of bungarotoxin in atomic details is presented in this article. The model-building procedure utilizes the low-resolution crystal coordinates of the c-alpha atoms of bungarotoxin, sequence homology within the neurotoxin family, as well as high-resolution x-ray diffraction data of cobratoxin and erabutoxin. Our model-building procedure involves: (a) principles of comparative modelling, (b) embedding procedures of distance geometry, and (c) use of molecular mechanics for optimizing packing. The model is not only consistent with the c-alpha coordinates of crystal structure, but also agrees with solution conformational features of the triple-stranded beta sheet as observed by NOE measurements.

  14. Simrank: Rapid and sensitive general-purpose k-mer search tool

    PubMed Central

    2011-01-01

    Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language data found Simrank to be 10X to 928X faster, depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity. PMID:21524302
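
    The general idea of k-mer matching can be illustrated with a short sketch that ranks database strings by the fraction of the query's distinct k-mers they contain. The scoring rule, function names, and example strings are assumptions made for illustration; they are not taken from Simrank itself, whose exact scoring and indexing are not described in the abstract.

```python
def kmers(text, k):
    """Return the set of distinct substrings of length k in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def rank_by_shared_kmers(query, database, k=3):
    """Score each database string by the fraction of query k-mers it shares."""
    query_kmers = kmers(query, k)
    scored = []
    for name, sequence in database.items():
        shared = len(query_kmers & kmers(sequence, k))
        score = shared / len(query_kmers) if query_kmers else 0.0
        scored.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Hypothetical database and query strings.
DATABASE = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTGGGGCC",
    "seq3": "ACGTTTGA",
}
ranking = rank_by_shared_kmers("ACGTACG", DATABASE, k=3)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

    Because the comparison works on plain strings, the same sketch applies to DNA, RNA, protein, or natural-language text, which matches the range of inputs mentioned in the abstract.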

  15. Effective Design of Multifunctional Peptides by Combining Compatible Functions

    PubMed Central

    Diener, Christian; Garza Ramos Martínez, Georgina; Moreno Blas, Daniel; Castillo González, David A.; Corzo, Gerardo; Castro-Obregon, Susana; Del Rio, Gabriel

    2016-01-01

    Multifunctionality is a common trait of many natural proteins and peptides, yet the rules to generate such multifunctionality remain unclear. We propose that the rules defining some protein/peptide functions are compatible. To explore this hypothesis, we trained a computational method to predict cell-penetrating peptides at the sequence level and learned that antimicrobial peptides and DNA-binding proteins are compatible with the rules of our predictor. Based on this finding, we expected that designing peptides for CPP activity may render AMP and DNA-binding activities. To test this prediction, we designed peptides that embedded two independent functional domains (nuclear localization and yeast pheromone activity), linked by optimizing their composition to fit the rules characterizing cell-penetrating peptides. These peptides presented effective cell penetration, DNA-binding, pheromone and antimicrobial activities, thus confirming the effectiveness of our computational approach to design multifunctional peptides with potential therapeutic uses. Our computational implementation is available at http://bis.ifc.unam.mx/en/software/dcf. PMID:27096600

  16. Embedded CMOS basecalling for nanopore DNA sequencing.

    PubMed

    Chengjie Wang; Junli Zheng; Magierowski, Sebastian; Ghafar-Zadeh, Ebrahim

    2016-08-01

    DNA sequencing based on nanopore sensors is now entering the marketplace. The ability to interface this technology to established CMOS microelectronics promises significant improvements in functionality and miniaturization. Among the key functions to benefit from this interface will be basecalling, the conversion of raw electronic molecular signatures to nucleotide sequence predictions. This paper presents the design and performance potential of custom CMOS base-callers embedded alongside nanopore sensors. A basecalling architecture implemented in 32-nm technology is discussed with the ability to process the equivalent of 20 human genomes per day in real-time at a power density of 5 W/cm² assuming a 3-mer nanopore sensor.

  17. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cournia, Zoe; Allen, Toby W.; Andricioaei, Ioan

    Membrane proteins mediate processes that are fundamental for the flourishing of biological cells. Membrane-embedded transporters move ions and larger solutes across membranes; receptors mediate communication between the cell and its environment; and membrane-embedded enzymes catalyze chemical reactions. Understanding these mechanisms of action requires knowledge of how the proteins couple to their fluid, hydrated lipid membrane environment. Here, we present current studies in computational and experimental membrane protein biophysics, and show how they address outstanding challenges in understanding the complex environmental effects on the structure, function, and dynamics of membrane proteins.

  19. A pH-sensitive red fluorescent protein compatible with hydrophobic resin embedding

    NASA Astrophysics Data System (ADS)

    Guo, Wenyan; Gang, Yadong; Liu, Xiuli; Zhou, Hongfu; Zeng, Shaoqun

    2017-02-01

    pH-sensitive fluorescent proteins that enable chemical reactivation in resin are useful tools for fluorescence microimaging. EYFP and EGFP, both derived from jellyfish GFP, work well for such applications. For simultaneous two-color imaging, a suitable red fluorescent protein is urgently needed. Here, a pH-sensitive red fluorescent protein, pHuji, is selected and verified to be compatible with hydrophobic resin embedding, and thus may be promising for dual-colour chemical reactivation imaging in conjunction with EGFP or EYFP.

  20. Atom Probe Tomographic Mapping Directly Reveals the Atomic Distribution of Phosphorus in Resin Embedded Ferritin

    NASA Astrophysics Data System (ADS)

    Perea, Daniel E.; Liu, Jia; Bartrand, Jonah; Dicken, Quinten; Thevuthasan, S. Theva; Browning, Nigel D.; Evans, James E.

    2016-02-01

    Here we report the atomic-scale analysis of biological interfaces within the ferritin protein using atom probe tomography that is facilitated by an advanced specimen preparation approach. Embedding ferritin in an organic polymer resin lacking nitrogen provided chemical contrast to visualise atomic distributions and distinguish the inorganic-organic interface of the ferrihydrite mineral core and protein shell, as well as the organic-organic interface between the ferritin protein shell and embedding resin. In addition, we definitively show the atomic-scale distribution of phosphorus as being at the surface of the ferrihydrite mineral with the distribution of sodium mapped within the protein shell environment with an enhanced distribution at the mineral/protein interface. The sample preparation method is robust and can be directly extended to further enhance the study of biological, organic and inorganic nanomaterials relevant to health, energy or the environment.

  1. Atom Probe Tomographic Mapping Directly Reveals the Atomic Distribution of Phosphorus in Resin Embedded Ferritin

    PubMed Central

    Perea, Daniel E.; Liu, Jia; Bartrand, Jonah; Dicken, Quinten; Thevuthasan, S. Theva; Browning, Nigel D.; Evans, James E.

    2016-01-01

    Here we report the atomic-scale analysis of biological interfaces within the ferritin protein using atom probe tomography that is facilitated by an advanced specimen preparation approach. Embedding ferritin in an organic polymer resin lacking nitrogen provided chemical contrast to visualise atomic distributions and distinguish the inorganic-organic interface of the ferrihydrite mineral core and protein shell, as well as the organic-organic interface between the ferritin protein shell and embedding resin. In addition, we definitively show the atomic-scale distribution of phosphorus as being at the surface of the ferrihydrite mineral with the distribution of sodium mapped within the protein shell environment with an enhanced distribution at the mineral/protein interface. The sample preparation method is robust and can be directly extended to further enhance the study of biological, organic and inorganic nanomaterials relevant to health, energy or the environment. PMID:26924804

  2. Intermolecular detergent-membrane protein NOEs for the characterization of the dynamics of membrane protein-detergent complexes.

    PubMed

    Eichmann, Cédric; Orts, Julien; Tzitzilonis, Christos; Vögeli, Beat; Smrt, Sean; Lorieau, Justin; Riek, Roland

    2014-12-11

    The interaction between membrane proteins and lipids or lipid mimetics such as detergents is key for the three-dimensional structure and dynamics of membrane proteins. In NMR-based structural studies of membrane proteins, qualitative analysis of intermolecular nuclear Overhauser enhancements (NOEs) or paramagnetic resonance enhancement is generally used to identify the transmembrane segments of a membrane protein. Here, we employed a quantitative characterization of intermolecular NOEs between ¹H of the detergent and ¹Hᴺ of ²H-perdeuterated, ¹⁵N-labeled α-helical membrane protein-detergent complexes following the exact NOE (eNOE) approach. Structural considerations suggest that these intermolecular NOEs should show a helical-wheel-type behavior along a transmembrane helix or a membrane-attached helix within a membrane protein, as experimentally demonstrated for the complete influenza hemagglutinin fusion domain HAfp23. The partial absence of such an NOE pattern along the amino acid sequence, as shown for a truncated variant of HAfp23 and for the Escherichia coli inner membrane protein YidH, indicates the presence of large tertiary structure fluctuations such as an opening between helices or the presence of large rotational dynamics of the helices. Detergent-protein NOEs thus appear to be a straightforward probe for a qualitative characterization of structural and dynamical properties of membrane proteins embedded in detergent micelles.

  3. SH2 Domains Recognize Contextual Peptide Sequence Information to Determine Selectivity*

    PubMed Central

    Liu, Bernard A.; Jablonowski, Karl; Shah, Eshana E.; Engelmann, Brett W.; Jones, Richard B.; Nash, Piers D.

    2010-01-01

    Selective ligand recognition by modular protein interaction domains is a primary determinant of specificity in signaling pathways. Src homology 2 (SH2) domains fulfill this capacity immediately downstream of tyrosine kinases, acting to recruit their host polypeptides to ligand proteins harboring phosphorylated tyrosine residues. The degree to which SH2 domains are selective and the mechanisms underlying selectivity are fundamental to understanding phosphotyrosine signaling networks. An examination of interactions between 50 SH2 domains and a set of 192 phosphotyrosine peptides corresponding to physiological motifs within FGF, insulin, and IGF-1 receptor pathways indicates that individual SH2 domains have distinct recognition properties and exhibit a remarkable degree of selectivity beyond that predicted by previously described binding motifs. The underlying basis for such selectivity is the ability of SH2 domains to recognize both permissive amino acid residues that enhance binding and non-permissive amino acid residues that oppose binding in the vicinity of the essential phosphotyrosine. Neighboring positions affect one another so local sequence context matters to SH2 domains. This complex linguistics allows SH2 domains to distinguish subtle differences in peptide ligands. This newly appreciated contextual dependence substantially increases the accessible information content embedded in the peptide ligands that can be effectively integrated to determine binding. This concept may serve more broadly as a paradigm for subtle recognition of physiological ligands by protein interaction domains. PMID:20627867

  4. A dual host vector for Fab phage display and expression of native IgG in mammalian cells.

    PubMed

    Tesar, Devin; Hötzel, Isidro

    2013-10-01

    A significant bottleneck in antibody discovery by phage display is the transfer of immunoglobulin variable regions from phage clones to vectors that express immunoglobulin G (IgG) in mammalian cells for screening. Here, we describe a novel phagemid vector for Fab phage display that allows expression of native IgG in mammalian cells without sub-cloning. The vector uses an optimized mammalian signal sequence that drives robust expression of Fab fragments fused to an M13 phage coat protein in Escherichia coli and IgG expression in mammalian cells. To allow the expression of Fab fragments fused to a phage coat protein in E. coli and full-length IgG in mammalian cells from the same vector without sub-cloning, the sequence encoding the phage coat protein was embedded in an optimized synthetic intron within the immunoglobulin heavy chain gene. This intron is removed from transcripts in mammalian cells by RNA splicing. Using this vector, we constructed a synthetic Fab phage display library with diversity in the heavy chain only and selected for clones binding different antigens. Co-transfection of mammalian cells with DNA from individual phage clones and a plasmid expressing the invariant light chain resulted in the expression of native IgG that was used to assay affinity, ligand blocking activity and specificity.

  5. Polyhedron-like inclusion body formation by a mutant nucleopolyhedrovirus expressing the granulin gene from a granulovirus.

    PubMed

    Zhou, C E; Ko, R; Maeda, S

    1998-01-20

    The polyhedrin gene in Bombyx mori nucleopolyhedrovirus (BmNPV) was replaced with the granulin gene of Trichoplusia ni granulovirus (TnGV). The substitution was verified by Southern hybridization, and expression of granulin by the mutant virus, BmGran, was demonstrated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and by amino acid sequencing of the predominant protein of BmGran inclusion bodies (IBs). Light and electron microscopy examination of BmGran-infected B. mori and BmN cells revealed large, cuboidal, polyhedron-like IBs in the nucleus and cytoplasm, but granules were not seen. IBs contained small, parallel, electron-dense streaks, which defined the geometric pattern of crystallization. Geometric patterns of nuclear IBs were frequently disrupted by occlusion of polyhedron envelope fragments, resulting in IB instability and fracturing. Virions were not embedded in most of the polyhedron-like IBs, but accumulated with polyhedron envelope fragments. Some virions were coated with matrix protein and were partially wrapped by polyhedron envelope. These results suggested that (1) the amino acid sequence of granulin is insufficient for determining IB morphology in TnGV-infected cells, and TnGV may have genes, not present in BmNPV, that control granule formation, and (2) interactions among the virion, the IB envelope, and the matrix protein may be important in virion occlusion and IB morphology and stability.

  6. A theoretical thermochemical study of solute-solvent dielectric effects in the displacement of codon-anticodon base pairs

    NASA Astrophysics Data System (ADS)

    Monajjemi, M.; Razavian, M. H.; Mollaamin, F.; Naderi, F.; Honarparvar, B.

    2008-12-01

    Quantum-chemical solvent effect theories describe the electronic structure of a molecular subsystem embedded in a solvent or other molecular environment. The solvation of biomolecules is important in molecular biology, since numerous processes involve proteins interacting in changing solvent-solute systems. In this theoretical study, we focus on mRNA-tRNA base pairs as a fundamental step in protein synthesis influenced by hydrogen bonding between two antiparallel trinucleotides, namely, the mRNA codon and tRNA anticodon. We use the mean reaction field theories, which describe electrostatic and polarization interactions between solute and solvent in the AAA, UUU, AAG, and UUC triplex sequences optimized in various solvent media such as water, dimethylsulfoxide, methanol, ethanol, and cyclopean using the self-consistent reaction field model. This process depends on either the reaction potential function of the solvent or charge transfer operators that appear in solute-solvent interaction. Because of codon and anticodon biological criteria, we performed nonempirical quantum-mechanical calculations at the BLYP and B3LYP/3-21G, 6-31G, and 6-31G* levels of theory in the gas phase and five solvents at three temperatures. Finally, to obtain more information, we calculated thermochemical parameters to find that the dielectric constant of solvents plays an important role in the displacement of amino acid sequences on codon-anticodon residues in proteins, which can cause some mutations in humans.

  7. Advances in High-Throughput Speed, Low-Latency Communication for Embedded Instrumentation (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Jordan, Scott

    2018-01-24

    Scott Jordan on "Advances in high-throughput speed, low-latency communication for embedded instrumentation" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  8. GCView: the genomic context viewer for protein homology searches

    PubMed Central

    Grin, Iwan; Linke, Dirk

    2011-01-01

    Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955

  9. Identification and Characterization of Novel Chitin-Binding Proteins from the Larval Cuticle of Silkworm, Bombyx mori.

    PubMed

    Dong, Zhaoming; Zhang, Weiwei; Zhang, Yan; Zhang, Xiaolu; Zhao, Ping; Xia, Qingyou

    2016-05-06

    Cuticle is mainly made of chitin filaments embedded in a matrix of cuticular proteins (CPs). Cuticular chitins have minor differences, whereas CPs are widely variable with respect to their sequences and structures. To understand the molecular basis underlying the mechanical properties of cuticle, it is necessary to know which CPs interact with chitin and how they are assembled into the cuticle structure. In the present study, a chitin-binding assay was performed followed by liquid chromatography-tandem mass spectrometry to identify the extracted proteins from the larval cuticle of silkworm, Bombyx mori. There were 463 proteins identified from the silkworm larval cuticle, 200 of which were recovered in the chitin-binding fraction. A total of 103 proteins were annotated as CPs, which were classified into 11 CP families based on their conserved motifs, including CPR, CPAP, CPT, CPF and CPFL, CPCFC, chitin_bind 3, BmCPH2 homologues, BmCPH9 homologues, BmCPG1 homologues, BmCPG20 homologues, and BmCPG21 homologues. A total of five CP families were newly identified in the chitin-binding fraction, thereby providing new information and insight into the composition, structure, and function of the silkworm larval cuticle.

  10. Ultrastructural identification of peripheral myelin proteins by a pre-embedding immunogold labeling method.

    PubMed

    Canron, Marie-Hélène; Bouillot, Sandrine; Favereaux, Alexandre; Petry, Klaus G; Vital, Anne

    2003-03-01

    Ultrastructural immunolabeling of peripheral nervous system components is an important tool to study the relation between structure and function. Owing to the scarcity of certain antigens and the dense structure of the peripheral nerve, a pre-embedding technique is likely appropriate. After several investigations on procedures for pre-embedding immunolabeling, we propose a method that offers a good compromise between detection of antigenic sites and preservation of morphology at the ultrastructural level, and that is easy to use and suitable for investigations on peripheral nerve biopsies from humans. Pre-fixation by immersion in paraformaldehyde/glutaraldehyde is necessary to stabilize the ultrastructure. Then, ultrasmall gold particles with silver enhancement are advised. Antibodies against myelin protein zero and myelin basic protein were chosen for demonstration. The same technique was applied to localize a 35 kDa myelin protein.

  11. Membrane Protein Structure, Function, and Dynamics: a Perspective from Experiments and Theory

    DOE PAGES

    Cournia, Zoe; Allen, Toby W.; Andricioaei, Ioan; ...

    2015-06-11

    Membrane proteins mediate processes that are fundamental for the flourishing of biological cells. Membrane-embedded transporters move ions and larger solutes across membranes; receptors mediate communication between the cell and its environment; and membrane-embedded enzymes catalyze chemical reactions. Understanding these mechanisms of action requires knowledge of how the proteins couple to their fluid, hydrated lipid membrane environment. Here, we present current studies in computational and experimental membrane protein biophysics, and show how they address outstanding challenges in understanding the complex environmental effects on the structure, function, and dynamics of membrane proteins.

  12. Enhancement of object-permanence performance in the Down's syndrome infant.

    PubMed

    Morss, J R

    1984-01-01

    Four infants with Down's syndrome (aged 19-33 months) were presented with a restructured version of an object-permanence task. Restructuring consisted of the embedding of single trials of the task within a sequence of simpler, related steps. Following failure on a standard presentation of the task, three Down's syndrome (DS) infants demonstrated success on trials embedded in the training sequence. Comparison was made with the performance of normal infants (aged 14-19 months) matched in terms of failure on the pre-test. Only two out of nine normal infants registered success on the embedded trials. Results are discussed in terms of the differences between the DS infant and the normal infant, and the former's reliance on the deliberate structuring of his learning environment by a parent or educator.

  13. Evaluation of Targeted Sequencing for Transcriptional Analysis of Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples

    EPA Science Inventory

    Next-generation sequencing provides unprecedented access to genomic information in archival FFPE tissue samples. However, costs and technical challenges related to RNA isolation and enrichment limit use of whole-genome RNA-sequencing for large-scale studies of FFPE specimens. Rec...

  14. Teaching Formulaic Sequences in an English-Language Class: The Effects of Explicit Instruction versus Coursebook Instruction

    ERIC Educational Resources Information Center

    Le-Thi, Duyen; Rodgers, Michael P. H.; Pellicer-Sánchez, Ana

    2017-01-01

    This study investigates the relative effectiveness of different teaching approaches on the learning of formulaic sequences. Three comparisons were made in this study: the effects of explicit teaching of formulaic sequences versus teaching embedded in traditional coursebook instruction, the effects of the degree of salience of the sequences in the…

  15. Active Spread-Spectrum Steganalysis for Hidden Data Extraction

    DTIC Science & Technology

    2011-09-01

    steganalysis. In particular, we aim to recover blindly secret data hidden in image hosts via (multi-signature) direct-sequence SS embedding [18]-[25...access (CDMA) communication systems. Under the assumption that the embedded secret messages are independent identically distributed (i.i.d.) random

  16. Physical basis of some membrane shaping mechanisms

    PubMed Central

    2016-01-01

    In vesicular transport pathways, membrane proteins and lipids are internalized, externalized or transported within cells, not by bulk diffusion of single molecules, but embedded in the membrane of small vesicles or thin tubules. The formation of these ‘transport carriers’ follows sequential events: membrane bending, fission from the donor compartment, transport and eventually fusion with the acceptor membrane. A similar sequence is involved during the internalization of drug or gene carriers inside cells. These membrane-shaping events are generally mediated by proteins binding to membranes. The mechanisms behind these biological processes are actively studied both in the context of cell biology and biophysics. Bin/amphiphysin/Rvs (BAR) domain proteins are ideally suited for illustrating how simple soft matter principles can account for membrane deformation by proteins. We review here some experimental methods and corresponding theoretical models to measure how these proteins affect the mechanics and the shape of membranes. In more detail, we show how an experimental method employing optical tweezers to pull a tube from a giant vesicle may give important quantitative insights into the mechanism by which proteins sense and generate membrane curvature and the mechanism of membrane scission. This article is part of the themed issue ‘Soft interfacial materials: from fundamentals to formulation’. PMID:27298443

  17. Bioinformatics analysis identifies several intrinsically disordered human E3 ubiquitin-protein ligases.

    PubMed

    Boomsma, Wouter; Nielsen, Sofie V; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus; Ellgaard, Lars

    2016-01-01

    The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. 
    In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a similar mechanism as San1.

  18. Amyloid cores in prion domains: Key regulators for prion conformational conversion.

    PubMed

    Fernández, María Rosario; Batlle, Cristina; Gil-García, Marcos; Ventura, Salvador

    2017-01-02

    Despite the significant efforts devoted to deciphering the particular protein features that encode a prion or prion-like behavior, they are still poorly understood. The well-characterized yeast prions constitute an ideal model system to address this question, because, in these proteins, the prion activity can be univocally assigned to a specific region of their sequence, known as the prion forming domain (PFD). These PFDs are intrinsically disordered, relatively long and, in many cases, of low complexity, being enriched in glutamine/asparagine residues. Computational analyses have identified a significant number of proteins having similar domains in the human proteome. The compositional bias of these regions plays an important role in the transition of the prions to the amyloid state. However, it is difficult to explain how composition alone can account for the formation of specific contacts that correctly position PFDs and provide the enthalpic force to compensate for the large entropic cost of immobilizing these domains in the initial assemblies. We have hypothesized that short, sequence-specific, amyloid cores embedded in PFDs can perform these functions and, accordingly, act as preferential nucleation centers in both spontaneous and seeded aggregation. We have shown that implementing this concept in a prediction algorithm allows the prion propensities of putative PFDs to be scored with high accuracy. Recently, we have provided experimental evidence for the existence of such amyloid cores in the PFDs of Sup35, Ure2, Swi1, and Mot3 yeast prions. The fibrils formed by these short stretches may recognize and promote the aggregation of the complete proteins inside cells, being thus a promising tool for targeted protein inactivation.

  19. Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples.

    EPA Science Inventory

    Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses us...

  20. Formation of Orthopoxvirus Cytoplasmic A-Type Inclusion Bodies and Embedding of Virions Are Dynamic Processes Requiring Microtubules

    PubMed Central

    Howard, Amanda R.

    2012-01-01

    In cells infected with some orthopoxviruses, numerous mature virions (MVs) become embedded within large, cytoplasmic A-type inclusions (ATIs) that can protect infectivity after cell lysis. ATIs are composed of an abundant viral protein called ATIp, which is truncated in orthopoxviruses such as vaccinia virus (VACV) that do not form ATIs. To study ATI formation and occlusion of MVs within ATIs, we used recombinant VACVs that express the cowpox full-length ATIp or we transfected plasmids encoding ATIp into cells infected with VACV, enabling ATI formation. ATI enlargement and MV embedment required continued protein synthesis and an intact microtubular network. For live imaging of ATIs and MVs, plasmids expressing mCherry fluorescent protein fused to ATIp were transfected into cells infected with VACV expressing the viral core protein A4 fused to yellow fluorescent protein. ATIs appeared as dynamic, mobile bodies that enlarged by multiple coalescence events, which could be prevented by disrupting microtubules. Coalescence of ATIs was confirmed in cells infected with cowpox virus. MVs were predominantly at the periphery of ATIs early in infection. We determined that coalescence contributed to the distribution of MVs within ATIs and that microtubule-disrupting drugs abrogated coalescence-mediated MV embedment. In addition, MVs were shown to move from viral factories at speeds consistent with microtubular transport to the peripheries of ATIs, whereas disruption of microtubules prevented such trafficking. The data indicate an important role for microtubules in the coalescence of ATIs into larger structures, transport of MVs to ATIs, and embedment of MVs within the ATI matrix. PMID:22438543

  1. Embedded Piezoresistive Microcantilever Sensors for Chemical and Biological Sensing

    NASA Astrophysics Data System (ADS)

    Porter, Timothy; Eastman, Michael; Kooser, Ara; Manygoats, Kevin; Zhine, Rosalie

    2003-03-01

    Microcantilever sensors based on embedded piezoresistive technology offer a promising, low-cost method of sensing chemical and biological species. Here, we present data on the detection of various gaseous analytes, including volatile organic compounds (VOCs) and carbon monoxide. Also, we have used these sensors to detect the protein bovine serum albumin (BSA), a protein important in the study of human childhood diabetes.

  2. Detection of a New cfr-Like Gene, cfr(B), in Enterococcus faecium Isolates Recovered from Human Specimens in the United States as Part of the SENTRY Antimicrobial Surveillance Program.

    PubMed

    Deshpande, Lalitagauri M; Ashcraft, Deborah S; Kahn, Heather P; Pankey, George; Jones, Ronald N; Farrell, David J; Mendes, Rodrigo E

    2015-10-01

    Two linezolid-resistant Enterococcus faecium isolates (MICs, 8 μg/ml) from unique patients of a medical center in New Orleans were included in this study. Isolates were initially investigated for the presence of mutations in the V domain of 23S rRNA genes and L3, L4, and L22 ribosomal proteins, as well as cfr. Isolates were subjected to pulsed-field gel electrophoresis (just one band difference), and one representative strain was submitted to whole-genome sequencing. Gene location was also determined by hybridization, and cfr genes were cloned and expressed in a Staphylococcus aureus background. The two isolates had one out of six 23S rRNA alleles mutated (G2576T), had wild-type L3, L4, and L22 sequences, and were positive for a cfr-like gene. The sequence of the protein encoded by the cfr-like gene was most similar (99.7%) to that found in Peptoclostridium difficile, which shared only 74.9% amino acid identity with the proteins encoded by genes previously identified in staphylococci and non-faecium enterococci and was, therefore, denominated Cfr(B). When expressed in S. aureus, the protein conferred a resistance profile similar to that of Cfr. Two copies of cfr(B) were chromosomally located and embedded in a Tn6218 similar to the cfr-carrying transposon described in P. difficile. This study reports the first detection of cfr genes in E. faecium clinical isolates in the United States and characterization of a new cfr variant, cfr(B). cfr(B) has been observed in mobile genetic elements in E. faecium and P. difficile, suggesting potential for dissemination. However, further analysis is necessary to assess the resistance levels conferred by cfr(B) when expressed in enterococci. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  3. Virus characterization and discovery in formalin-fixed paraffin-embedded tissues.

    PubMed

    Bodewes, Rogier; van Run, Peter R W A; Schürch, Anita C; Koopmans, Marion P G; Osterhaus, Albert D M E; Baumgärtner, Wolfgang; Kuiken, Thijs; Smits, Saskia L

    2015-03-01

    Detection and characterization of novel viruses is frequently hampered by the lack of properly stored materials. Especially for the retrospective identification of viruses responsible for past disease outbreaks, often only formalin-fixed paraffin-embedded (FFPE) tissue samples are available. Although FFPE tissues can be used to detect known viral sequences, the application of FFPE tissues for detection of novel viruses is currently unclear. In the present study it was shown that sequence-independent amplification in combination with next-generation sequencing can be used to detect sequences of known and unknown viruses, although with relatively low sensitivity. These findings indicate that this technique could be useful for detecting novel viral sequences in FFPE tissues collected from humans and animals with disease of unknown origin, when other samples are not available. In addition, application of this method to FFPE tissues allows the detected sequences to be correlated with the presence of histopathological changes in the corresponding tissue sections. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Sequential recognition of the pre-mRNA branch point by U2AF65 and a novel spliceosome-associated 28-kDa protein.

    PubMed Central

    Gaur, R K; Valcárcel, J; Green, M R

    1995-01-01

    Splicing of pre-mRNAs occurs via a lariat intermediate in which an intronic adenosine, embedded within a branch point sequence, forms a 2',5'-phosphodiester bond (RNA branch) with the 5' end of the intron. How the branch point is recognized and activated remains largely unknown. Using site-specific photochemical cross-linking, we have identified two proteins that specifically interact with the branch point during the splicing reaction. U2AF65, an essential splicing factor that binds to the adjacent polypyrimidine tract, crosslinks to the branch point at the earliest stage of spliceosome formation in an ATP-independent manner. A novel 28-kDa protein, which is a constituent of the mature spliceosome, contacts the branch point after the first catalytic step. Our results indicate that the branch point is sequentially recognized by distinct splicing factors in the course of the splicing reaction. PMID:7493318

  5. Identification of structural and morphogenesis genes of Pseudoalteromonas phage φRIO-1 and placement within the evolutionary history of Podoviridae.

    PubMed

    Hardies, Stephen C; Thomas, Julie A; Black, Lindsay; Weintraub, Susan T; Hwang, Chung Y; Cho, Byung C

    2016-02-01

    The virion proteins of Pseudoalteromonas phage φRIO-1 were identified and quantitated by mass spectrometry and gel densitometry. Bioinformatic methods customized to deal with extreme divergence defined a φRIO-1 tail structure homology group of phages, which was further related to T7 tail and internal virion proteins (IVPs). Similarly, homologs of tubular tail components and internal virion proteins were identified in essentially all completely sequenced podoviruses other than those in the subfamily Picovirinae. The podoviruses were subdivided into several tail structure homology groups, in addition to the RIO-1 and T7 groups. Molecular phylogeny indicated that these groups all arose about the same ancient time as the φRIO-1/T7 split. Hence, the T7-like infection mechanism involving the IVPs was an ancestral property of most podoviruses. The IVPs were found to variably host both tail lysozyme domains and domains destined for the cytoplasm, including the N4 virion RNA polymerase embedded within an IVP-D homolog. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  6. KIT gene mutations and patterns of protein expression in mucosal and acral melanoma.

    PubMed

    Abu-Abed, Suzan; Pennell, Nancy; Petrella, Teresa; Wright, Frances; Seth, Arun; Hanna, Wedad

    2012-01-01

    Recently characterized KIT (CD117) gene mutations have revealed new pathways involved in melanoma pathogenesis. In particular, certain subtypes harbor mutations similar to those observed in gastrointestinal stromal tumors, which are sensitive to treatment with tyrosine kinase inhibitors. The purpose of this study was to characterize KIT gene mutations and patterns of protein expression in mucosal and acral melanoma. Formalin-fixed, paraffin-embedded tissues were retrieved from our archives. Histologic assessment included routine hematoxylin-eosin stains and immunohistochemical staining for KIT. Genomic DNA was used for polymerase chain reaction-based amplification of exons 11 and 13. We identified 59 acral and mucosal melanoma cases, of which 78% showed variable levels of KIT expression. Sequencing of exons 11 and 13 was completed on all cases, and 4 (6.8%) mutant cases were isolated. We successfully optimized conditions for the detection of KIT mutations and showed that 8.6% of mucosal and 4.2% of acral melanoma cases at our institution harbor KIT mutations; all mutant cases showed strong, diffuse KIT protein expression. Our case series represents the first Canadian study to characterize KIT gene mutations and patterns of protein expression in acral and mucosal melanoma.

  7. Mutational status of EGFR and KIT in thymoma and thymic carcinoma.

    PubMed

    Yoh, Kiyotaka; Nishiwaki, Yutaka; Ishii, Genichiro; Goto, Koichi; Kubota, Kaoru; Ohmatsu, Hironobu; Niho, Seiji; Nagai, Kanji; Saijo, Nagahiro

    2008-12-01

    This study was conducted to evaluate the prevalence of EGFR and KIT mutations in thymomas and thymic carcinomas as a means of exploring the potential for molecularly targeted therapy with tyrosine kinase inhibitors. Genomic DNA was isolated from 41 paraffin-embedded tumor samples obtained from 24 thymomas and 17 thymic carcinomas. EGFR exons 18, 19, and 21, and KIT exons 9, 11, 13, and 17, were analyzed for mutations by PCR and direct sequencing. Protein expression of EGFR and KIT was evaluated immunohistochemically. EGFR mutations were detected in 2 of 20 thymomas, but not in any of the thymic carcinomas. All of the EGFR mutations detected were missense mutations (L858R and G863D) in exon 21. EGFR protein was expressed in 71% of the thymomas and 53% of the thymic carcinomas. The mutational analysis of KIT revealed only a missense mutation (L576P) in exon 11 of one thymic carcinoma. KIT protein was expressed in 88% of the thymic carcinomas and 0% of the thymomas. The results of this study indicate that EGFR and KIT mutations in thymomas and thymic carcinomas are rare, but that many of the tumors express EGFR or KIT protein.

  8. Evolution of domain promiscuity in eukaryotic genomes—a perspective from the inferred ancestral domain architectures†

    PubMed Central

    Cohen-Gihon, Inbar; Fong, Jessica H.; Sharan, Roded; Nussinov, Ruth

    2012-01-01

    Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution. PMID:21127809

  9. Design of Embedded-Hybrid Antimicrobial Peptides with Enhanced Cell Selectivity and Anti-Biofilm Activity

    PubMed Central

    Xu, Wei; Zhu, Xin; Tan, Tingting; Li, Weizhong; Shan, Anshan

    2014-01-01

    Antimicrobial peptides have attracted considerable attention because of their broad-spectrum antimicrobial activity and their low propensity to induce antibiotic resistance, which, along with biofilms, is the most common source of failure in bacterial infection treatment. Designing hybrid peptides that integrate different functional domains of peptides has many advantages. In this study, we designed an embedded-hybrid peptide, R-FV-I16, by replacing a functionally defective sequence, RR7, with the anti-biofilm sequence FV7 embedded in the middle position of peptide RI16. The results demonstrated that the synthetic hybrid peptide R-FV-I16 had potent antimicrobial activity over a wide range of Gram-negative and Gram-positive bacteria, as well as anti-biofilm activity. More importantly, R-FV-I16 showed lower hemolytic activity and cytotoxicity. Fluorescent assays demonstrated that R-FV-I16 depolarized the outer and the inner bacterial membranes, while scanning electron microscopy and transmission electron microscopy further indicated that this peptide killed bacterial cells by disrupting the cell membrane, thereby damaging membrane integrity. Results from SEM also provided evidence that R-FV-I16 inherited anti-biofilm activity from the functional peptide sequence FV7. Embedded-hybrid peptides could provide a new pattern for combining different functional domains and offer an effective avenue to screen for novel antimicrobial agents. PMID:24945359

  10. Aromatic Side Chain Water-to-Lipid Transfer Free Energies Show a Depth Dependence across the Membrane Normal.

    PubMed

    McDonald, Sarah K; Fleming, Karen G

    2016-06-29

    Quantitating and understanding the physical forces responsible for the interactions of biomolecules are fundamental to the biological sciences. This is especially challenging for membrane proteins because they are embedded within cellular bilayers that provide a unique medium in which hydrophobic sequences must fold. Knowledge of the energetics of protein-lipid interactions is thus vital to understand cellular processes involving membrane proteins. Here we used a host-guest mutational strategy to calculate the Gibbs free energy changes of water-to-lipid transfer for the aromatic side chains Trp, Tyr, and Phe as a function of depth in the membrane. This work reveals an energetic gradient in the transfer free energies for Trp and Tyr, where transfer was most favorable to the membrane interfacial region and comparatively less favorable into the bilayer center. The transfer energetics follow the concentration gradient of polar atoms across the bilayer normal that naturally occurs in biological membranes. Additional measurements revealed that nearest-neighbor couplings in the data set are influenced by a network of aromatic side chains in the host protein. Taken together, these results show that aromatic side chains contribute significantly to membrane protein stability through either aromatic-aromatic interactions or placement at the membrane interface.

  11. Constitutively expressing cell lines that secrete a truncated bovine herpes virus-1 glycoprotein (gpI) stimulate T-lymphocyte responsiveness.

    PubMed

    Leary, T P; Gao, Y; Splitter, G A

    1992-07-01

    The desire to obtain authentically glycosylated viral protein products in sufficient quantity for immunological study has led to the use of eucaryotic expression vectors for protein production. An additional advantage is that these protein products can be studied individually in the absence of their native viral environment. We have cloned a complementary DNA (cDNA) encoding bovine herpes virus-1 (BHV-1) glycoprotein 1 (gpI) into the eucaryotic expression vector, pZipNeo SVX1. Since this protein is normally embedded within the membrane of BHV-1 infected cells, we removed sequences encoding the transmembrane domain of the native protein. After transfection of the plasmid construct into the canine osteosarcoma cell line, D17, or Madin-Darby bovine kidney (MDBK) cells, a truncated BHV-1 (gpI) was secreted into the culture medium as demonstrated by radioimmunoprecipitation and SDS-PAGE. Both a CD4+ T-lymphocyte line specific for BHV-1 and freshly isolated T lymphocytes could recognize and respond to the secreted recombinant gpI. Further, recombinant gpI could elicit both antibody and cellular responses in cattle when used as an immunogen. Having established constitutively glycoprotein producing cell lines, future studies in vaccine evaluation of gpI will be facilitated.

  12. EXors and the stellar birthline

    NASA Astrophysics Data System (ADS)

    Moody, Mackenzie S. L.; Stahler, Steven W.

    2017-04-01

    We assess the evolutionary status of EXors. These low-mass, pre-main-sequence stars repeatedly undergo sharp luminosity increases, each a year or so in duration. We place into the HR diagram all EXors that have documented quiescent luminosities and effective temperatures, and thus determine their masses and ages. Two alternate sets of pre-main-sequence tracks are used, and yield similar results. Roughly half of EXors are embedded objects, i.e., they appear observationally as Class I or flat-spectrum infrared sources. We find that these are relatively young and are located close to the stellar birthline in the HR diagram. Optically visible EXors, on the other hand, are situated well below the birthline. They have ages of several Myr, typical of classical T Tauri stars. Judging from the limited data at hand, we find no evidence that binary companions trigger EXor eruptions; this issue merits further investigation. We draw several general conclusions. First, repetitive luminosity outbursts do not occur in all pre-main-sequence stars, and are not in themselves a sign of extreme youth. They persist, along with other signs of activity, in a relatively small subset of these objects. Second, the very existence of embedded EXors demonstrates that at least some Class I infrared sources are not true protostars, but very young pre-main-sequence objects still enshrouded in dusty gas. Finally, we believe that the embedded pre-main-sequence phase is of observational and theoretical significance, and should be included in a more complete account of early stellar evolution.

  13. Molecular characterization of oral squamous cell carcinoma using targeted next-generation sequencing.

    PubMed

    Er, Tze-Kiong; Wang, Yen-Yun; Chen, Chih-Chieh; Herreros-Villanueva, Marta; Liu, Ta-Chih; Yuan, Shyng-Shiou F

    2015-10-01

    Many genetic factors play an important role in the development of oral squamous cell carcinoma. The aim of this study was to assess the mutational profile in oral squamous cell carcinoma using formalin-fixed, paraffin-embedded tumors from a Taiwanese population by performing targeted sequencing of 26 cancer-associated genes that are frequently mutated in solid tumors. Next-generation sequencing was performed in 50 formalin-fixed, paraffin-embedded tumor specimens obtained from patients with oral squamous cell carcinoma. Genetic alterations in the 26 cancer-associated genes were detected using a deep sequencing (>1000X) approach. TP53, PIK3CA, MET, APC, CDH1, and FBXW7 were the most frequently mutated genes. Most remarkably, TP53 mutations and PIK3CA mutations, which accounted for 68% and 18% of tumors, respectively, were more prevalent in a Taiwanese population. Other genes including MET (4%), APC (4%), CDH1 (2%), and FBXW7 (2%) were identified in our population. In summary, our study shows the feasibility of performing targeted sequencing using formalin-fixed, paraffin-embedded samples. Additionally, this study also reports the mutational landscape of oral squamous cell carcinoma in the Taiwanese population. We believe that this study will shed new light on fundamental aspects in understanding the molecular pathogenesis of oral squamous cell carcinoma and may aid in the development of new targeted therapies. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  14. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding.

    PubMed

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-07-15

    Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm.
Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn. Supplementary materials are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
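
    The first preprocessing step described in this abstract (splitting DNA into k-mers and tallying their co-occurrence) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the stride, and the context window size are hypothetical choices made only for illustration, and the abstract does not say which values or which representation learning method were used.

```python
from collections import Counter


def split_kmers(seq, k, stride=1):
    """Return the k-mers of a DNA sequence in order (stride is an assumed parameter)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def cooccurrence_counts(kmers, window=2):
    """Count unordered k-mer pairs lying within `window` positions of each other.

    The window size is an illustrative assumption, not a value from the abstract.
    """
    counts = Counter()
    for i, left in enumerate(kmers):
        for right in kmers[i + 1:i + 1 + window]:
            counts[tuple(sorted((left, right)))] += 1
    return counts


# Toy usage on a made-up sequence.
kmers = split_kmers("ACGTAC", k=3)
pairs = cooccurrence_counts(kmers, window=1)
```

    A table of counts like `pairs` is the kind of input an unsupervised embedding method would factorize or fit; the embedding step itself is left out here because the abstract does not name it.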

  15. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding

    PubMed Central

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-01-01

    Abstract Motivation: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. Results: We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. Availability and implementation: The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm. 
Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:28881969

  16. Evolution and Protein Packaging of Small Molecule RNA Aptamers

    PubMed Central

    Lau, Jolene L.; Baksh, Michael M.; Fiedler, Jason D.; Brown, Steven D.; Kussrow, Amanda; Bornhop, Darryl J.; Ordoukhanian, Phillip

    2011-01-01

    A high-affinity RNA aptamer (Kd = 50 nM) was efficiently identified by SELEX against a heteroaryl dihydropyrimidine structure, chosen as a representative drug-like molecule with no cross reactivity with mammalian or bacterial cells. This aptamer, its weaker-binding variants, and a known aptamer against theophylline were each embedded in a longer RNA sequence that was encapsidated inside a virus-like particle by a convenient expression technique. These nucleoprotein particles were shown by backscattering interferometry to bind to the small-molecule ligands with affinities similar to those of the free (non-encapsidated) aptamers. The system therefore comprises a general approach to the production and sequestration of functional RNA molecules, characterized by a convenient label-free analytical technique. PMID:21899290

  17. Native Mass Spectrometry Characterizes the Photosynthetic Reaction Center Complex from the Purple Bacterium Rhodobacter sphaeroides

    NASA Astrophysics Data System (ADS)

    Zhang, Hao; Harrington, Lucas B.; Lu, Yue; Prado, Mindy; Saer, Rafael; Rempel, Don; Blankenship, Robert E.; Gross, Michael L.

    2017-01-01

    Native mass spectrometry (MS) is an emerging approach to study protein complexes in their near-native states and to elucidate their stoichiometry and topology. Here, we report a native MS study of the membrane-embedded reaction center (RC) protein complex from the purple photosynthetic bacterium Rhodobacter sphaeroides. The membrane-embedded RC protein complex is stabilized by detergent micelles in aqueous solution, directly introduced into a mass spectrometer by nano-electrospray (nESI), and freed of detergents and dissociated in the gas phase by collisional activation. As the collision energy is increased, the chlorophyll pigments are gradually released from the RC complex, suggesting that native MS introduces a near-native structure that continues to bind pigments. Two bacteriochlorophyll a pigments remain tightly bound to the RC protein at the highest collision energy. The order of pigment release and their resistance to release by gas-phase activation indicates the strength of pigment interaction in the RC complex. This investigation sets the stage for future native MS studies of membrane-embedded photosynthetic pigment-protein and related complexes.

  18. [Molecular mechanisms of lung cancer development at its different stages in nuclear industry workers].

    PubMed

    Rusinova, G G; Vyazovskaya, N S; Azizova, T V; Revina, V S; Glazkova, I V; Generozov, E V; Zakharzhevskaya, N B; Guryanov, M Yu; Belosokhov, M V; Osovets, S V

    2015-01-01

    To assess mutational events in exons 5, 7, and 8 of the p53 gene and to reveal mutant p53 protein in verified cases of morphologically altered (proliferative and precancerous changes, lung cancer) and histologically unaltered lung tissues in workers exposed to occupational radiation. The investigation used formalin-fixed paraffin-embedded unaltered and altered lung tissue blocks (FFPBs) obtained from the human radiobiological tissue repository. The shelf-life of FFPBs was 5-31 years. An immunohistochemical technique using mouse antibodies against p53 protein (, Denmark), stained with diaminobenzidine (DAB) chromogen, was employed to determine p53 protein. DNA was isolated from lung tissue FFPBs with QIAamp DNA FFPE Tissue Kit, (, USA). Polymerase chain reaction (PCR) was performed to amplify the p53 gene exons 5, 7, and 8 selected for examination, by applying the sequences of genes and primers, the specificity of which was checked using the online resource (http://www.ncbi.nlm.nih.gov/blast). PCR products were detected by temporal temperature gradient gel-electrophoresis and the Sanger sequencing method. The obtained DNA fragments were analyzed on a sequencer ABI Prism 3100 Genetic Analyzer (, USA). Computer-aided DNA analysis was made using the BLAST program. The Statistica 6.0 software package was employed for statistical data processing. Results. Immunohistochemical analysis showed that mutant p53 protein was absent in the cells of unaltered lung tissue and the number of cells with mutant p53 protein increased in all the patients with proliferative and precancerous changes and lung cancer, suggesting p53 protein dysfunction. The total number of p53 gene mutations in exons 5, 7, and 8 in the presence of proliferative and precancerous lung tissue changes and lung cancer was 25, 20, and 40%, respectively. All the mutations found were transversions (the substitution of a purine for a pyrimidine or vice versa), indicating the action of exogenous mutagens.
The results of this investigation have confirmed other investigators' data showing that p53 gene mutations in lung cancer are observed in 40-70% of cases. Mutations in the p53 gene were found in no more than 40% of cases of altered lung tissue, whereas p53 protein expression was found in 100% of them; this difference suggests that p53 gene function in the cell is regulated at multiple levels.
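
    The abstract's definition of a transversion (a purine replaced by a pyrimidine or the reverse) can be made concrete with a small sketch. The function name and error handling are hypothetical and are not taken from the study; the purine and pyrimidine sets are standard for DNA.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class on both sides is a transition; a class change is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

    For example, `classify_substitution("G", "T")` changes a purine into a pyrimidine and is therefore a transversion, while `classify_substitution("A", "G")` stays within the purines and is a transition.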

  19. 7 Å projection map of the S-layer protein sbpA obtained with trehalose-embedded monolayer crystals.

    PubMed

    Norville, Julie E; Kelly, Deborah F; Knight, Thomas F; Belcher, Angela M; Walz, Thomas

    2007-12-01

    Two-dimensional crystallization on lipid monolayers is a versatile tool to obtain structural information of proteins by electron microscopy. An inherent problem with this approach is to prepare samples in a way that preserves the crystalline order of the protein array and produces specimens that are sufficiently flat for high-resolution data collection at high tilt angles. As a test specimen to optimize the preparation of lipid monolayer crystals for electron microscopy imaging, we used the S-layer protein sbpA, a protein with potential for designing arrays of both biological and inorganic materials with engineered properties for a variety of nanotechnology applications. Sugar embedding is currently considered the best method to prepare two-dimensional crystals of membrane proteins reconstituted into lipid bilayers. We found that using a loop to transfer lipid monolayer crystals to an electron microscopy grid followed by embedding in trehalose and quick-freezing in liquid ethane also yielded the highest resolution images for sbpA lipid monolayer crystals. Using images of specimens prepared in this way we could calculate a projection map of sbpA at 7 Å resolution, one of the highest resolution projection structures obtained with lipid monolayer crystals to date.

  20. Correlative Imaging of Fluorescent Proteins in Resin-Embedded Plant Material

    PubMed Central

    Bell, Karen; Mitchell, Steve; Paultre, Danae; Posch, Markus; Oparka, Karl

    2013-01-01

    Fluorescent proteins (FPs) were developed for live-cell imaging and have revolutionized cell biology. However, not all plant tissues are accessible to live imaging using confocal microscopy, necessitating alternative approaches for protein localization. An example is the phloem, a tissue embedded deep within plant organs and sensitive to damage. To facilitate accurate localization of FPs within recalcitrant tissues, we developed a simple method for retaining FPs after resin embedding. This method is based on low-temperature fixation and dehydration, followed by embedding in London Resin White, and avoids the need for cryosections. We show that a palette of FPs can be localized in plant tissues while retaining good structural cell preservation, and that the polymerized block face can be counterstained with cell wall probes. Using this method we have been able to image green fluorescent protein-labeled plasmodesmata to a depth of more than 40 μm beneath the resin surface. Using correlative light and electron microscopy of the phloem, we were able to locate the same FP-labeled sieve elements in semithin and ultrathin sections. Sections were amenable to antibody labeling, and allowed a combination of confocal and superresolution imaging (three-dimensional-structured illumination microscopy) on the same cells. These correlative imaging methods should find several uses in plant cell biology. PMID:23457228

  1. Structural, Biochemical, and Computational Studies Reveal the Mechanism of Selective Aldehyde Dehydrogenase 1A1 Inhibition by Cytotoxic Duocarmycin Analogues.

    PubMed

    Koch, Maximilian F; Harteis, Sabrina; Blank, Iris D; Pestel, Galina; Tietze, Lutz F; Ochsenfeld, Christian; Schneider, Sabine; Sieber, Stephan A

    2015-11-09

    Analogues of the natural product duocarmycin bearing an indole moiety were shown to bind aldehyde dehydrogenase 1A1 (ALDH1A1) in addition to DNA, while derivatives without the indole solely addressed the ALDH1A1 protein. The molecular mechanism of selective ALDH1A1 inhibition by duocarmycin analogues was unraveled through cocrystallization, mutational studies, and molecular dynamics simulations. The structure of the complex shows the compound embedded in a hydrophobic pocket, where it is stabilized by several crucial π-stacking and van der Waals interactions. This binding mode positions the cyclopropyl electrophile for nucleophilic attack by the noncatalytic residue Cys302, thereby resulting in covalent attachment, steric occlusion of the active site, and inhibition of catalysis. The selectivity of duocarmycin analogues for ALDH1A1 is unique, since only minor alterations in the sequence of closely related protein isoforms restrict compound accessibility. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Visual content highlighting via automatic extraction of embedded captions on MPEG compressed video

    NASA Astrophysics Data System (ADS)

    Yeo, Boon-Lock; Liu, Bede

    1996-03-01

    Embedded captions in TV programs such as news broadcasts, documentaries and coverage of sports events provide important information on the underlying events. In digital video libraries, such captions represent a highly condensed form of key information on the contents of the video. In this paper we propose a scheme to automatically detect the presence of captions embedded in video frames. The proposed method operates on reduced image sequences which are efficiently reconstructed from compressed MPEG video and thus does not require full frame decompression. The detection, extraction and analysis of embedded captions help to capture the highlights of visual contents in video documents for better organization of video, to present succinctly the important messages embedded in the images, and to facilitate browsing, searching and retrieval of relevant clips.

  3. Exploring the Effect of Embedded Scaffolding Within Curricular Tasks on Third-Grade Students' Model-Based Explanations about Hydrologic Cycling

    NASA Astrophysics Data System (ADS)

    Zangori, Laura; Forbes, Cory T.; Schwarz, Christina V.

    2015-10-01

    Opportunities to generate model-based explanations are crucial for elementary students, yet are rarely foregrounded in elementary science learning environments despite evidence that early learners can reason from models when provided with scaffolding. We used a quasi-experimental research design to investigate the comparative impact of a scaffold test condition consisting of embedded physical scaffolds within a curricular modeling task on third-grade (age 8-9) students' formulation of model-based explanations for the water cycle. This condition was contrasted to the control condition where third-grade students used a curricular modeling task with no embedded physical scaffolds. Students from each condition (n_scaffold = 60; n_unscaffold = 56) generated models of the water cycle before and after completion of a 10-week water unit. Results from quantitative analyses suggest that students in the scaffolded condition represented and linked more subsurface water process sequences with surface water process sequences than did students in the unscaffolded condition. However, results of qualitative analyses indicate that students in the scaffolded condition were less likely to build upon these process sequences to generate model-based explanations and experienced difficulties understanding their models as abstracted representations rather than recreations of real-world phenomena. We conclude that embedded curricular scaffolds may support students to consider non-observable components of the water cycle but, alone, may be insufficient for generation of model-based explanations about subsurface water movement.

  4. RNA sequencing confirms similarities between PPI-responsive oesophageal eosinophilia and eosinophilic oesophagitis.

    PubMed

    Peterson, K A; Yoshigi, M; Hazel, M W; Delker, D A; Lin, E; Krishnamurthy, C; Consiglio, N; Robson, J; Yandell, M; Clayton, F

    2018-06-04

    Although current American guidelines distinguish proton pump inhibitor-responsive oesophageal eosinophilia (PPI-REE) from eosinophilic oesophagitis (EoE), these entities are broadly similar. While two microarray studies showed that they have similar transcriptomes, more extensive RNA sequencing studies have not been done previously. To determine whether RNA sequencing identifies genetic markers distinguishing PPI-REE from EoE. We retrospectively examined 13 PPI-REE and 14 EoE biopsies, matched for tissue eosinophil content, and 14 normal controls. Patients and controls were not PPI-treated at the time of biopsy. We did RNA sequencing on formalin-fixed, paraffin-embedded tissue, with differential expression confirmation by quantitative polymerase chain reaction (PCR). We validated the use of formalin-fixed, paraffin-embedded vs RNAlater-preserved tissue, and compared our formalin-fixed, paraffin-embedded EoE results to a prior EoE study. By RNA sequencing, no genes were differentially expressed between the EoE and PPI-REE groups at the false discovery rate (FDR) ≤0.01 level. Compared to normal controls, 1996 genes were differentially expressed in the PPI-REE group and 1306 genes in the EoE group. By less stringent criteria, only MAPK8IP2 was differentially expressed between PPI-REE and EoE (FDR = 0.029, 2.2-fold less in EoE than in PPI-REE), with similar results by PCR. KCNJ2, which was differentially expressed in a prior study, was similar in the EoE and PPI-REE groups by both RNA sequencing and real-time PCR. Eosinophilic oesophagitis and PPI-REE have comparable transcriptomes, confirming that they are part of the same disease continuum. © 2018 John Wiley & Sons Ltd.
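
    The false discovery rate thresholds quoted in this abstract (FDR ≤0.01, FDR = 0.029) refer to p-values adjusted for multiple testing. The abstract does not name the adjustment procedure, so the sketch below assumes the widely used Benjamini-Hochberg method purely for illustration; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy usage: keep the tests whose adjusted value meets an assumed FDR cutoff.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_adjusted) if q <= 0.03]
```

    Under this procedure a gene counts as differentially expressed at a given level when its adjusted value is at or below the chosen cutoff, which is how a stricter threshold can yield no hits while a looser one yields a single gene.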

  5. Subcellular targeting and interactions among the Potato virus X TGB proteins.

    PubMed

    Samuels, Timmy D; Ju, Ho-Jong; Ye, Chang-Ming; Motes, Christy M; Blancaflor, Elison B; Verchot-Lubicz, Jeanmarie

    2007-10-25

    Potato virus X (PVX) encodes three proteins named TGBp1, TGBp2, and TGBp3 which are required for virus cell-to-cell movement. To determine whether PVX TGB proteins interact during virus cell-cell movement, GFP was fused to each TGB coding sequence within the viral genome. Confocal microscopy was used to study subcellular accumulation of each protein in virus-infected plants and protoplasts. GFP:TGBp2 and TGBp3:GFP were both seen in the ER, ER-associated granular vesicles, and perinuclear X-bodies suggesting that these proteins interact in the same subdomains of the endomembrane network. When plasmids expressing CFP:TGBp2 and TGBp3:GFP were co-delivered to tobacco leaf epidermal cells, the fluorescent signals overlapped in ER-associated granular vesicles indicating that these proteins colocalize in this subcellular compartment. GFP:TGBp1 was seen in the nucleus, cytoplasm, rod-like inclusion bodies, and in punctate sites embedded in the cell wall. The puncta were reminiscent of previous reports showing viral proteins in plasmodesmata. Experiments using CFP:TGBp1 and YFP:TGBp2 or TGBp3:GFP showed CFP:TGBp1 remained in the cytoplasm surrounding the endomembrane network. There was no evidence that the granular vesicles contained TGBp1. Yeast two hybrid experiments showed TGBp1 self associates but failed to detect interactions between TGBp1 and TGBp2 or TGBp3. These experiments indicate that the PVX TGB proteins have complex subcellular accumulation patterns and likely cooperate across subcellular compartments to promote virus infection.

  6. Self-assembling triblock proteins for biofunctional surface modification

    NASA Astrophysics Data System (ADS)

    Fischer, Stephen E.

    Despite the tremendous promise of cell/tissue engineering, significant challenges remain in engineering functional scaffolds to precisely regulate the complex processes of tissue growth and development. As the point of contact between the cells and the scaffold, the scaffold surface plays a major role in mediating cellular behaviors. In this dissertation, the development and utility of self-assembling, artificial protein hydrogels as biofunctional surface modifiers is described. The design of these recombinant proteins is based on a telechelic triblock motif, in which a disordered polyelectrolyte central domain containing embedded bioactive ligands is flanked by two leucine zipper domains. Under moderate conditions of temperature and pH, the leucine zipper end domains form amphiphilic alpha-helices that reversibly associate into homo-trimeric aggregates, driving hydrogel formation. Moreover, the amphiphilic nature of these helical domains enables surface adsorption to a variety of scaffold materials to form biofunctional protein coatings. The nature and stability of these coatings in various solution conditions, and their interaction with mammalian cells is the primary focus of this dissertation. In particular, triblock protein coatings functionalized with cell recognition sequences are shown to produce well-defined surfaces with precise control over ligand density. The impact of this is demonstrated in multiple cell types through ligand density-dependent cell-substrate interactions. To improve the stability of these physically self-assembled coatings, two covalent crosslinking strategies are described---one in which a zero-length chemical crosslinker (EDC) is utilized and a second in which disulfide bonds are engineered into the recombinant proteins. These targeted crosslinking approaches are shown to increase the stability of surface adsorbed protein layers with minimal effect on the presentation of many bioactive ligands. 
Finally, to demonstrate the versatility of the triblock protein hydrogels, and the ease of introducing multiple functionalities to a substrate surface, a surface coating is tailored for neural stem cell culture in order to improve proliferation on the scaffold, while maintaining the stem cell phenotype. These studies demonstrate the unique advantages of genetic engineering over traditional techniques for surface modification. In addition to their unmatched sequence fidelity, recombinant proteins can easily be modified with bioactive ligands and their organization into coherent, supramolecular structures mimics natural self-assembly processes.

  7. The language of geometry: Fast comprehension of geometrical primitives and rules in human adults and preschoolers.

    PubMed

    Amalric, Marie; Wang, Liping; Pica, Pierre; Figueira, Santiago; Sigman, Mariano; Dehaene, Stanislas

    2017-01-01

    During language processing, humans form complex embedded representations from sequential inputs. Here, we ask whether a "geometrical language" with recursive embedding also underlies the human ability to encode sequences of spatial locations. We introduce a novel paradigm in which subjects are exposed to a sequence of spatial locations on an octagon, and are asked to predict future locations. The sequences vary in complexity according to a well-defined language comprising elementary primitives and recursive rules. A detailed analysis of error patterns indicates that primitives of symmetry and rotation are spontaneously detected and used by adults, preschoolers, and adult members of an indigene group in the Amazon, the Munduruku, who have a restricted numerical and geometrical lexicon and limited access to schooling. Furthermore, subjects readily combine these geometrical primitives into hierarchically organized expressions. By evaluating a large set of such combinations, we obtained a first view of the language needed to account for the representation of visuospatial sequences in humans, and conclude that they encode visuospatial sequences by minimizing the complexity of the structured expressions that capture them.

  8. The language of geometry: Fast comprehension of geometrical primitives and rules in human adults and preschoolers

    PubMed Central

    Amalric, Marie; Wang, Liping; Figueira, Santiago; Sigman, Mariano; Dehaene, Stanislas

    2017-01-01

    During language processing, humans form complex embedded representations from sequential inputs. Here, we ask whether a “geometrical language” with recursive embedding also underlies the human ability to encode sequences of spatial locations. We introduce a novel paradigm in which subjects are exposed to a sequence of spatial locations on an octagon, and are asked to predict future locations. The sequences vary in complexity according to a well-defined language comprising elementary primitives and recursive rules. A detailed analysis of error patterns indicates that primitives of symmetry and rotation are spontaneously detected and used by adults, preschoolers, and adult members of an indigene group in the Amazon, the Munduruku, who have a restricted numerical and geometrical lexicon and limited access to schooling. Furthermore, subjects readily combine these geometrical primitives into hierarchically organized expressions. By evaluating a large set of such combinations, we obtained a first view of the language needed to account for the representation of visuospatial sequences in humans, and conclude that they encode visuospatial sequences by minimizing the complexity of the structured expressions that capture them. PMID:28125595

  9. Comparison of the DNA extraction methods for polymerase chain reaction amplification from formalin-fixed and paraffin-embedded tissues.

    PubMed

    Sato, Y; Sugie, R; Tsuchiya, B; Kameya, T; Natori, M; Mukai, K

    2001-12-01

    To obtain an adequate quality and quantity of DNA from formalin-fixed and paraffin-embedded tissue, six different DNA extraction methods were compared. Four methods used deparaffinization by xylene followed by proteinase K digestion and phenol-chloroform extraction. The temperature of the different steps was changed to obtain higher yields and improved quality of extracted DNA. The remaining two methods used microwave heating for deparaffinization. The best DNA extraction method consisted of deparaffinization by microwave irradiation, protein digestion with proteinase K at 48 degrees C overnight, and no further purification steps. By this method, the highest DNA yield was obtained and the amplification of a 989-base pair beta-globin gene fragment was achieved. Furthermore, DNA extracted by means of this procedure from five gastric carcinomas was successfully used for single strand conformation polymorphism and direct sequencing assays of the beta-catenin gene. Because the microwave-based DNA extraction method presented here is simple, has a lower contamination risk, and results in a higher yield of DNA compared with the ordinary organic chemical reagent-based extraction method, it is considered applicable to various clinical and basic fields.

  10. Chemical reactivation of resin-embedded pHuji adds red for simultaneous two-color imaging with EGFP

    PubMed Central

    Guo, Wenyan; Liu, Xiuli; Liu, Yurong; Gang, Yadong; He, Xiaobin; Jia, Yao; Yin, Fangfang; Li, Pei; Huang, Fei; Zhou, Hongfu; Wang, Xiaojun; Gong, Hui; Luo, Qingming; Xu, Fuqiang; Zeng, Shaoqun

    2017-01-01

    The pH-sensitive fluorescent proteins enabling chemical reactivation in resin are useful tools for fluorescence microimaging. EGFP or EYFP is good for such applications. For simultaneous two-color imaging, a suitable red fluorescent protein is an urgent need. Here a pH-sensitive red fluorescent protein, pHuji, is selected and verified to remain pH-sensitive in HM20 resin. We observe 183% fluorescence intensity of pHuji in resin-embedded mouse brain and 29.08-fold fluorescence intensity of reactivated pHuji compared to the quenched state. pHuji and EGFP can be quenched and chemically reactivated simultaneously in resin, thus enabling simultaneous two-color micro-optical sectioning tomography of resin-embedded mouse brain. This method may greatly facilitate the visualization of neuronal morphology and neural circuits to promote understanding of the structure and function of the brain. PMID:28717566

  11. Chemical reactivation of resin-embedded pHuji adds red for simultaneous two-color imaging with EGFP.

    PubMed

    Guo, Wenyan; Liu, Xiuli; Liu, Yurong; Gang, Yadong; He, Xiaobin; Jia, Yao; Yin, Fangfang; Li, Pei; Huang, Fei; Zhou, Hongfu; Wang, Xiaojun; Gong, Hui; Luo, Qingming; Xu, Fuqiang; Zeng, Shaoqun

    2017-07-01

    The pH-sensitive fluorescent proteins enabling chemical reactivation in resin are useful tools for fluorescence microimaging. EGFP or EYFP is good for such applications. For simultaneous two-color imaging, a suitable red fluorescent protein is an urgent need. Here a pH-sensitive red fluorescent protein, pHuji, is selected and verified to remain pH-sensitive in HM20 resin. We observe 183% fluorescence intensity of pHuji in resin-embedded mouse brain and 29.08-fold fluorescence intensity of reactivated pHuji compared to the quenched state. pHuji and EGFP can be quenched and chemically reactivated simultaneously in resin, thus enabling simultaneous two-color micro-optical sectioning tomography of resin-embedded mouse brain. This method may greatly facilitate the visualization of neuronal morphology and neural circuits to promote understanding of the structure and function of the brain.

  12. A Paradox within the Time Value of Money: A Critical Thinking Exercise for Finance Students

    ERIC Educational Resources Information Center

    Delaney, Charles J.; Rich, Steven P.; Rose, John T.

    2016-01-01

    This study presents a paradox within the time value of money (TVM), namely, that the interest-principal sequence embedded in the payment stream of an amortized loan is exactly the opposite of the interest-principal sequence implicit in the present value of a matching annuity. We examine this inverse sequence, both mathematically and intuitively,…
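    The inverse sequence described in this abstract can be checked numerically. The sketch below is only an illustration under stated assumptions: a level-payment loan with a fixed periodic rate, and hypothetical function names (`amortization_split`, `annuity_split`) and example figures that do not come from the study. It splits each loan payment into interest and principal, splits each annuity payment into its present value and the discount removed from it, and shows that one sequence is the other in reverse order.

```python
def amortization_split(principal, rate, n):
    """Return (payment, rows) for a level-payment loan.

    rows[t] = (interest paid, principal repaid) for period t + 1.
    """
    payment = principal * rate / (1 - (1 + rate) ** -n)
    balance = principal
    rows = []
    for _ in range(n):
        interest = balance * rate      # interest accrues on the open balance
        repaid = payment - interest    # the rest of the payment retires principal
        balance -= repaid
        rows.append((interest, repaid))
    return payment, rows


def annuity_split(payment, rate, n):
    """Split each annuity payment into (discount, present value).

    rows[t] describes the payment received at the end of period t + 1.
    """
    rows = []
    for t in range(1, n + 1):
        present_value = payment / (1 + rate) ** t
        rows.append((payment - present_value, present_value))
    return rows


if __name__ == "__main__":
    # Hypothetical example: 1000 borrowed at 5% per period over 5 periods.
    payment, loan = amortization_split(1000.0, 0.05, 5)
    annuity = annuity_split(payment, 0.05, 5)
    for (interest, repaid), (discount, pv) in zip(loan, reversed(annuity)):
        print(f"{interest:8.2f} {repaid:8.2f} | {discount:8.2f} {pv:8.2f}")
```

    Under these assumptions the interest share of the loan payments falls over time while the discount share of the annuity payments rises, and period t of the loan matches period n - t + 1 of the annuity.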

  13. A comparative spectroscopic and kinetic study of photoexcitations in detergent-isolated and membrane-embedded LH2 light-harvesting complexes.

    PubMed

    Freiberg, Arvi; Rätsep, Margus; Timpmann, Kõu

    2012-08-01

    Integral membrane proteins constitute more than a third of the total number of proteins present in organisms. Solubilization with mild detergents is a common technique to study the structure, dynamics, and catalytic activity of these proteins in purified form. However beneficial the use of detergents may be for protein extraction, the membrane proteins are often denatured by detergent solubilization as a result of native lipid membrane interactions having been modified. Versatile investigations of the properties of membrane-embedded and detergent-isolated proteins are, therefore, required to evaluate the consequences of the solubilization procedure. Herein, spectroscopic and kinetic fingerprints have been established that distinguish excitons in individual detergent-solubilized LH2 light-harvesting pigment-protein complexes from those in the membrane-embedded complexes of the purple photosynthetic bacterium Rhodobacter sphaeroides. A wide arsenal of spectroscopic techniques in the visible optical range, including conventional broadband absorption-fluorescence, fluorescence anisotropy excitation, spectrally selective hole burning and fluorescence line-narrowing, and transient absorption-fluorescence, has been applied over a broad temperature range between physiological and liquid He temperatures. Significant changes in energetics and dynamics of the antenna excitons upon self-assembly of the proteins into intracytoplasmic membranes are observed, analyzed, and discussed. This article is part of a Special Issue entitled: Photosynthesis Research for Sustainability: from Natural to Artificial. Copyright © 2011. Published by Elsevier B.V.

  14. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

    PubMed

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

  15. Implementation of image transmission server system using embedded Linux

    NASA Astrophysics Data System (ADS)

    Park, Jong-Hyun; Jung, Yeon Sung; Nam, Boo Hee

    2005-12-01

    In this paper, we implemented an image transmission server on an embedded system that is dedicated to a specific task and is easy to install and move. Because the embedded system is less capable than a PC, we had to reduce the amount of computation needed for baseline JPEG image compression and transmission. We used Red Hat Linux 9.0 on the host PC and embedded Linux on the target board. The image sequences are obtained from a camera attached to an FPGA (Field Programmable Gate Array) board with an Altera chip. For efficiency, and to avoid some vendor-imposed constraints, we wrote the device driver as a kernel module.

  16. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced. Mapping of DNA variants in our sequenced genomes to mitochondrial

  17. MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

    PubMed

    Suckau, Detlev; Resemann, Anja

    2009-12-01

    The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and efficiently and to validate recombinant protein structures, in particular protein termini, at the level of undigested proteins.

  18. Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

    PubMed Central

    2010-01-01

    Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also makes it possible to avoid any convolution product of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end with a discussion of our method and its potential improvements. PMID:20205909

  19. Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

    PubMed

    Nuel, Gregory; Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2010-01-26

    In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also makes it possible to avoid any convolution product of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end with a discussion of our method and its potential improvements.
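    The general idea of Markov chain embedding through a deterministic finite automaton can be illustrated with a small sketch. It is not one of the paper's Algorithms 1 to 3: it is a naive baseline, written under the assumptions of a single-word pattern, a homogeneous first-order Markov source, and overlapping occurrences being counted. It computes the exact count distribution in each sequence and then combines independent sequences by plain convolution. All function names are hypothetical.

```python
from collections import defaultdict


def build_dfa(word, alphabet):
    """delta[q][a] = length of the longest prefix of `word` that is a
    suffix of word[:q] + a (a KMP-style matching automaton)."""
    m = len(word)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = word[:q] + a
            k = min(m, len(s))
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def count_distribution(word, length, init, trans):
    """Exact distribution of the number of (overlapping) occurrences of
    `word` in one sequence of `length` >= 1 letters.

    init[a] is the probability of the first letter; trans[a][b] is the
    probability of letter b following letter a."""
    alphabet = sorted(init)
    delta = build_dfa(word, alphabet)
    m = len(word)
    # Embedded chain state: (automaton state, last letter, count so far).
    cur = defaultdict(float)
    for a in alphabet:
        q = delta[0][a]
        cur[(q, a, int(q == m))] += init[a]
    for _ in range(length - 1):
        nxt = defaultdict(float)
        for (q, a, c), p in cur.items():
            for b in alphabet:
                q2 = delta[q][b]
                nxt[(q2, b, c + int(q2 == m))] += p * trans[a][b]
        cur = nxt
    dist = defaultdict(float)
    for (_, _, c), p in cur.items():
        dist[c] += p
    return [dist[c] for c in range(max(dist) + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def count_distribution_in_set(word, lengths, init, trans):
    """Total count over independent sequences with the given lengths."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(word, n, init, trans))
    return total
```

    Treating each sequence separately, as here, is what distinguishes the exact set-of-sequences computation from the concatenation approximation mentioned in the abstract, which would also count occurrences spanning sequence boundaries.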

  20. The Genome Sequence of a Widespread Apex Predator, the Golden Eagle (Aquila chrysaetos)

    PubMed Central

    Doyle, Jacqueline M.; Katzner, Todd E.; Bloom, Peter H.; Ji, Yanzhu; Wijayawardena, Bhagya K.; DeWoody, J. Andrew

    2014-01-01

    Biologists routinely use molecular markers to identify conservation units, to quantify genetic connectivity, to estimate population sizes, and to identify targets of selection. Many imperiled eagle populations require such efforts and would benefit from enhanced genomic resources. We sequenced, assembled, and annotated the first eagle genome using DNA from a male golden eagle (Aquila chrysaetos) captured in western North America. We constructed genomic libraries that were sequenced using Illumina technology and assembled the high-quality data to a depth of ∼40x coverage. The genome assembly includes 2,552 scaffolds >10 Kb and 415 scaffolds >1.2 Mb. We annotated 16,571 genes that are involved in myriad biological processes, including such disparate traits as beak formation and color vision. We also identified repetitive regions spanning 92 Mb (∼6% of the assembly), including LINES, SINES, LTR-RTs and DNA transposons. The mitochondrial genome encompasses 17,332 bp and is ∼91% identical to the Mountain Hawk-Eagle (Nisaetus nipalensis). Finally, the data reveal that several anonymous microsatellites commonly used for population studies are embedded within protein-coding genes and thus may not have evolved in a neutral fashion. Because the genome sequence includes ∼800,000 novel polymorphisms, markers can now be chosen based on their proximity to functional genes involved in migration, carnivory, and other biological processes. PMID:24759626

  1. CIDR

    Science.gov Websites

    Genotyping General Information Genome Wide Association Custom FFPE Sample Options Methylation Linkage Enrichment Options 51 Mb 51 Mb plus 6.8 - 24Mb custom option 54 Mb Clinical Exome 71 Mb (includes UTRs) Next Generation Sequencing Platform Illumina HiSeq sequencers Options for Formalin-Fixed Paraffin-Embedded (FFPE

  2. Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs

    PubMed Central

    Kapusta, Aurélie; Zhuo, Xiaoyu; Ramsay, LeeAnn; Bourque, Guillaume; Yandell, Mark; Feschotte, Cédric

    2013-01-01

    Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires. PMID:23637635

  3. Chloroplast outer envelope protein P39 in Arabidopsis thaliana belongs to the Omp85 protein family.

    PubMed

    Hsueh, Yi-Ching; Flinner, Nadine; Gross, Lucia E; Haarmann, Raimund; Mirus, Oliver; Sommer, Maik S; Schleiff, Enrico

    2017-08-01

    Proteins of the Omp85 family chaperone the membrane insertion of β-barrel-shaped outer membrane proteins in bacteria, mitochondria, and probably chloroplasts and facilitate the transfer of nuclear-encoded cytosolically synthesized preproteins across the outer envelope of chloroplasts. This protein family is characterized by N-terminal polypeptide transport-associated (POTRA) domains and a C-terminal membrane-embedded β-barrel. We have investigated a recently identified Omp85 family member of Arabidopsis thaliana annotated as P39. We show by in vitro and in vivo experiments that P39 is localized in chloroplasts. The electrophysiological properties of P39 are consistent with those of other Omp85 family members, confirming the sequence-based assignment of P39 to this family. Bioinformatic analysis showed that P39 lacks any POTRA domain, while a complete 16-stranded β-barrel including the highly conserved L6 loop is proposed. The electrophysiological properties are most comparable to Toc75-V, which is consistent with the phylogenetic clustering of P39 in the Toc75-V rather than the Toc75-III branch of the Omp85 family tree. Taken together, P39 forms a pore with Omp85 family protein characteristics. The bioinformatic comparison of the pore region of Toc75-III, Toc75-V, and P39 shows distinctions of the barrel region most likely related to function. Proteins 2017; 85:1391-1401. © 2014 Wiley Periodicals, Inc.

  4. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes.

    PubMed

    Seligmann, Hervé; Warthi, Ganesh

    2017-01-01

    A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = - 1/1; CDA = - 0.5/0.5 if Z and X are both purines or both pyrimidines, assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
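    The CDA rule stated in this abstract can be written as a short function. This is a minimal sketch of the rule exactly as the abstract gives it, with a hypothetical function name. The abstract does not say how codons with three different nucleotides are scored, so the sketch returns `None` for them rather than guessing, and it treats any codon whose first and last nucleotides are identical (including XXX) as symmetric.

```python
PURINES = {"A", "G"}


def codon_directional_asymmetry(codon):
    """Return the CDA of a codon, or None if the abstract's rule does not
    cover it (three distinct nucleotides)."""
    c = codon.upper().replace("U", "T")
    if len(c) != 3 or any(n not in "ACGT" for n in c):
        raise ValueError(f"not a codon: {codon!r}")
    first, middle, last = c

    def same_class(a, b):
        # True when both are purines or both are pyrimidines.
        return (a in PURINES) == (b in PURINES)

    if first == last:        # XZX: first and last identical, symmetric
        return 0.0
    if middle == last:       # ZXX: 5' asymmetric, negative by convention
        return -0.5 if same_class(first, middle) else -1.0
    if first == middle:      # XXZ: 3' asymmetric, positive by convention
        return 0.5 if same_class(last, middle) else 1.0
    return None              # XYZ: not defined in the abstract


if __name__ == "__main__":
    for codon in ("ATA", "GCC", "TCC", "CCG", "AAG", "ACG"):
        print(codon, codon_directional_asymmetry(codon))
```

    As the abstract notes, the choice of which asymmetry gets the negative sign is an arbitrary convention; swapping the signs in the two asymmetric branches would give the mirrored scale.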

  5. Relevance of CARC and CRAC Cholesterol-Recognition Motifs in the Nicotinic Acetylcholine Receptor and Other Membrane-Bound Receptors.

    PubMed

    Di Scala, Coralie; Baier, Carlos J; Evans, Luke S; Williamson, Philip T F; Fantini, Jacques; Barrantes, Francisco J

    2017-01-01

    Cholesterol is a ubiquitous neutral lipid, which finely tunes the activity of a wide range of membrane proteins, including neurotransmitter and hormone receptors and ion channels. Given the scarcity of available X-ray crystallographic structures and the even fewer in which cholesterol sites have been directly visualized, application of in silico computational methods remains a valid alternative for the detection and thermodynamic characterization of cholesterol-specific sites in functionally important membrane proteins. The membrane-embedded segments of the paradigm neurotransmitter receptor for acetylcholine display a series of cholesterol consensus domains (which we have coined "CARC"). The CARC motif exhibits a preference for the outer membrane leaflet and its mirror motif, CRAC, for the inner one. Some membrane proteins possess the double CARC-CRAC sequences within the same transmembrane domain. In addition to in silico molecular modeling, the affinity, concentration dependence, and specificity of the cholesterol-recognition motif-protein interaction have recently found experimental validation in other biophysical approaches like monolayer techniques and nuclear magnetic resonance spectroscopy. From the combined studies, it becomes apparent that the CARC motif is now more firmly established as a high-affinity cholesterol-binding domain for membrane-bound receptors and remarkably conserved along phylogenetic evolution. © 2017 Elsevier Inc. All rights reserved.

  6. Centrin protein and genes in Trichomonas vaginalis and close relatives.

    PubMed

    Brugerolle, G; Bricheux, G; Coffe, G

    2000-01-01

    Anti-centrin monoclonal antibodies 20H5 and 11B2 produced against Chlamydomonas centrin decorated the group of basal bodies as well as very closely attached structures in all trichomonads studied and in the devescovinids Foaina and Devescovina. Moreover, these antibodies decorated the undulating membrane in Trichomonas vaginalis, Trichomitus batrachorum, and Tritrichomonas foetus, and the cresta in Foaina. Centrin was not demonstrated in the dividing spindle and paradesmosis. Immunogold labeling, both in pre- and post-embedding, confirmed that centrin is associated with the basal body cylinder and is a component of the nine anchoring arms between the terminal plate of flagellar bases and the plasma membrane. Centrin is also associated with the hook-shaped fibers attached to basal bodies (F1, F3), the X-fiber, and along sigmoid fibers (F2) at the pelta-axostyle junction, which is the microtubule organizing center for pelta-axostyle microtubules. There was no labeling on the striated costa and parabasal fibers nor on microtubular pelta-axostyle, but the fibrous structure inside the undulating membrane was labeled in T. vaginalis. Two proteins of 22-20 kDa corresponding to the centrin molecular mass were recognized by immunoblotting using these antibodies in the three trichomonad species examined. By screening a T. vaginalis cDNA library with 20H5 antibody, two genes encoding identical protein sequences were found. The sequence comprises the 4 typical EF-hand Ca++-binding domains present in every known centrin. Trichomonad centrin is closer to the green algal cluster (70% identity) than to the yeast Cdc31 cluster (55% identity) or the Alveolata cluster (46% identity).

  7. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing

    PubMed Central

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; Martínez de la Vega, Octavio; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C.; Vielle-Calzada, Jean-Philippe

    2012-01-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies. PMID:22442422

  8. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing.

    PubMed

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; de la Vega, Octavio Martínez; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C; Vielle-Calzada, Jean-Philippe

    2012-06-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies.

  9. Mapping Ribonucleotides Incorporated into DNA by Hydrolytic End-Sequencing.

    PubMed

    Orebaugh, Clinton D; Lujan, Scott A; Burkholder, Adam B; Clausen, Anders R; Kunkel, Thomas A

    2018-01-01

    Ribonucleotides embedded within DNA render the DNA sensitive to the formation of single-stranded breaks under alkaline conditions. Here, we describe a next-generation sequencing method called hydrolytic end sequencing (HydEn-seq) to map ribonucleotides inserted into the genome of Saccharomyces cerevisiae strains deficient in ribonucleotide excision repair. We use this method to map several genomic features in wild-type and replicase variant yeast strains.

  10. Atomic structure of the nuclear pore complex targeting domain of a Nup116 homologue from the yeast, Candida glabrata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sampathkumar, Parthasarathy; Kim, Seung Joong; Manglicmot, Danalyn

    2012-10-23

    The nuclear pore complex (NPC), embedded in the nuclear envelope, is a large, dynamic molecular assembly that facilitates exchange of macromolecules between the nucleus and the cytoplasm. The yeast NPC is an eightfold symmetric annular structure composed of ~456 polypeptide chains contributed by ~30 distinct proteins termed nucleoporins. Nup116, identified only in fungi, plays a central role in both protein import and mRNA export through the NPC. Nup116 is a modular protein with N-terminal 'FG' repeats containing a Gle2p-binding sequence motif and a NPC targeting domain at its C-terminus. We report the crystal structure of the NPC targeting domain of Candida glabrata Nup116, consisting of residues 882-1034 [CgNup116(882-1034)], at 1.94 Å resolution. The X-ray structure of CgNup116(882-1034) is consistent with the molecular envelope determined in solution by small-angle X-ray scattering. Structural similarities of CgNup116(882-1034) with homologous domains from Saccharomyces cerevisiae Nup116, S. cerevisiae Nup145N, and human Nup98 are discussed.

  11. Infrared Mass Spectrometry for Environmental and Biomedical Applications

    NASA Astrophysics Data System (ADS)

    Baltz-Knorr, M. L.; Papantonakis, M. R.; Ermer Haglund, D. R., Jr.

    2000-11-01

    Matrix-assisted laser desorption and ionization (MALDI) mass spectrometry (MS) using a tunable, ultrashort pulse, mid-infrared free electron laser (FEL) has many applications for both environmental and biomedical research. Environmentally, the characterization of stored nuclear materials has been an important area of research. We are developing a method to determine nuclear tank waste constituents using MALDI MS. This includes desorption and ionization of small organic molecules from sodium nitrate solids and slurries (similar to the salt cake found in some tanks) and also from traditional MALDI matrices. Important aspects of the technique are that it does not produce a secondary waste stream and it is potentially field-deployable using solid-state lasers. Biomedically, the ability to do proteomics is being enhanced by the sensitivity and mass accuracy provided by MALDI MS. We are using MALDI MS to identify proteins embedded in liquid matrix materials, which provide a more natural environment for the biomolecules. We are also working on coupling MALDI MS to traditional protein identification and sequencing techniques for rapid analysis of large numbers of proteins. Research supported by the Office of Naval Research and the U.S. Department of Energy

  12. Functional DNA quantification guides accurate next-generation sequencing mutation detection in formalin-fixed, paraffin-embedded tumor biopsies

    PubMed Central

    2013-01-01

    The formalin-fixed, paraffin-embedded (FFPE) biopsy is a challenging sample for molecular assays such as targeted next-generation sequencing (NGS). We compared three methods for FFPE DNA quantification, including a novel PCR assay (‘QFI-PCR’) that measures the absolute copy number of amplifiable DNA, across 165 residual clinical specimens. The results reveal the limitations of commonly used approaches, and demonstrate the value of an integrated workflow using QFI-PCR to improve the accuracy of NGS mutation detection and guide changes in input that can rescue low quality FFPE DNA. These findings address a growing need for improved quality measures in NGS-based patient testing. PMID:24001039

  13. Identification of functional features of synthetic SINEUPs, antisense lncRNAs that specifically enhance protein translation

    PubMed Central

    Kozhuharova, Ana; Sharma, Harshita; Ohyama, Takako; Fasolo, Francesca; Yamazaki, Toshio; Cotella, Diego; Santoro, Claudio; Zucchelli, Silvia; Gustincich, Stefano; Carninci, Piero

    2018-01-01

    SINEUPs are antisense long noncoding RNAs, in which an embedded SINE B2 element UP-regulates translation of partially overlapping target sense mRNAs. SINEUPs contain two functional domains. First, the binding domain (BD) is located in the region antisense to the target, providing specific targeting to the overlapping mRNA. Second, the inverted SINE B2 represents the effector domain (ED) and enhances translation. To adapt SINEUP technology to a broader number of targets, we took advantage of a high-throughput, semi-automated imaging system to optimize synthetic SINEUP BD and ED design in HEK293T cell lines. Using SINEUP-GFP as a model SINEUP, we extensively screened variants of the BD to map features needed for optimal design. We found that most active SINEUPs overlap an AUG-Kozak sequence. Moreover, we report our screening of the inverted SINE B2 sequence to identify active sub-domains and map the length of the minimal active ED. Our synthetic SINEUP-GFP screening of both BDs and EDs constitutes a broad test with flexible applications to any target gene of interest. PMID:29414979

  14. Next-gen tissue: preservation of molecular and morphological fidelity in prostate tissue.

    PubMed

    Gillard, Marc; Tom, Westin R; Antic, Tatjana; Paner, Gladell P; Lingen, Mark W; VanderWeele, David J

    2015-01-01

    Personalization of cancer therapy requires molecular evaluation of tumor tissue. Traditional tissue preservation involves formalin fixation, which degrades the quality of nucleic acids. Strategies to bank frozen prostate tissue can interfere with diagnostic studies. PAXgene is an alternative fixative that preserves protein and nucleic acid quality. Portions of prostates obtained from autopsy specimens were fixed in either 10% buffered formalin or PAXgene, and processed and embedded in paraffin. Additional sections were immediately embedded in OCT and frozen. DNA and RNA were extracted from the formalin-fixed, PAXgene-fixed, or frozen tissue. Quantitative PCR was used to compare the quality of DNA and RNA obtained from all three tissue types. In addition, 5 μm sections were cut from specimens devoid of cancer and from prostate cancer specimens obtained at prostatectomy and fixed in PAXgene. They were either stained with hematoxylin and eosin or interrogated with antibodies for p63, PSA and p504. Comparable tissue morphology was observed in both the formalin and PAXgene-fixed specimens. Similarly, immunohistochemical expression of the P63, PSA and P504 proteins was comparable between formalin and PAXgene fixation techniques. DNA from the PAXgene-fixed tissue was of similar quality to that from frozen tissue. RNA was also amplified with up to 8-fold greater efficiency in the PAXgene fixed tissue compared to the formalin-fixed tissue. Prostate specimens fixed with PAXgene have preserved histologic morphology, stain appropriately, and have preserved quality of nucleic acids. PAXgene fixation facilitates the use of prostatectomy tissue for molecular biology techniques such as next-generation sequencing.

  15. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
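    As a rough illustration of the transform named in this abstract, the sketch below evaluates Ramanujan sums and a finite-length Ramanujan Fourier coefficient directly from their textbook definitions. It is not the authors' fast algorithm, and the function names (`ramanujan_sum`, `euler_phi`, `rft_coefficient`) and the normalization by φ(q)·N are assumptions made for this example. In an RRM-style analysis the input would be a numeric encoding of the amino acid sequence; here a synthetic period-3 signal stands in for it.

```python
import math


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    total = sum(
        math.cos(2 * math.pi * k * n / q)
        for k in range(1, q + 1)
        if math.gcd(k, q) == 1
    )
    return round(total)  # Ramanujan sums are always integers


def euler_phi(q: int) -> int:
    """Euler's totient: how many k in 1..q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)


def rft_coefficient(signal, q: int) -> float:
    """Finite-length estimate x_q = (1 / (phi(q) * N)) * sum_n x(n) * c_q(n)."""
    n_samples = len(signal)
    acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return acc / (euler_phi(q) * n_samples)


# Synthetic stand-in for an encoded protein sequence: a pure period-3 signal.
signal = [ramanujan_sum(3, n) for n in range(1, 13)]
spectrum = {q: rft_coefficient(signal, q) for q in range(1, 5)}
```

    For this synthetic input only the q = 3 entry of `spectrum` is non-zero, which is the sense in which the transform picks out a dominant period. Direct evaluation costs on the order of N·q operations per coefficient, so it is only suitable for short signals.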

  16. Resin embedded multicycle imaging (REMI): a tool to evaluate protein domains.

    PubMed

    Busse, B L; Bezrukov, L; Blank, P S; Zimmerberg, J

    2016-08-08

    Protein complexes associated with cellular processes comprise a significant fraction of all biology, but our understanding of their heterogeneous organization remains inadequate, particularly for physiological densities of multiple protein species. Towards resolving this limitation, we here present a new technique based on resin-embedded multicycle imaging (REMI) of proteins in-situ. By stabilizing protein structure and antigenicity in acrylic resins, affinity labels were repeatedly applied, imaged, removed, and replaced. In principle, an arbitrarily large number of proteins of interest may be imaged on the same specimen with subsequent digital overlay. A series of novel preparative methods were developed to address the problem of imaging multiple protein species in areas of the plasma membrane or volumes of cytoplasm of individual cells. For multiplexed examination of antibody staining we used straightforward computational techniques to align sequential images, and super-resolution microscopy was used to further define membrane protein colocalization. We give one example of a fibroblast membrane with eight multiplexed proteins. A simple statistical analysis of this limited membrane proteomic dataset is sufficient to demonstrate the analytical power contributed by additional imaged proteins when studying membrane protein domains.

  17. Resin embedded multicycle imaging (REMI): a tool to evaluate protein domains

    PubMed Central

    Busse, B. L.; Bezrukov, L.; Blank, P. S.; Zimmerberg, J.

    2016-01-01

    Protein complexes associated with cellular processes comprise a significant fraction of all biology, but our understanding of their heterogeneous organization remains inadequate, particularly for physiological densities of multiple protein species. Towards resolving this limitation, we here present a new technique based on resin-embedded multicycle imaging (REMI) of proteins in-situ. By stabilizing protein structure and antigenicity in acrylic resins, affinity labels were repeatedly applied, imaged, removed, and replaced. In principle, an arbitrarily large number of proteins of interest may be imaged on the same specimen with subsequent digital overlay. A series of novel preparative methods were developed to address the problem of imaging multiple protein species in areas of the plasma membrane or volumes of cytoplasm of individual cells. For multiplexed examination of antibody staining we used straightforward computational techniques to align sequential images, and super-resolution microscopy was used to further define membrane protein colocalization. We give one example of a fibroblast membrane with eight multiplexed proteins. A simple statistical analysis of this limited membrane proteomic dataset is sufficient to demonstrate the analytical power contributed by additional imaged proteins when studying membrane protein domains. PMID:27499335

  18. Folding and Stabilization of Native-Sequence-Reversed Proteins

    PubMed Central

    Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

    2016-01-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844

  19. Folding and Stabilization of Native-Sequence-Reversed Proteins

    NASA Astrophysics Data System (ADS)

    Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

    2016-04-01

    Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.

  20. Optimization of Glial Fibrillary Acidic Protein Immunoreactivity in Formalin-fixed, Paraffin-Embedded Guinea Pig Brain Sections

    DTIC Science & Technology

    2003-09-01

    fixed, paraffin-embedded guinea pig brain sections using a variety of commercially available GFAP antibody clones. Of the 7 clones tested for cross...determining neuropathological consequences in the guinea pig following exposure to chemical warfare nerve agent.

  1. Determination of Protein Expression Level in Cultured Cells by Immunocytochemistry on Paraffin-embedded Cell Blocks.

    PubMed

    Poojan, Shiv; Kim, Han-Seong; Yoon, Ji-Woon; Sim, Hye Won; Hong, Kyeong-Man

    2018-05-20

    Immunofluorescent staining is currently the method of choice for determination of protein expression levels in cell-culture systems when morphological information is also necessary. The protocol of immunocytochemical staining on paraffin-embedded cell blocks, presented herein, is an excellent alternative to immunofluorescent staining on non-paraffin-embedded fixed cells. In this protocol, a paraffin cell block from HeLa cells was prepared using the thromboplastin-plasma method, and immunocytochemistry was performed for the evaluation of two proliferation markers, CKAP2 and Ki-67. The nuclei and cytoplasmic morphology of the HeLa cells were well preserved in the cell-block slides. At the same time, the CKAP2 and Ki-67 staining patterns in the immunocytochemistry were quite similar to those in immunohistochemical staining in paraffin cancer tissues. With modified cell-culture conditions, including pre-incubation of HeLa cells under serum-free conditions, the effect could be evaluated while preserving architectural information. In conclusion, immunocytochemistry on paraffin-embedded cell blocks is an excellent alternative to immunofluorescent staining.

  2. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

    PubMed Central

    Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

    2014-01-01

    An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. As protein disorder decreased, the AR content of the protein sequence also decreased. A probability density distribution analysis and discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs and that the region was on average 8 residues long. The residues in the AR were high in sequence complexity, and the AR seldom overlapped with low complexity regions (LCRs), which were abundant in disordered proteins. The sequences in the AR showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841

  4. A Highly Organized Structure Mediating Nuclear Localization of a Myb2 Transcription Factor in the Protozoan Parasite Trichomonas vaginalis

    PubMed Central

    Chu, Chien-Hsin; Chang, Lung-Chun; Hsu, Hong-Ming; Wei, Shu-Yi; Liu, Hsing-Wei; Lee, Yu; Kuo, Chung-Chi; Indra, Dharmu; Chen, Chinpan; Ong, Shiou-Jeng; Tai, Jung-Hsiang

    2011-01-01

    Nuclear proteins usually contain specific peptide sequences, referred to as nuclear localization signals (NLSs), for nuclear import. These signals remain unexplored in the protozoan pathogen, Trichomonas vaginalis. The nuclear import of a Myb2 transcription factor was studied here using immunodetection of a hemagglutinin-tagged Myb2 overexpressed in the parasite. The tagged Myb2 was localized to the nucleus as punctate signals. With mutations of its polybasic sequences, 48KKQK51 and 61KR62, Myb2 was localized to the nucleus, but the signal was diffusive. When fused to a C-terminal non-nuclear protein, the Myb2 sequence spanning amino acid (aa) residues 48 to 143, which is embedded within the R2R3 DNA-binding domain (aa 40 to 156), was essential and sufficient for efficient nuclear import of a bacterial tetracycline repressor (TetR), and yet the transport efficiency was reduced with an additional fusion of a firefly luciferase to TetR, while classical NLSs from the simian virus 40 T-antigen had no function in this assay system. Myb2 nuclear import and DNA-binding activity were substantially perturbed with mutation of a conserved isoleucine (I74) in helix 2 to proline that altered secondary structure and ternary folding of the R2R3 domain. Disruption of DNA-binding activity alone by point mutation of a lysine residue, K51, preceding the structural domain had little effect on Myb2 nuclear localization, suggesting that nuclear translocation of Myb2, which requires an ordered structural domain, is independent of its DNA binding activity. These findings provide useful information for testing whether myriad Mybs in the parasite use a common module to regulate nuclear import. PMID:22021237

  5. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
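    The distinction this abstract draws between absolute and relative sequence capacity can be made concrete with a little arithmetic in log10 space. The sketch below is only an illustration: the chain length of 35 residues is a hypothetical value chosen for the example (the abstract gives no lengths), and the Avogadro constant is used as a stand-in lower bound because the abstract states only that the capacity exceeds it.

```python
import math

AVOGADRO = 6.02214076e23  # exact SI value of the Avogadro constant


def log10_total_sequences(length: int, alphabet_size: int = 20) -> float:
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet_size)


def log10_relative_capacity(log10_capacity: float, length: int) -> float:
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_total_sequences(length)


# Hypothetical small domain of 35 residues (an assumption, not from the abstract).
length = 35
log10_capacity_bound = math.log10(AVOGADRO)  # capacity is reported to exceed this
log10_relative = log10_relative_capacity(log10_capacity_bound, length)
```

    Under these assumptions a capacity equal to the Avogadro constant would still be a fraction below 10^-21 of sequence space. Because the total number of sequences grows as 20^L, this fraction shrinks rapidly with length unless the capacity grows equally fast, which is consistent with the anti-correlation with protein length described in the abstract.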

  6. Chemical reactivation of fluorescein isothiocyanate immunofluorescence-labeled resin-embedded samples

    NASA Astrophysics Data System (ADS)

    Li, Longhui; Rao, Gong; Lv, Xiaohua; Chen, Ruixi; Cheng, Xiaofeng; Wang, Xiaojun; Zeng, Shaoqun; Liu, Xiuli

    2018-02-01

    Resin embedding is widely used and facilitates microscopic imaging of biological tissues. In contrast, quenching of fluorescence during embedding process hinders the application of resin embedding for imaging of fluorescence-labeled samples. For samples expressing fluorescent proteins, it has been demonstrated that the weakened fluorescence could be recovered by reactivating the fluorophore with alkaline buffer. We extended this idea to immunofluorescence-labeling technology. We showed that the fluorescence of pH-sensitive fluorescein isothiocyanate (FITC) was quenched after resin embedding but reactivated after treating by alkaline buffer. We observed 138.5% fluorescence preservation ratio of reactivated state, sixfold compared with the quenched state in embedding resin, which indicated its application for fluorescence imaging of high signal-to-background ratio. Furthermore, we analyzed the chemical reactivation mechanism of FITC fluorophore. This work would show a way for high-resolution imaging of immunofluorescence-labeled samples embedded in resin.

  7. Profiling of potential driver mutations in sarcomas by targeted next generation sequencing.

    PubMed

    Andersson, Carola; Fagman, Henrik; Hansson, Magnus; Enlund, Fredrik

    2016-04-01

    Comprehensive genetic profiling by massively parallel sequencing, commonly known as next generation sequencing (NGS), is becoming the foundation of personalized oncology. For sarcomas very few targeted treatments are currently in routine use. In clinical practice the preoperative diagnostic workup of soft tissue tumours largely relies on core needle biopsies. Although mostly sufficient for histopathological diagnosis, only very limited amounts of formalin fixated paraffin embedded tissue are often available for predictive mutation analysis. Targeted NGS may thus open up new possibilities for comprehensive characterization of scarce biopsies. We therefore set out to search for driver mutations by NGS in a cohort of 55 clinically and morphologically well characterized sarcomas using low input of DNA from formalin fixated paraffin embedded tissues. The aim was to investigate if there are any recurrent or targetable aberrations in cancer driver genes in addition to known chromosome translocations in different types of sarcomas. We employed a panel covering 207 mutation hotspots in 50 cancer-associated genes to analyse DNA from nine gastrointestinal stromal tumours, 14 synovial sarcomas, seven myxoid liposarcomas, 22 Ewing sarcomas and three Ewing-like small round cell tumours at a large sequencing depth to detect also mutations that are subclonal or occur at low allele frequencies. We found nine mutations in eight different potential driver genes, some of which are potentially actionable by currently existing targeted therapies. Even though no recurrent mutations in driver genes were found in the different sarcoma groups, we show that targeted NGS-based sequencing is clearly feasible in a diagnostic setting with very limited amounts of paraffin embedded tissue and may provide novel insights into mesenchymal cell signalling and potentially druggable targets. Interestingly, we also identify five non-synonymous sequence variants in 4 established cancer driver genes in DNA from normal tissue from sarcoma patients that may possibly predispose or contribute to neoplastic development. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Modular Homogeneous Chromophore–Catalyst Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mulfort, Karen L.; Utschig, Lisa M.

    2016-05-17

    Photosynthetic reaction center (RC) proteins convert incident solar energy to chemical energy through a network of molecular cofactors which have been evolutionarily tuned to couple efficient light-harvesting, directional electron transfer, and long-lived charge separation with secondary reaction sequences. These molecular cofactors are embedded within a complex protein environment which precisely positions each cofactor in optimal geometries along efficient electron transfer pathways with localized protein environments facilitating sequential and accumulative charge transfer. By contrast, it is difficult to approach a similar level of structural complexity in synthetic architectures for solar energy conversion. However, by using appropriate self-assembly strategies, we anticipate that molecular modules, which are independently synthesized and optimized for either light-harvesting or redox catalysis, can be organized into spatial arrangements that functionally mimic natural photosynthesis. In this Account, we describe a modular approach to new structural designs for artificial photosynthesis which is largely inspired by photosynthetic RC proteins. We focus on recent work from our lab which uses molecular modules for light-harvesting or proton reduction catalysis in different coordination geometries and different platforms, spanning from discrete supramolecular assemblies to molecule–nanoparticle hybrids to protein-based biohybrids. Molecular modules are particularly amenable to high-resolution characterization of the ground and excited state of each module using a variety of physical techniques; such spectroscopic interrogation helps our understanding of primary artificial photosynthetic mechanisms. In particular, we discuss the use of transient optical spectroscopy, EPR, and X-ray scattering techniques to elucidate dynamic structural behavior and light-induced kinetics and the impact on photocatalytic mechanism. Two different coordination geometries of supramolecular photocatalyst based on the [Ru(bpy)3]2+ (bpy = 2,2'-bipyridine) light-harvesting module with cobaloxime-based catalyst module are compared, with progress in stabilizing photoinduced charge separation identified. These same modules embedded in the small electron transfer protein ferredoxin exhibit much longer charge-separation, enabled by stepwise electron transfer through the native [2Fe-2S] cofactor. We anticipate that the use of interchangeable, molecular modules which can interact in different coordination geometries or within entirely different structural platforms will provide important fundamental insights into the effect of environment on parameters such as electron transfer and charge separation, and ultimately drive more efficient designs for artificial photosynthesis.

  9. Phylogeny and polymorphism in the long control regions E6, E7, and L1 of HPV Type 56 in women from southwest China

    PubMed Central

    Jing, Yaling; Wang, Tao; Chen, Zuyi; Ding, Xianping; Xu, Jianju; Mu, Xuemei; Cao, Man; Chen, Honghan

    2018-01-01

    Globally, human papillomavirus (HPV)-56 accounts for a small proportion of all high-risk HPV types; however, HPV-56 is detected at a higher rate in Asia, particularly in southwest China. The present study analyzed polymorphisms, intratypic variants, and genetic variability in the long control regions (LCR), E6, E7, and L1 of HPV-56 (n=75). The LCRs, E6, E7 and L1 were sequenced using a polymerase chain reaction and the sequences were submitted to GenBank. Maximum-likelihood trees were constructed using Kimura's two-parameter model, followed by secondary structure analysis and protein damaging prediction. Additionally, in order to assess the effect of variations in the LCR on putative binding sites for cellular proteins, MATCH server was used. Finally, the selection pressures of the E6-E7 and L1 genes were estimated. A total of 18 point substitutions, a 42-bp deletion and a 19-bp deletion of LCR were identified. Some of those mutations are embedded in the putative binding sites for transcription factors. A total of 18 single nucleotide changes occurred in the E6-E7 sequence; 11/18 were non-synonymous substitutions and 7/18 were synonymous mutations. A total of 24 single nucleotide changes were identified in the L1 sequence, 6/24 being non-synonymous mutations and 18/24 synonymous mutations. Selective pressure analysis predicted that the majority of mutations of HPV-56 E6, E7 and L1 were of positive selection. The phylogenetic tree demonstrated that the isolates were distributed in two lineages. Data on the prevalence and genetic variation of HPV-56 types in southwest China may aid future studies on viral molecular mechanisms and contribute to future investigations of diagnostic probes and therapeutic vaccines. PMID:29568922

  10. In vitro and in vivo assessment of oral autologous artificial connective tissue characteristics that influence its performance as a graft.

    PubMed

    Fontanilla, Marta Raquel; Espinosa, Lady Giovanna

    2012-09-01

    Several studies have evaluated proteins secreted by fibroblasts comprising skin substitutes, finding that they are secreted in combinations and concentrations that promote wound healing. However, assessment of proteins secreted by oral fibroblasts forming a part of oral substitutes is scarce. In our previous work, collagen type-I scaffolds (CSs) and autologous artificial connective tissue (AACT) were produced and implanted in rabbit oral lesions, evidencing that AACT outperforms CS. The present work determined the secreted factor profile of AACT at the time of grafting as well as that of the AACT embedded in the clot. It also evaluated the proliferation and viability of AACT fibroblasts to establish the dwell time of these cells in the grafted area. Finally, it assessed whether CS, AACT, and clot-embedded AACT increase fibroblast recruitment induced by a fibrin clot, because the cell migratory response has been associated with the wound-healing outcome. We found that some of the factors secreted by AACT fibroblasts are significantly different from those secreted by clot-embedded AACT fibroblasts. Also, the profile of proteins secreted by AACT fibroblasts and clot-embedded AACT fibroblasts is different from already reported protein secretion profiles of other engineered tissues used in treating oral mucosa wounds. It was also found that AACT fibroblasts are viable when grafted and remain in the treated area for almost 2 weeks, and that the migratory response of fibroblasts to tissue-substitute stimulus is significantly less than the migratory response induced by the clot alone. Overall, data suggest that AACT secretion of proteins is modulated by three-dimensionality and environment factors. This bioactivity and the fact that AACT does not increase fibroblast migration can be held accountable for AACT's good performance as a graft.

  11. In situ hybridization at the electron microscope level: localization of transcripts on ultrathin sections of Lowicryl K4M-embedded tissue using biotinylated probes and protein A-gold complexes

    PubMed Central

    1986-01-01

    A technique has been developed for localizing hybrids formed in situ on semi-thin and ultrathin sections of Lowicryl K4M-embedded tissue. Biotinylated dUTP (Bio-11-dUTP and/or Bio-16-dUTP) was incorporated into mitochondrial rDNA and small nuclear U1 probes by nick-translation. The probes were hybridized to sections of Drosophila ovaries and subsequently detected with an anti-biotin antibody and protein A-gold complex. On semi-thin sections, probe detection was achieved by amplification steps with anti-protein A antibody and protein A-gold with subsequent silver enhancement. At the electron microscope level, specific labeling was obtained over structures known to be the site of expression of the appropriate genes (i.e., either over mitochondria or over nuclei). The labeling pattern at the light microscope level (semi-thin sections) was consistent with that obtained at the electron microscope level. The described nonradioactive procedures for hybrid detection on Lowicryl K4M-embedded tissue sections offer several advantages: rapid signal detection; superior morphological preservation and spatial resolution; and signal-to-noise ratios equivalent to radiolabeling. PMID:3084498

  12. Foldit Standalone: a video game-derived protein structure manipulation interface using Rosetta

    PubMed Central

    Kleffner, Robert; Flatten, Jeff; Leaver-Fay, Andrew; Baker, David; Siegel, Justin B.; Khatib, Firas; Cooper, Seth

    2017-01-01

    Summary: Foldit Standalone is an interactive graphical interface to the Rosetta molecular modeling package. In contrast to most command-line or batch interactions with Rosetta, Foldit Standalone is designed to allow easy, real-time, direct manipulation of protein structures, while also giving access to the extensive power of Rosetta computations. Derived from the user interface of the scientific discovery game Foldit (itself based on Rosetta), Foldit Standalone has added more advanced features and removed the competitive game elements. Foldit Standalone was built from the ground up with a custom rendering and event engine, configurable visualizations and interactions driven by Rosetta. Foldit Standalone contains, among other features: electron density and contact map visualizations, multiple sequence alignment tools for template-based modeling, rigid body transformation controls, RosettaScripts support and an embedded Lua interpreter. Availability and Implementation: Foldit Standalone is available for download at https://fold.it/standalone, under the Rosetta license, which is free for academic and non-profit users. It is implemented in cross-platform C++ and binary executables are available for Windows, macOS and Linux. Contact: scooper@ccs.neu.edu PMID:28481970

  13. Homing endonucleases from mobile group I introns: discovery to genome engineering

    PubMed Central

    2014-01-01

    Homing endonucleases are highly specific DNA cleaving enzymes that are encoded within genomes of all forms of microbial life including phage and eukaryotic organelles. These proteins drive the mobility and persistence of their own reading frames. The genes that encode homing endonucleases are often embedded within self-splicing elements such as group I introns, group II introns and inteins. This combination of molecular functions is mutually advantageous: the endonuclease activity allows surrounding introns and inteins to act as invasive DNA elements, while the splicing activity allows the endonuclease gene to invade a coding sequence without disrupting its product. Crystallographic analyses of representatives from all known homing endonuclease families have illustrated both their mechanisms of action and their evolutionary relationships to a wide range of host proteins. Several homing endonucleases have been completely redesigned and used for a variety of genome engineering applications. Recent efforts to augment homing endonucleases with auxiliary DNA recognition elements and/or nucleic acid processing factors have further accelerated their use for applications that demand exceptionally high specificity and activity. PMID:24589358

  14. Phylogenetic analysis of Austrian canine distemper virus strains from clinical samples from dogs and wild carnivores.

    PubMed

    Benetka, V; Leschnik, M; Affenzeller, N; Möstl, K

    2011-04-09

    Austrian field cases of canine distemper (14 dogs, one badger [Meles meles] and one stone marten [Martes foina]) from 2002 to 2007 were investigated and the case histories were summarised briefly. Phylogenetic analysis of fusion (F) and haemagglutinin (H) gene sequences revealed different canine distemper virus (CDV) lineages circulating in Austria. The majority of CDV strains detected from 2002 to 2004 were well embedded in the European lineage. One Austrian canine sample detected in 2003, with a high similarity to Hungarian sequences from 2005 to 2006, could be assigned to the Arctic group (phocine distemper virus type 2-like). The two canine sequences from 2007 formed a clearly distinct group flanked by sequences detected previously in China and the USA on an intermediate position between the European wildlife and the Asia-1 cluster. The Austrian wildlife strains (2006 and 2007) could be assigned to the European wildlife group and were most closely related to, yet clearly different from, the 2007 canine samples. To elucidate the epidemiological role of Austrian wildlife in the transmission of the disease to dogs and vice versa, H protein residues related to receptor and host specificity (residues 530 and 549) were analysed. All samples showed the amino acids expected for their host of origin, with the exception of a canine sequence from 2007, which had an intermediate position between wildlife and canine viral strains. In the period investigated, canine strains circulating in Austria could be assigned to four different lineages reflecting both a high diversity and probably different origins of virus introduction to Austria in different years.

  15. Investigation of Aspergillus fumigatus biofilm formation by various “omics” approaches

    PubMed Central

    Muszkieta, Laetitia; Beauvais, Anne; Pähtz, Vera; Gibbons, John G.; Anton Leberre, Véronique; Beau, Rémi; Shibuya, Kazutoshi; Rokas, Antonis; Francois, Jean M.; Kniemeyer, Olaf; Brakhage, Axel A.; Latgé, Jean P.

    2013-01-01

    In the lung, Aspergillus fumigatus usually forms a dense colony of filaments embedded in a polymeric extracellular matrix called biofilm (BF). This extracellular matrix embeds and glues hyphae together and protects the fungus from an outside hostile environment. This extracellular matrix is absent in fungal colonies grown under classical liquid shake conditions (PL), which were historically used to understand A. fumigatus pathobiology. Recent works have shown that the fungus in this aerial grown BF-like state exhibits reduced susceptibility to antifungal drugs and undergoes major metabolic changes that are thought to be associated with virulence. These differences in pathological and physiological characteristics between BF and liquid shake conditions suggest that the PL condition is a poor in vitro disease model. In the laboratory, A. fumigatus mycelium embedded in the extracellular matrix can be produced in vitro in aerial condition using an agar-based medium. To provide a global and accurate understanding of A. fumigatus in vitro BF growth, we utilized microarray, RNA-sequencing, and proteomic analysis to compare the global gene and protein expression profiles of A. fumigatus grown under BF and PL conditions. In this review, we will present the different signatures obtained with these three “omics” methods. We will discuss the advantages and limitations of each method and their complementarity. PMID:23407341

  16. Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules.

    PubMed

    Han, M; Gao, X; Su, J Z; Nie, S

    2001-07-01

    Multicolor optical coding for biological assays has been achieved by embedding different-sized quantum dots (zinc sulfide-capped cadmium selenide nanocrystals) into polymeric microbeads at precisely controlled ratios. Their novel optical properties (e.g., size-tunable emission and simultaneous excitation) render these highly luminescent quantum dots (QDs) ideal fluorophores for wavelength-and-intensity multiplexing. The use of 10 intensity levels and 6 colors could theoretically code one million nucleic acid or protein sequences. Imaging and spectroscopic measurements indicate that the QD-tagged beads are highly uniform and reproducible, yielding bead identification accuracies as high as 99.99% under favorable conditions. DNA hybridization studies demonstrate that the coding and target signals can be simultaneously read at the single-bead level. This spectral coding technology is expected to open new opportunities in gene expression studies, high-throughput screening, and medical diagnostics.
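    The "one million" figure in the abstract above follows from simple counting. A minimal sketch, assuming each color channel independently takes one of the available intensity levels and that one level means "absent", so the all-blank combination is not a usable code (the function name and the `exclude_blank` flag are illustrative and do not come from the source):

```python
def code_capacity(levels: int, colors: int, exclude_blank: bool = True) -> int:
    """Number of distinct wavelength-and-intensity codes.

    Assumption (not stated in the abstract): every color independently
    takes one of `levels` intensities, and the combination in which all
    colors are at the lowest (blank) level is not a usable code.
    """
    total = levels ** colors
    return total - 1 if exclude_blank else total


if __name__ == "__main__":
    # 10 intensity levels and 6 colors, as quoted in the abstract
    print(code_capacity(10, 6))                       # 999999
    print(code_capacity(10, 6, exclude_blank=False))  # 1000000
```

    Under these assumptions the capacity grows exponentially with the number of colors, which is why adding colors expands the code space far faster than adding intensity levels.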

  17. Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature.

    PubMed

    Arguello Casteleiro, Mercedes; Demetriou, George; Read, Warren; Fernandez Prieto, Maria Jesus; Maroto, Nava; Maseda Fernandez, Diego; Nenadic, Goran; Klein, Julie; Keane, John; Stevens, Robert

    2018-04-12

    Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created. We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We set up two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels. In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%. This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.

  18. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Modular protein domains: an engineering approach toward functional biomaterials.

    PubMed

    Lin, Charng-Yu; Liu, Julie C

    2016-08-01

    Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Implementation of an RBF neural network on embedded systems: real-time face tracking and identity verification.

    PubMed

    Yang, Fan; Paindavoine, M

    2003-01-01

    This paper describes a real-time vision system that allows us to localize faces in video sequences and verify their identity. These processes are image processing techniques based on the radial basis function (RBF) neural network approach. The robustness of this system has been evaluated quantitatively on eight video sequences. We have adapted our model for an application of face recognition using the Olivetti Research Laboratory (ORL), Cambridge, UK, database so as to compare the performance against other systems. We also describe three hardware implementations of our model on embedded systems based on the field programmable gate array (FPGA), zero instruction set computer (ZISC) chips, and digital signal processor (DSP) TMS320C62, respectively. We analyze the algorithm complexity and present results of hardware implementations in terms of the resources used and processing speed. The success rates of face tracking and identity verification are 92% (FPGA), 85% (ZISC), and 98.2% (DSP), respectively. For the three embedded systems, the processing speeds for images of size 288 × 352 are 14 images/s, 25 images/s, and 4.8 images/s, respectively.

  1. Nanofluidic Device with Embedded Nanopore

    NASA Astrophysics Data System (ADS)

    Zhang, Yuning; Reisner, Walter

    2014-03-01

    Nanofluidic-based devices are robust methods for biomolecular sensing and single DNA manipulation. Nanopore-based DNA sensing has attractive features that make it a leading candidate as a single-molecule DNA sequencing technology. Nanochannel-based extension of DNA, combined with enzymatic or denaturation-based barcoding schemes, is already a powerful approach for genome analysis. We believe that there is revolutionary potential in devices that combine nanochannels with nanopore detectors. In particular, due to the fast translocation of a DNA molecule through a standard nanopore configuration, there is an unfavorable trade-off between signal and sequence resolution. With a combined nanochannel-nanopore device, based on embedding a nanopore inside a nanochannel, we can in principle gain independent control over both DNA translocation speed and sensing signal, solving the key drawback of the standard nanopore configuration. We demonstrate that we can detect, using fluorescent microscopy, successful translocation of DNA from the nanochannel out through the nanopore, a possible method to 'select' a given barcode for further analysis. We also show that in equilibrium DNA will not escape through an embedded sub-persistence length nanopore until a certain voltage bias is added.

  2. Active Nuclear Import of Membrane Proteins Revisited

    PubMed Central

    Laba, Justyna K.; Steen, Anton; Popken, Petra; Chernova, Alina; Poolman, Bert; Veenhoff, Liesbeth M.

    2015-01-01

    It is poorly understood how membrane proteins destined for the inner nuclear membrane pass the crowded environment of the Nuclear Pore Complex (NPC). For the Saccharomyces cerevisiae proteins Src1/Heh1 and Heh2, a transport mechanism was proposed where the transmembrane domains diffuse through the membrane while the extralumenal domains encoding a nuclear localization signal (NLS) and intrinsically disordered linker (L) are accompanied by transport factors and travel through the NPC. Here, we validate the proposed mechanism and explore and discuss alternative interpretations of the data. First, to disprove an interpretation where the membrane proteins become membrane embedded only after nuclear import, we present biochemical and localization data to support that the previously used, as well as newly designed reporter proteins are membrane-embedded irrespective of the presence of the sorting signals, the specific transmembrane domain (multipass or tail anchored), independent of GET, and also under conditions that the proteins are trapped in the NPC. Second, using the recently established size limit for passive diffusion of membrane proteins in yeast, and using an improved assay, we confirm active import of polytopic membrane protein with extralumenal soluble domains larger than those that can pass by diffusion on similar timescales. This reinforces that NLS-L dependent active transport is distinct from passive diffusion. Thirdly, we revisit the proposed route through the center of the NPC and conclude that the previously used trapping assay is, unfortunately, poorly suited to address the route through the NPC, and the route thus remains unresolved. Apart from the uncertainty about the route through the NPC, the data confirm active, transport factor dependent, nuclear transport of membrane-embedded mono- and polytopic membrane proteins in baker’s yeast. PMID:26473931

  3. Bridging the gap between structural bioinformatics and receptor research: the membrane-embedded, ligand-gated, P2X glycoprotein receptor.

    PubMed

    Mager, Peter P; Weber, Anje; Illes, Peter

    2004-01-01

    No details on P2X receptor architecture had been known at the atomic resolution level. Using comparative homology-based molecular modelling and threading, it was attempted to predict the three-dimensional structure of P2X receptors. This prediction could not be carried out, however, because important properties of the P2X family differ considerably from those of the potential template proteins. This paper reviews an alternative approach consisting of three research fields: bioinformatics, structural modelling, and a variety of the results of biological experiments. The starting point is the amino acid sequence. Using the sequential data, the first step is a secondary structure prediction. The resulting secondary structure is converted into a three-dimensional geometry. Then, the secondary and tertiary structures are optimized by using the quantum chemistry RHF/3-21G minimal basis set and the all-atom molecular mechanics AMBER96 force field. The fold of the membrane-embedded protein is simulated by a suitable dielectric. The structure is refined using a conjugate gradient minimizer (Fletcher-Reeves modification of the Polak-Ribiere method). The results of the geometry optimization were checked by a Ramachandran plot, rotamer analysis, all-atom contact dots, and the C(beta) deviation. As additional tools for the model building, multiple alignment analysis and comparative sequence-function analysis were used. The approach is exemplified on the membrane-embedded, ligand-gated P2X3 receptor subunit, a monovalent-bivalent cation channel-forming glycoprotein that is activated by extracellular adenosine 5'-triphosphate. From these results, a topology of the pore-forming motif of the P2X3 receptor subunit was proposed. It is believed that a fully functional P2X channel requires a precise coupling between (i) two distinct peptide modules, an extracellularly occurring ATP-binding module and a pore module that includes a long transmembrane and short intracellular part, (ii) an interaction surface with membranes, and (iii) hydrogen bonding forces of the residues and hydrated cations. Furthermore, this paper demonstrates the role of quantitative structure-activity relationships (QSARs) in P2X research (calcium ion permeability of the wild-type and after site-directed mutagenesis of the rat P2X2 receptor protein, KN-62 analogs as competitive antagonists of the human P2X7 receptor). EXPERIMENTAL PROOFS: The predictions are experimentally testable and may provide an additional interpretation of experimental observations published in the literature. In particular, there is good agreement of the geometry-optimized P2X3 structure with experimentally proposed P2X receptor models obtained by neurophysiological, biochemical, pharmacological, and mutation experiments. Although the rat P2X3 receptor subunit is more complex (397 amino acids) than the KcsA protein (160 amino acids), the overall folds of the peptide backbone atoms are similar. To avoid semantic confusion, it should be noted that "prediction" is defined in a probabilistic sense. Matches to generic rules do not mean "this is true" but rather "this might be true". Only biological and chemical knowledge can determine whether or not these predictions are meaningful. Thus, the results from the computational tools are probabilistic predictions and subject to further experimental verification. The geometry-optimized P2X3 receptor subunit is freely available for academic researchers on e-mail request (PDB format).

  4. Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.

    PubMed

    Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A

    2007-07-01

    Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.

  5. A Novel Marker for Purkinje Cells, Ribosomal Protein MPS1/S27: Expression of MPS1 in Human Cerebellum.

    PubMed

    Fernandez-Pol, J Alberto

    2016-01-01

    The ribosomal protein metallopanstimulin-1 (MPS1/S27) serves critical survival purposes in cell division, in normal and cancerous cells; for this reason, selective pressures of evolution have conserved the DNA sequences encoding MPS1/S27 in Archaea and eukaryotic cells. The expression of MPS1/S27 protein in human adult cerebellum has not been established. The presence of MPS1/S27 was screened in paraffin-embedded human adult brain specimens processed for tissue immunohistochemistry. Affinity-purified specific antibodies were directed against the N-terminus of MPS1. The antibodies to MPS1 detected Purkinje cells (PC) and their dendrites. In PC, MPS1 antigen-positive staining was found in: the nucleolus, which was strongly stained; ribosomes attached to the external nuclear membrane; cytoplasm of PC, with strong staining in a punctate fashion; the soma-attached large dendrite trunks of PC, which were MPS1 antigen-positive; and the granular cell layer, where cellular staining in a few cells that appeared to resemble smaller PC was observed. Since MPS1 is involved in cell division, DNA repair, and ribosomal biogenesis, it may be a useful antigen for studying processes such as protein synthesis, oncogenesis, regeneration, aging, and perhaps diseases of the human cerebellum. Copyright© 2016, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  6. Structural and Biochemical Characterization of Chlamydia trachomatis Hypothetical Protein CT263 Supports That Menaquinone Synthesis Occurs through the Futalosine Pathway

    PubMed Central

    Barta, Michael L.; Thomas, Keisha; Yuan, Hongling; Lovell, Scott; Battaile, Kevin P.; Schramm, Vern L.; Hefty, P. Scott

    2014-01-01

    The obligate intracellular human pathogen Chlamydia trachomatis is the etiological agent of blinding trachoma and sexually transmitted disease. Genomic sequencing of Chlamydia indicated this medically important bacterium was not exclusively dependent on the host cell for energy. In order for the electron transport chain to function, electron shuttling between membrane-embedded complexes requires lipid-soluble quinones (e.g. menaquinone or ubiquinone). The sources or biosynthetic pathways required to obtain these electron carriers within C. trachomatis are poorly understood. The 1.58 Å crystal structure of C. trachomatis hypothetical protein CT263 presented here supports a role in quinone biosynthesis. Although CT263 lacks sequence-based functional annotation, the crystal structure of CT263 displays striking structural similarity to 5′-methylthioadenosine nucleosidase (MTAN) enzymes. Although CT263 lacks the active site-associated dimer interface found in prototypical MTANs, co-crystal structures with product (adenine) or substrate (5′-methylthioadenosine) indicate that the canonical active site residues are conserved. Enzymatic characterization of CT263 indicates that the futalosine pathway intermediate 6-amino-6-deoxyfutalosine (kcat/Km = 1.8 × 10³ M⁻¹ s⁻¹), but not the prototypical MTAN substrates (e.g. S-adenosylhomocysteine and 5′-methylthioadenosine), is hydrolyzed. Bioinformatic analyses of the chlamydial proteome also support the futalosine pathway toward the synthesis of menaquinone in Chlamydiaceae. This report provides the first experimental support for quinone synthesis in Chlamydia. Menaquinone synthesis provides another target for agents to combat C. trachomatis infection. PMID:25253688

  7. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools, there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
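    The stop-codon feature described in the abstract above can be illustrated with a short sketch. This is not Spurio's implementation (which relies on tblastn matches and a Gaussian process model); it only shows, under stated assumptions, how in-frame stop codons might be counted in a homologous DNA region. The function name, the `frame` parameter, and the choice to ignore a terminal stop codon are all hypothetical:

```python
# Stop codons shared by the standard and bacterial genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_internal_stops(dna: str, frame: int = 0) -> int:
    """Count in-frame stop codons, ignoring one at the very end.

    Assumption: `dna` is the forward-strand region aligned to the query
    protein and `frame` (0, 1 or 2) is the reading-frame offset.
    """
    seq = dna.upper()
    # Split into complete codons starting at the chosen frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # A final stop codon is the normal end of a gene, not a disruption.
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return sum(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(count_internal_stops("ATGAAAGGGTGA"))     # intact reading frame
    print(count_internal_stops("ATGAAATAAGGGTGA"))  # one premature stop
```

    A homolog whose translation is interrupted by such stops is unlikely to encode a functional protein, which is presumably why the abstract singles this feature out as the most informative one.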

  8. Comparison of Different Buffers for Protein Extraction from Formalin-Fixed and Paraffin-Embedded Tissue Specimens.

    PubMed

    Shen, Kaini; Sun, Jian; Cao, Xinxin; Zhou, Daobin; Li, Jian

    2015-01-01

    We determined the best extraction buffer for proteomic investigation using formalin-fixed and paraffin-embedded (FFPE) specimens. A Zwittergent 3-16 based buffer, sodium dodecyl sulfate (SDS)-containing buffer with/without polyethylene glycol 20000 (PEG20000), urea-containing buffer, and FFPE-FASP protein preparation kit were compared for protein extraction from different types of rat FFPE tissues, including the heart, brain, liver, lung, and kidney. All of the samples were divided into two groups of laser microdissected (LMD) and non-LMD specimens. For both kinds of specimens, Zwittergent was the most efficient buffer for identifying peptides and proteins, was broadly applicable to different tissues without impairing the enzymatic digestion, and was compatible with mass spectrometry analysis. As a high molecular weight carrier substance, PEG20000 improved the identification of peptides and proteins; however, such an advantage is limited to tissues containing submicrograms to micrograms of protein. Considering its low lytic strength, urea-containing buffer would not be the first alternative for protein recovery. In conclusion, Zwittergent 3-16 is an effective buffer for extracting proteins from FFPE specimens for downstream proteomics analysis.

  9. Extended Full Computation-Tree Logic with Sequence Modal Operator: Representing Hierarchical Tree Structures

    NASA Astrophysics Data System (ADS)

    Kamide, Norihiro; Kaneiwa, Ken

    An extended full computation-tree logic, CTLS*, is introduced as a Kripke semantics with a sequence modal operator. This logic can appropriately represent hierarchical tree structures where sequence modal operators in CTLS* are applied to tree structures. An embedding theorem of CTLS* into CTL* is proved. The validity, satisfiability and model-checking problems of CTLS* are shown to be decidable. An illustrative example of biological taxonomy is presented using CTLS* formulas.

  10. A proteomic analysis of leaf sheaths from rice.

    PubMed

    Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

    2002-10-01

    The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.

  11. Sequence space and the ongoing expansion of the protein universe.

    PubMed

    Povolotskaya, Inna S; Kondrashov, Fyodor A

    2010-06-17

    The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

  12. Genetics Home Reference: glycoprotein VI deficiency

    MedlinePlus

    ... protein called glycoprotein VI (GPVI). This protein is embedded in the outer membrane of blood cell fragments ... erythematosus (SLE). Autoimmune disorders occur when the immune system malfunctions and attacks the body's own cells and ...

  13. An improved strategy and a useful housekeeping gene for RNA analysis from formalin-fixed, paraffin-embedded tissues by PCR.

    PubMed

    Finke, J; Fritzen, R; Ternes, P; Lange, W; Dölken, G

    1993-03-01

    Specific amplification of nucleic acid sequences by PCR has been extensively used for the detection of gene rearrangements and gene expression. Although successful amplification of DNA sequences has been carried out with DNA prepared from formalin-fixed, paraffin-embedded (FFPE) tissues, there are only a few reports regarding RNA analysis in this kind of material. We describe a procedure for RNA extraction from different types of FFPE tissues, involving digestion with proteinase K followed by guanidinium-thiocyanate acid phenol extraction and DNase I digestion. These RNA preparations are suitable for PCR analysis of mRNA and even of intronless genes. Furthermore, the universally expressed porphobilinogen deaminase mRNA proved to be useful as a positive control because of the lack of pseudogenes.

  14. The nature of the embedded population in the Rho Ophiuchi dark cloud - Mid-infrared observations

    NASA Technical Reports Server (NTRS)

    Lada, C. J.; Wilking, B. A.

    1984-01-01

    In combination with previous IR and optical data, the present 10-20 micron observations of previously identified members of the embedded population of the Rho Ophiuchi dark cloud allow determinations to be made of the broadband energy distributions for 32 of the 44 sources. The majority of the sources are found to emit the bulk of their luminosity in the 1-20 micron range, and to be surrounded by dust shells. Because they are, in light of these characteristics, probably premain-sequence in nature, relatively accurate bolometric luminosities for these objects can be obtained through integration of their energy distributions. It is found that 44 percent of the sources are less luminous than the sun, and are among the lowest luminosity premain-sequence/protostellar objects observed to date.

  15. Sustained release of VEGF from PLGA nanoparticles embedded thermo-sensitive hydrogel in full-thickness porcine bladder acellular matrix

    NASA Astrophysics Data System (ADS)

    Geng, Hongquan; Song, Hua; Qi, Jun; Cui, Daxiang

    2011-12-01

    We fabricated a novel vascular endothelial growth factor (VEGF)-loaded poly(lactic-co-glycolic acid) (PLGA)-nanoparticles (NPs)-embedded thermo-sensitive hydrogel in porcine bladder acellular matrix allograft (BAMA) system, which is designed for achieving a sustained release of VEGF protein, and embedding the protein carrier into the BAMA. We identified and optimized various formulations and process parameters to get the preferred particle size, entrapment, and polydispersibility of the VEGF-NPs, and incorporated the VEGF-NPs into the poly(ethylene oxide)-poly(propylene oxide)-poly(ethylene oxide) (Pluronic®) F127 to achieve the preferred VEGF-NPs thermo-sensitive gel system. Then the thermal behavior of the system was proven by in vitro and in vivo study, and the kinetic-sustained release profile of the system embedded in porcine bladder acellular matrix was investigated. Results indicated that the bioactivity of the encapsulated VEGF released from the NPs was preserved, and the VEGF-NPs thermo-sensitive gel system can achieve sol-gel transition successfully at appropriate temperature. Furthermore, the system can create a satisfactory tissue-compatible environment and an effective VEGF-sustained release approach. In conclusion, a novel VEGF-loaded PLGA NPs-embedded thermo-sensitive hydrogel in porcine BAMA system is successfully prepared, to provide a promising way for deficient bladder reconstruction therapy.

  16. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  17. STED super-resolution microscopy of clinical paraffin-embedded human rectal cancer tissue.

    PubMed

    Ilgen, Peter; Stoldt, Stefan; Conradi, Lena-Christin; Wurm, Christian Andreas; Rüschoff, Josef; Ghadimi, B Michael; Liersch, Torsten; Jakobs, Stefan

    2014-01-01

    Formalin fixed and paraffin-embedded human tissue resected during cancer surgery is indispensable for diagnostic and therapeutic purposes and represents a vast and largely unexploited resource for research. Optical microscopy of such specimen is curtailed by the diffraction-limited resolution of conventional optical microscopy. To overcome this limitation, we used STED super-resolution microscopy enabling optical resolution well below the diffraction barrier. We visualized nanoscale protein distributions in sections of well-annotated paraffin-embedded human rectal cancer tissue stored in a clinical repository. Using antisera against several mitochondrial proteins, STED microscopy revealed distinct sub-mitochondrial protein distributions, suggesting a high level of structural preservation. Analysis of human tissues stored for up to 17 years demonstrated that these samples were still amenable for super-resolution microscopy. STED microscopy of sections of HER2 positive rectal adenocarcinoma revealed details in the surface and intracellular HER2 distribution that were blurred in the corresponding conventional images, demonstrating the potential of super-resolution microscopy to explore the thus far largely untapped nanoscale regime in tissues stored in biorepositories.

  18. STED Super-Resolution Microscopy of Clinical Paraffin-Embedded Human Rectal Cancer Tissue

    PubMed Central

    Wurm, Christian Andreas; Rüschoff, Josef; Ghadimi, B. Michael; Liersch, Torsten; Jakobs, Stefan

    2014-01-01

    Formalin fixed and paraffin-embedded human tissue resected during cancer surgery is indispensable for diagnostic and therapeutic purposes and represents a vast and largely unexploited resource for research. Optical microscopy of such specimen is curtailed by the diffraction-limited resolution of conventional optical microscopy. To overcome this limitation, we used STED super-resolution microscopy enabling optical resolution well below the diffraction barrier. We visualized nanoscale protein distributions in sections of well-annotated paraffin-embedded human rectal cancer tissue stored in a clinical repository. Using antisera against several mitochondrial proteins, STED microscopy revealed distinct sub-mitochondrial protein distributions, suggesting a high level of structural preservation. Analysis of human tissues stored for up to 17 years demonstrated that these samples were still amenable for super-resolution microscopy. STED microscopy of sections of HER2 positive rectal adenocarcinoma revealed details in the surface and intracellular HER2 distribution that were blurred in the corresponding conventional images, demonstrating the potential of super-resolution microscopy to explore the thus far largely untapped nanoscale regime in tissues stored in biorepositories. PMID:25025184

  19. Nanochannel Device with Embedded Nanopore: a New Approach for Single-Molecule DNA Analysis and Manipulation

    NASA Astrophysics Data System (ADS)

    Zhang, Yuning; Reisner, Walter

    2012-02-01

    Nanopore and nanochannel based devices are robust methods for biomolecular sensing and single DNA manipulation. Nanopore-based DNA sensing has attractive features that make it a leading candidate as a single-molecule DNA sequencing technology. Nanochannel based extension of DNA, combined with enzymatic or denaturation-based barcoding schemes, is already a powerful approach for genome analysis. We believe that there is revolutionary potential in devices that combine nanochannels with nanopore detectors. In particular, due to the fast translocation of a DNA molecule through a standard nanopore configuration, there is an unfavorable trade-off between signal and sequence resolution. With a combined nanochannel-nanopore device, based on embedding a nanopore inside a nanochannel, we can in principle gain independent control over both DNA translocation speed and sensing signal, solving the key draw-back of the standard nanopore configuration. We will discuss our recent progress on device fabrication and characterization. In particular, we demonstrate that we can detect - using fluorescence microscopy - successful translocation of DNA from the nanochannel out through the nanopore, a possible method to 'select' a given barcode for further analysis. In particular, we show that in equilibrium DNA will not escape through an embedded sub-persistence length nanopore, suggesting that the embedded pore could be used as a nanoscale window through which to interrogate a nanochannel extended DNA molecule.

  20. Nanochannel Device with Embedded Nanopore: a New Approach for Single-Molecule DNA Analysis and Manipulation

    NASA Astrophysics Data System (ADS)

    Zhang, Yuning; Reisner, Walter

    2013-03-01

    Nanopore and nanochannel based devices are robust methods for biomolecular sensing and single DNA manipulation. Nanopore-based DNA sensing has attractive features that make it a leading candidate as a single-molecule DNA sequencing technology. Nanochannel based extension of DNA, combined with enzymatic or denaturation-based barcoding schemes, is already a powerful approach for genome analysis. We believe that there is revolutionary potential in devices that combine nanochannels with embedded pore detectors. In particular, due to the fast translocation of a DNA molecule through a standard nanopore configuration, there is an unfavorable trade-off between signal and sequence resolution. With a combined nanochannel-nanopore device, based on embedding a pore inside a nanochannel, we can in principle gain independent control over both DNA translocation speed and sensing signal, solving the key draw-back of the standard nanopore configuration. We demonstrate that we can optically detect successful translocation of DNA from the nanochannel out through the nanopore, a possible method to 'select' a given barcode for further analysis. In particular, we show that in equilibrium DNA will not escape through an embedded sub-persistence length nanopore, suggesting that the pore could be used as a nanoscale window through which to interrogate a nanochannel extended DNA molecule. Furthermore, electrical measurements through the nanopore are performed, indicating that DNA sensing is feasible using the nanochannel-nanopore device.

  1. p53 inactivation in chewing tobacco-induced oral cancers and leukoplakias from India.

    PubMed

    Saranath, D; Tandle, A T; Teni, T R; Dedhia, P M; Borges, A M; Parikh, D; Sanghavi, V; Mehta, A R

    1999-05-01

    The inactivation of p53 tumour suppressor gene vis-à-vis point mutation, overexpression and degradation due to Human Papilloma virus (HPV) 16/18 infection, was examined in chewing tobacco-associated oral cancers and oral leukoplakias from India. The analysis of mutations was assessed by polymerase chain reaction (PCR) with single strand conformation polymorphism (PCR-SSCP) of exons 5-9 on DNA from 83 oral cancer cases, and the mutations confirmed by direct nucleotide sequencing of the PCR products. p53 protein expression was evaluated by immunohistochemical analysis on paraffin-embedded sections of 62 representative oral cancer biopsies and 22 leukoplakias, using p53-specific monoclonal antibody DO-7. The presence of HPV16/18 was detected in the 83 oral cancer cases by PCR analysis using HPV L1 consensus sequences, followed by Southern hybridization with type-specific oligonucleotide probes. Forty-six per cent (38/83) of oral cancer tumours showed p53 alterations, with 17% (14/83) showing point mutations, 37% (23/62) with overexpression and 25% (21/83) with presence of HPV16 wherein the E6 HPV16 protein degrades p53. HPV18 was not detected in any of the samples. Ninety-two per cent concordance was observed between missense point mutations and overexpression of p53 protein. A significant correlation was not observed between p53 alterations in oral cancer and clinico-pathological profile of the patients. Twenty-seven per cent (6/22) of oral leukoplakias showed p53 overexpression. The overall p53 alterations in oral cancer tissues and oral lesions are comparable to data from the oral cancers reported in the Western countries with smoking and alcohol-associated oral cancers, and suggest a critical role for p53 gene in a significant proportion of oral cancers from India. The overexpression of p53 protein in leukoplakias may serve as a valuable biomarker for identifying individuals at high risk of transformation to malignant phenotype.

  2. A Unique Sequence of Financial Accounting Courses Featuring Team Teaching, Linked Courses, Challenging Assignments, and Instruments for Evaluation and Assessment

    ERIC Educational Resources Information Center

    Lundblad, Heidemarie; Wilson, Barbara A.

    2008-01-01

    The Department of Accounting at California State University Northridge (CSUN) has developed a unique sequence of courses designed to ensure that accounting students are trained not only in technical accounting, but also acquire critical thinking, research and communication skills. The courses have proven effective and have embedded assessment…

  3. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
    To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
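
    The three figures reported in this abstract (accuracy, F-measure and Matthews correlation coefficient) are standard functions of a binary confusion matrix. The sketch below shows those textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. Ratios with a zero
    denominator are reported as 0.0 (a convention chosen for this sketch).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc

# Made-up counts for illustration only.
acc, f1, mcc = binary_metrics(tp=6, fp=2, tn=6, fn=2)
```

    Because the MCC uses all four cells of the confusion matrix, it can sit well below accuracy on the same predictions, which is consistent with the pattern of values reported above (accuracy near 67% alongside an MCC near 0.33).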

  4. Shotgun Protein Sequencing with Meta-contig Assembly

    PubMed Central

    Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

    2012-01-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278

  5. Shotgun protein sequencing with meta-contig assembly.

    PubMed

    Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

    2012-10-01

    Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.

  6. Efficient sidelobe ASK based dual-function radar-communications

    NASA Astrophysics Data System (ADS)

    Hassanien, Aboulnasr; Amin, Moeness G.; Zhang, Yimin D.; Ahmad, Fauzia

    2016-05-01

    Recently, dual-function radar-communications (DFRC) has been proposed as a means to mitigate the spectrum congestion problem. Existing amplitude-shift keying (ASK) methods for information embedding do not take full advantage of the highest permissible sidelobe level. In this paper, a new ASK-based signaling strategy for enhancing the signal-to-noise ratio (SNR) at the communication receiver is proposed. The proposed method employs one reference waveform and simultaneously transmits a number of orthogonal waveforms equal to the number of 1's in the binary sequence being embedded. A 3 dB SNR gain is achieved using the proposed method as compared to existing sidelobe ASK methods. The effectiveness of the proposed information embedding strategy is verified using simulation examples.

  7. Floating gate memory with charge storage dots array formed by Dps protein modified with site-specific binding peptides

    NASA Astrophysics Data System (ADS)

    Kamitake, Hiroki; Uenuma, Mutsunori; Okamoto, Naofumi; Horita, Masahiro; Ishikawa, Yasuaki; Yamashita, Ichro; Uraoka, Yukiharu

    2015-05-01

    We report a nanodot (ND) floating gate memory (NFGM) with a high-density ND array formed by a biological nano process. We utilized two kinds of cage-shaped proteins displaying SiO2 binding peptide (minTBP-1) on their outer surfaces: ferritin and Dps, which accommodate cobalt oxide NDs in their cavities. The diameters of the cobalt NDs were regulated by the cavity sizes of the proteins. Because minTBP-1 is strongly adsorbed on the SiO2 surface, high-density cobalt oxide ND arrays were obtained by a simple spin coating process. The densities of cobalt oxide ND arrays based on ferritin and Dps were 6.8 × 10^11 dots cm^-2 and 1.2 × 10^12 dots cm^-2, respectively. After selective protein elimination and embedding in a metal-oxide-semiconductor (MOS) capacitor, the charge capacities of both ND arrays were evaluated by measuring their C-V characteristics. The MOS capacitor embedded with the Dps ND array showed a wider memory window than the device embedded with the ferritin ND array. Finally, we fabricated an NFGM with a high-density ND array based on Dps, and confirmed its competent writing/erasing characteristics and long retention time.

  8. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  9. Exploring the correlation between the sequence composition of the nucleotide binding G5 loop of the FeoB GTPase domain (NFeoB) and intrinsic rate of GDP release.

    PubMed

    Guilfoyle, Amy P; Deshpande, Chandrika N; Schenk, Gerhard; Maher, Megan J; Jormakka, Mika

    2014-12-12

    GDP release from GTPases is usually extremely slow and is in general assisted by external factors, such as association with guanine exchange factors or membrane-embedded GPCRs (G protein-coupled receptors), which accelerate the release of GDP by several orders of magnitude. Intrinsic factors can also play a significant role; a single amino acid substitution in one of the guanine nucleotide recognition motifs, G5, results in a drastically altered GDP release rate, indicating that the sequence composition of this motif plays an important role in spontaneous GDP release. In the present study, we used the GTPase domain from EcNFeoB (Escherichia coli FeoB) as a model and applied biochemical and structural approaches to evaluate the role of all the individual residues in the G5 loop. Our study confirms that several of the residues in the G5 motif have an important role in the intrinsic affinity and release of GDP. In particular, a T151A mutant (third residue of the G5 loop) leads to a reduced nucleotide affinity and provokes a drastically accelerated dissociation of GDP.

  10. ITEMS Project: An online sequence for teaching mathematics and astronomy

    NASA Astrophysics Data System (ADS)

    Martínez, Bernat; Pérez, Josep

    2010-10-01

    This work describes an elearning sequence for teaching geometry and astronomy in lower secondary school created inside the ITEMS (Improving Teacher Education in Mathematics and Science) project. It is based on results from the astronomy education research about students' difficulties in understanding elementary astronomical observations and models. The sequence consists of a set of computer animations embedded in an elearning environment aimed at supporting students in learning about astronomy ideas that require the use of geometrical concepts and visual-spatial reasoning.

  11. Sequence of events in measles virus replication: role of phosphoprotein-nucleocapsid interactions.

    PubMed

    Brunel, Joanna; Chopy, Damien; Dosnon, Marion; Bloyet, Louis-Marie; Devaux, Patricia; Urzua, Erica; Cattaneo, Roberto; Longhi, Sonia; Gerlier, Denis

    2014-09-01

    The genome of nonsegmented negative-strand RNA viruses is tightly embedded within a nucleocapsid made of a nucleoprotein (N) homopolymer. To ensure processive RNA synthesis, the viral polymerase L in complex with its cofactor phosphoprotein (P) binds the nucleocapsid that constitutes the functional template. Measles virus P and N interact through two binding sites. While binding of the P amino terminus with the core of N (NCORE) prevents illegitimate encapsidation of cellular RNA, the interaction between their C-terminal domains, P(XD) and N(TAIL), is required for viral RNA synthesis. To investigate the binding dynamics between the two latter domains, the P(XD) F497 residue that makes multiple hydrophobic intramolecular interactions was mutated. Using a quantitative mammalian protein complementation assay and recombinant viruses, we found that an increase in P(XD)-to-N(TAIL) binding strength is associated with a slower transcript accumulation rate and that abolishing the interaction renders the polymerase nonfunctional. The use of a newly developed system allowing conditional expression of wild-type or mutated P genes revealed that the loss of the P(XD)-N(TAIL) interaction results in reduced transcription by preformed transcriptases, suggesting reduced engagement on the genomic template. These intracellular data indicate that the viral polymerase entry into and progression along its genomic template relies on a protein-protein interaction that serves as a tightly controlled dynamic anchor. Mononegavirales have a unique machinery to replicate RNA. Processivity of their polymerase is only achieved when the genome template is entirely embedded into a helical homopolymer of nucleoproteins that constitutes the nucleocapsid. The polymerase binds to the nucleocapsid template through the phosphoprotein. How the polymerase complex enters and travels along the nucleocapsid template to ensure uninterrupted synthesis of up to ∼6,700-nucleotide messenger RNAs from six to ten consecutive genes is unknown. Using a quantitative protein complementation assay and a biGene-biSilencing system allowing conditional expression of two P gene copies, the role of the P-to-N interaction in polymerase function was further characterized. We report here a dynamic protein anchoring mechanism that differs from all other known polymerases that rely only on a sustained and direct binding to their nucleic acid template. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  12. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
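
The Jaccard coefficients quoted above compare two classifications of the same proteins. The abstract does not say how the coefficient was computed, so the following is only a minimal sketch under one common assumption: a pair-counting Jaccard index over protein pairs that share a cluster. The function names and the toy labels are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of item pairs that share a cluster label."""
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def jaccard_between_clusterings(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items."""
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        # Neither clustering groups any two items together.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical toy data: four proteins labelled by sequence and by function.
by_sequence = {"P1": "s1", "P2": "s1", "P3": "s1", "P4": "s2"}
by_function = {"P1": "f1", "P2": "f1", "P3": "f2", "P4": "f2"}

score = jaccard_between_clusterings(by_sequence, by_function)
print(f"Jaccard (sequence vs function): {score:.2f}")
```

A value of 1 would mean the two classifications group exactly the same protein pairs together; lower values indicate disagreement.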

  13. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  14. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
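
To illustrate why expectation values depend on database size, here is a minimal sketch based on the standard Karlin-Altschul relation E = K·m·n·exp(−λS), written in terms of bit scores. It is not taken from the abstract: the λ and K values are illustrative placeholders, the lengths are made up, and real BLAST or FASTA runs apply further corrections (such as effective lengths) that are ignored here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative statistical parameters (assumed, not from the abstract).
LAMBDA, K = 0.267, 0.041

bits = bit_score(120, LAMBDA, K)

# Same alignment score, two hypothetical database sizes (in residues).
e_large = expect_value(bits, query_len=300, db_len=5_000_000_000)
e_small = expect_value(bits, query_len=300, db_len=10_000_000)

print(f"bit score: {bits:.1f}")
print(f"E-value, large database: {e_large:.3g}")
print(f"E-value, small database: {e_small:.3g}")
```

Because the E-value scales linearly with database length, the same alignment is more significant in the smaller database, which is the effect the abstract describes for searches against a single related proteome.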

  15. Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

    PubMed Central

    Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

    1993-01-01

    The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich, which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. PMID:7909943
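
The consensus A, A/T, T, A/T, A, T, A/G reported above can be written as a regular expression. The sketch below is only an illustration of scanning a sequence for that consensus: the function name and test sequence are hypothetical, only the forward strand is searched, and an exact consensus match says nothing about binding affinity.

```python
import re

# Consensus A, A/T, T, A/T, A, T, A/G as a regex; the lookahead lets
# overlapping candidate sites be reported as well.
CDXA_CONSENSUS = re.compile(r"(?=A[AT]T[AT]AT[AG])")


def find_consensus_sites(dna):
    """Return 0-based start positions of consensus matches on the given strand."""
    return [m.start() for m in CDXA_CONSENSUS.finditer(dna.upper())]


# Hypothetical test sequence containing one match (ATTTATG).
sites = find_consensus_sites("ggcATTTATGgc")
print(sites)
```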

  16. Detection of hepatitis C virus RNA using ligation-dependent polymerase chain reaction in formalin-fixed, paraffin-embedded liver tissues.

    PubMed Central

    Park, Y. N.; Abe, K.; Li, H.; Hsuih, T.; Thung, S. N.; Zhang, D. Y.

    1996-01-01

    Reverse transcription polymerase chain reaction (RT-PCR) has been used to detect hepatitis C virus (HCV) sequences in liver tissue. However, RT-PCR has a variable detection sensitivity, especially on routinely processed formalin-fixed, paraffin-embedded (FFPE) specimens. RNA-RNA and RNA-protein cross-links formed during formalin fixation are the major limiting factor preventing reverse transcriptase from extending the primers. To overcome this problem, we applied the ligation-dependent PCR (LD-PCR) for the detection of HCV RNA in FFPE liver tissue. This method uses two capture probes for RNA isolation and two hemiprobes for the subsequent PCR. Despite cross-links, the capture probes and the hemiprobes are able to form hybrids with HCV RNAs released from the FFPE tissue. The hybrids are isolated through binding of the capture probes to paramagnetic beads. The hemiprobes are then ligated by a T4 DNA ligase to form a full probe that serves as a template for the Taq DNA polymerase. A total of 22 FFPE liver specimens, 21 with hepatocellular carcinoma (HCC) and 1 with biliary cirrhosis secondary to bile duct atresia, were selected for this study, of which 13 patients were HCV seropositive and 9 seronegative. HCV RNA was detectable by LD-PCR from all 13 HCV-seropositive HCCs and from 5 of 8 HCV-seronegative HCCs but not from the HCV-seronegative liver with biliary atresia. By contrast, RT-PCR detected HCV sequences in only 5 of the HCV-seropositive and in 1 of the HCV-seronegative HCCs. To resolve the discordance between the LD-PCR and RT-PCR results, RT-PCR was performed on frozen liver tissue of the discrepant specimens, which confirmed the LD-PCR positive results. In conclusion, LD-PCR is a more sensitive method than RT-PCR for the detection of HCV sequences in routinely processed liver tissues. A high rate of HCV infection (86%) is found in HCC specimens, indicating a previously underestimated role of HCV in HCC pathogenesis. PMID:8909238

  17. Terminal sequence importance of de novo proteins from binary-patterned library: stable artificial proteins with 11- or 12-amino acid alphabet.

    PubMed

    Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu

    2012-06-01

    Successful approaches of de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing parts of these sequences, we successfully obtained stable smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed differences in ANS binding properties and in stabilities against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers and not as secondary structure units, sufficiently affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.

  18. Dissecting the Mutational Landscape of Cutaneous Melanoma: An Omic Analysis Based on Patients from Greece

    PubMed Central

    Piroti, Georgia; Papadodima, Olga

    2018-01-01

    Melanoma is a lethal type of skin cancer, unless it is diagnosed early. Formalin-fixed, paraffin-embedded (FFPE) tissue is a valuable source for molecular assays after diagnostic examination, but isolated nucleic acids often suffer from degradation. Here, for the first time, we examine primary melanomas from Greek patients, using whole exome sequencing, so as to derive their mutational profile. Application of a bioinformatic framework revealed a total of 10,030 somatic mutations. Regarding the genes containing putative protein-altering mutations, 73 were common in at least three patients. Sixty-five of these 73 top common genes have been previously identified in melanoma cases. Biological processes related to melanoma were affected by varied genes in each patient, suggesting differences in the components of a pathway possibly contributing to pathogenesis. We performed a multi-level analysis highlighting a short list of candidate genes with a probable causative role in melanoma. PMID:29596374

  19. Identification and Characterization of the Pyridomycin Biosynthetic Gene Cluster of Streptomyces pyridomyceticus NRRL B-2517*

    PubMed Central

    Huang, Tingting; Wang, Yemin; Yin, Jun; Du, Yanhua; Tao, Meifeng; Xu, Jing; Chen, Wenqing; Lin, Shuangjun; Deng, Zixin

    2011-01-01

    Pyridomycin is a structurally unique antimycobacterial cyclodepsipeptide containing rare 3-(3-pyridyl)-l-alanine and 2-hydroxy-3-methylpent-2-enoic acid moieties. The biosynthetic gene cluster for pyridomycin has been cloned and identified from Streptomyces pyridomyceticus NRRL B-2517. Sequence analysis of a 42.5-kb DNA region revealed 26 putative open reading frames, including two nonribosomal peptide synthetase (NRPS) genes and a polyketide synthase gene. A special feature is the presence of a polyketide synthase-type ketoreductase domain embedded in an NRPS. Furthermore, we showed that PyrA functioned as an NRPS adenylation domain that activates 3-hydroxypicolinic acid and transfers it to a discrete peptidyl carrier protein, PyrU, which functions as a loading module that initiates pyridomycin biosynthesis in vivo and in vitro. PyrA could also activate other aromatic acids, generating three pyridomycin analogues in vivo. PMID:21454714

  20. On the Origin of Protein Superfamilies and Superfolds

    NASA Astrophysics Data System (ADS)

    Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke

    2015-02-01

    Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and computed the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.

  1. Optimizing high performance computing workflow for protein functional annotation.

    PubMed

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-09-10

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.

  2. Optimizing high performance computing workflow for protein functional annotation

    PubMed Central

    Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

    2014-01-01

    Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296

  3. Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.

    PubMed

    Lella, Muralikrishna; Mahalakshmi, Radhakrishnan

    2017-06-20

    Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.

  4. Exome Sequencing Identifies Truncating Mutations in Human SERPINF1 in Autosomal-Recessive Osteogenesis Imperfecta

    PubMed Central

    Becker, Jutta; Semler, Oliver; Gilissen, Christian; Li, Yun; Bolz, Hanno Jörn; Giunta, Cecilia; Bergmann, Carsten; Rohrbach, Marianne; Koerber, Friederike; Zimmermann, Katharina; de Vries, Petra; Wirth, Brunhilde; Schoenau, Eckhard; Wollnik, Bernd; Veltman, Joris A.; Hoischen, Alexander; Netzer, Christian

    2011-01-01

    Osteogenesis imperfecta (OI) is a heterogeneous genetic disorder characterized by bone fragility and susceptibility to fractures after minimal trauma. After mutations in all known OI genes had been excluded by Sanger sequencing, we applied next-generation sequencing to analyze the exome of a single individual who has a severe form of the disease and whose parents are second cousins. A total of 26,922 variations from the human reference genome sequence were subjected to several filtering steps. In addition, we extracted the genotypes of all dbSNP130-annotated SNPs from the exome sequencing data and used these 299,494 genotypes as markers for the genome-wide identification of homozygous regions. A single homozygous truncating mutation, affecting SERPINF1 on chromosome 17p13.3, that was embedded into a homozygous stretch of 2.99 Mb remained. The mutation was also homozygous in the affected brother of the index patient. Subsequently, we identified homozygosity for two different truncating SERPINF1 mutations in two unrelated patients with OI and parental consanguinity. All four individuals with SERPINF1 mutations have severe OI. Fractures of long bones and severe vertebral compression fractures with resulting deformities were observed as early as the first year of life in these individuals. Collagen analyses with cultured dermal fibroblasts displayed no evidence for impaired collagen folding, posttranslational modification, or secretion. SERPINF1 encodes pigment epithelium-derived factor (PEDF), a secreted glycoprotein of the serpin superfamily. PEDF is a multifunctional protein and one of the strongest inhibitors of angiogenesis currently known in humans. Our data provide genetic evidence for PEDF involvement in human bone homeostasis. PMID:21353196

  5. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. Elman RNN based classification of proteins sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we estimate residue correlation within protein sequences using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence; in this way, each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
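
As a rough illustration of the quantity involved, the sketch below computes a plug-in estimate of the mutual information between adjacent positions of a single sequence. It is an assumption-laden simplification, not the authors' procedure: the abstract describes MI over structural and solvent accessibility property classes and a vector of such values, whereas this example works directly on residue symbols and returns a single number. The function name and inputs are hypothetical.

```python
import math
from collections import Counter


def adjacent_mutual_information(seq):
    """Plug-in estimate (in bits) of MI between positions i and i+1."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal of the first residue
    right = Counter(b for _, b in pairs)   # marginal of the second residue
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical inputs: a strictly alternating pattern and a homopolymer.
mi_alternating = adjacent_mutual_information("ABABABAB")
mi_constant = adjacent_mutual_information("AAAA")
print(f"{mi_alternating:.3f} {mi_constant:.3f}")
```

In the alternating sequence each residue fully determines its neighbour, so the estimate is high; in the homopolymer there is nothing to learn and the estimate is zero. Estimates from short sequences are biased, so a real analysis would need many more observations.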

  7. Combining high-throughput MALDI-TOF mass spectrometry and isoelectric focusing gel electrophoresis for virtual 2D gel-based proteomics.

    PubMed

    Lohnes, Karen; Quebbemann, Neil R; Liu, Kate; Kobzeff, Fred; Loo, Joseph A; Ogorzalek Loo, Rachel R

    2016-07-15

    The virtual two-dimensional gel electrophoresis/mass spectrometry (virtual 2D gel/MS) technology combines the premier, high-resolution capabilities of 2D gel electrophoresis with the sensitivity and high mass accuracy of mass spectrometry (MS). Intact proteins separated by isoelectric focusing (IEF) gel electrophoresis are imaged from immobilized pH gradient (IPG) polyacrylamide gels (the first dimension of classic 2D-PAGE) by matrix-assisted laser desorption/ionization (MALDI) MS. Obtaining accurate intact masses from sub-picomole-level proteins embedded in 2D-PAGE gels or in IPG strips is desirable to elucidate how the protein of one spot identified as protein 'A' on a 2D gel differs from the protein of another spot identified as the same protein, whenever tryptic peptide maps fail to resolve the issue. This task, however, has been extremely challenging. Virtual 2D gel/MS provides access to these intact masses. Modifications to our matrix deposition procedure improve the reliability with which IPG gels can be prepared; the new procedure is described. Development of this MALDI MS imaging (MSI) method for high-throughput MS with integrated 'top-down' MS to elucidate protein isoforms from complex biological samples is described, and it is demonstrated that a 4-cm IPG gel segment can now be imaged in approximately 5 min. Gel-wide chemical and enzymatic methods with further interrogation by MALDI MS/MS provide identifications, sequence-related information, and post-translational/transcriptional modification information. The MSI-based virtual 2D gel/MS platform may potentially link the benefits of 'top-down' and 'bottom-up' proteomics. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Expression of DNA repair proteins MSH2, MLH1 and MGMT in human benign and malignant thyroid lesions: An immunohistochemical study

    PubMed Central

    Giaginis, Constantinos; Michailidi, Christina; Stolakis, Vasileios; Alexandrou, Paraskevi; Tsourouflis, Gerasimos; Klijanienko, Jerzy; Delladetsima, Ioanna; Theocharis, Stamatios

    2011-01-01

    Background: DNA repair is a major defense mechanism, which contributes to the maintenance of genetic sequence, and minimizes cell death, mutation rates, replication errors, DNA damage persistence and genomic instability. Alterations in the expression levels of proteins participating in DNA repair mechanisms have been associated with several aspects of cancer biology. The present study aimed to evaluate the clinical significance of DNA repair proteins MSH2, MLH1 and MGMT in benign and malignant thyroid lesions. Material/Methods: MSH2, MLH1 and MGMT protein expression was assessed immunohistochemically on paraffin-embedded thyroid tissues from 90 patients with benign and malignant lesions. Results: The expression level of MLH1 was significantly upregulated in cases with malignant compared to those with benign thyroid lesions (p=0.038). The expression level of MGMT was significantly downregulated in malignant compared to benign thyroid lesions (p=0.001). Similar associations for both MLH1 and MGMT between cases with papillary carcinoma and hyperplastic nodules were also noted (p=0.014 and p=0.026, respectively). In the subgroup of malignant thyroid lesions, MSH2 downregulation was significantly associated with larger tumor size (p=0.031), while MLH1 upregulation was significantly associated with the presence of lymphatic and vascular invasion (p=0.006 and p=0.002, respectively). Conclusions: Alterations in the mismatch repair proteins MSH2 and MLH1 and the direct repair protein MGMT may result from tumor development and/or progression. Further studies are recommended to draw definite conclusions on the clinical significance of DNA repair proteins in thyroid neoplasia. PMID:21358597

  9. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  10. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
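
The phylogenetic profile idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented method: homolog presence per genome is taken as given (in practice it would come from sequence searches), proteins are linked only when their profiles are identical, and all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical genomes and, for each protein, the genomes with a homolog.
GENOMES = ["genome_1", "genome_2", "genome_3", "genome_4"]
presence = {
    "protA": {"genome_1", "genome_3"},
    "protB": {"genome_1", "genome_3"},
    "protC": {"genome_2", "genome_4"},
}


def build_profiles(presence, genomes):
    """Map each protein to a presence/absence tuple across the genomes."""
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def group_by_profile(profiles):
    """Return groups of two or more proteins that share an identical profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


profiles = build_profiles(presence, GENOMES)
linked = group_by_profile(profiles)
print(profiles["protA"], linked)
```

Proteins with matching presence/absence patterns are candidates for a functional link; a more tolerant comparison (for example, a distance threshold between profiles) would be a natural refinement.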

  11. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  12. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
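
    The translation check mentioned above can be sketched in a few lines. This is not ORFer's implementation (ORFer is a Java program whose internals are not described here); it is an illustrative check that assumes the standard genetic code, and the function names `translate` and `orf_matches_protein` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate a DNA open reading frame, stopping at the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # "X" marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def orf_matches_protein(orf, protein):
    """Return True if the ORF translates exactly to the given protein sequence."""
    return translate(orf) == protein.upper()


print(translate("ATGAAATTTTAA"))               # MKF
print(orf_matches_protein("ATGGCCTGA", "MA"))  # True
```

    Organisms or organelles that use an alternative genetic code would need a different amino acid string in the table.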

  13. Comparison of Different Buffers for Protein Extraction from Formalin-Fixed and Paraffin-Embedded Tissue Specimens

    PubMed Central

    Shen, Kaini; Sun, Jian; Cao, Xinxin; Zhou, Daobin; Li, Jian

    2015-01-01

    We determined the best extraction buffer for proteomic investigation using formalin-fixed, paraffin-embedded (FFPE) specimens. A Zwittergent 3–16 based buffer, sodium dodecyl sulfate (SDS)-containing buffer with/without polyethylene glycol 20000 (PEG20000), urea-containing buffer, and FFPE-FASP protein preparation kit were compared for protein extraction from different types of rat FFPE tissues, including the heart, brain, liver, lung, and kidney. All of the samples were divided into two groups of laser microdissected (LMD) and non-LMD specimens. For both kinds of specimens, Zwittergent was the most efficient buffer for identifying peptides and proteins, was broadly applicable to different tissues without impairing the enzymatic digestion, and was compatible with mass spectrometry analysis. As a high molecular weight carrier substance, PEG20000 improved the identification of peptides and proteins; however, such an advantage is limited to tissues containing submicrograms to micrograms of protein. Considering its low lytic strength, urea-containing buffer would not be the first alternative for protein recovery. In conclusion, Zwittergent 3–16 is an effective buffer for extracting proteins from FFPE specimens for downstream proteomics analysis. PMID:26580073

  14. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods compare the unseen sequence with all identified protein sequences and return the category index of the highest-scoring match, which is usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward neural networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
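
    The majority-voting step described above can be sketched as follows. The sketch covers only the vote over per-classifier predictions, not ELM training; the function name `majority_vote` and the tie-breaking rule (smallest label wins) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers.

    predictions: one list of category indices per classifier,
    all of the same length (one entry per sequence).
    """
    voted = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label among the most frequent.
        voted.append(min(label for label, n in counts.items() if n == top))
    return voted


# Three hypothetical ensemble members classifying three sequences.
member_outputs = [
    [0, 1, 2],
    [0, 1, 1],
    [1, 1, 2],
]
print(majority_vote(member_outputs))  # [0, 1, 2]
```

    Using an odd number of ensemble members reduces ties in two-class problems, although ties remain possible with more than two categories.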

  15. Evaluation of positive Rift Valley fever virus formalin-fixed paraffin embedded samples as a source of sequence data for retrospective phylogenetic analysis.

    PubMed

    Mubemba, B; Thompson, P N; Odendaal, L; Coetzee, P; Venter, E H

    2017-05-01

    Rift Valley fever (RVF), caused by an arthropod borne Phlebovirus in the family Bunyaviridae, is a haemorrhagic disease that affects ruminants and humans. Due to the zoonotic nature of the virus, a biosafety level 3 laboratory is required for isolation of the virus. Fresh and frozen samples are the preferred sample type for isolation and acquisition of sequence data. However, these samples are scarce in addition to posing a health risk to laboratory personnel. Archived formalin-fixed, paraffin-embedded (FFPE) tissue samples are safe and readily available, however FFPE derived RNA is in most cases degraded and cross-linked in peptide bonds and it is unknown whether the sample type would be suitable as reference material for retrospective phylogenetic studies. A RT-PCR assay targeting a 490 nt portion of the structural Gn glycoprotein encoding gene of the RVFV M-segment was applied to total RNA extracted from archived RVFV positive FFPE samples. Several attempts to obtain target amplicons were unsuccessful. FFPE samples were then analysed using next generation sequencing (NGS), i.e. TruSeq® (Illumina) and sequenced on the MiSeq® genome analyser (Illumina). Using reference mapping, gapped virus sequence data of varying degrees of shallow depth was aligned to a reference sequence. However, the NGS did not yield long enough contigs that consistently covered the same genome regions in all samples to allow phylogenetic analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Special AT-rich sequence binding protein 1 promotes tumor growth and metastasis of esophageal squamous cell carcinoma.

    PubMed

    Ma, Jun; Wu, Kaiming; Zhao, Zhenxian; Miao, Rong; Xu, Zhe

    2017-03-01

    Esophageal squamous cell carcinoma is one of the most aggressive malignancies worldwide. Special AT-rich sequence binding protein 1 is a nuclear matrix attachment region binding protein which participates in higher order chromatin organization and tissue-specific gene expression. However, the role of special AT-rich sequence binding protein 1 in esophageal squamous cell carcinoma remains unknown. In this study, western blot and quantitative real-time polymerase chain reaction analysis were performed to identify differentially expressed special AT-rich sequence binding protein 1 in a series of esophageal squamous cell carcinoma tissue samples. The effects of special AT-rich sequence binding protein 1 silencing by two short-hairpin RNAs on cell proliferation, migration, and invasion were assessed by the CCK-8 assay and transwell assays in esophageal squamous cell carcinoma in vitro. Special AT-rich sequence binding protein 1 was significantly upregulated in esophageal squamous cell carcinoma tissue samples and cell lines. Silencing of special AT-rich sequence binding protein 1 inhibited the proliferation of KYSE450 and EC9706 cells which have a relatively high level of special AT-rich sequence binding protein 1, and the ability of migration and invasion of KYSE450 and EC9706 cells was distinctly suppressed. Special AT-rich sequence binding protein 1 could be a potential target for the treatment of esophageal squamous cell carcinoma and inhibition of special AT-rich sequence binding protein 1 may provide a new strategy for the prevention of esophageal squamous cell carcinoma invasion and metastasis.

  17. SequenceCEROSENE: a computational method and web server to visualize spatial residue neighborhoods at the sequence level.

    PubMed

    Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk

    2016-01-01

    To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. Color encoded sequences generated by SequenceCEROSENE can help users quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial phase of structure-based studies. In this respect, the web server can be a valuable tool, as users are allowed to process multiple structures, quickly switch between results, and interact with generated visualizations in an intuitive manner. The SequenceCEROSENE web server is available at https://biosciences.hs-mittweida.de/seqcerosene.

  18. Context and meter enhance long-range planning in music performance

    PubMed Central

    Mathias, Brian; Pfordresher, Peter Q.; Palmer, Caroline

    2015-01-01

    Neural responses demonstrate evidence of resonance, or oscillation, during the production of periodic auditory events. Music contains periodic auditory events that give rise to a sense of beat, which in turn generates a sense of meter on the basis of multiple periodicities. Metrical hierarchies may aid memory for music by facilitating similarity-based associations among sequence events at different periodic distances that unfold in longer contexts. A fundamental question is how metrical associations arising from a musical context influence memory during music performance. Longer contexts may facilitate metrical associations at higher hierarchical levels more than shorter contexts, a prediction of the range model, a formal model of planning processes in music performance (Palmer and Pfordresher, 2003; Pfordresher et al., 2007). Serial ordering errors, in which intended sequence events are produced in incorrect sequence positions, were measured as skilled pianists performed musical pieces that contained excerpts embedded in long or short musical contexts. Pitch errors arose from metrically similar positions and further sequential distances more often when the excerpt was embedded in long contexts compared to short contexts. Musicians’ keystroke intensities and error rates also revealed influences of metrical hierarchies, which differed for performances in long and short contexts. The range model accounted for contextual effects and provided better fits to empirical findings when metrical associations between sequence events were included. Longer sequence contexts may facilitate planning during sequence production by increasing conceptual similarity between hierarchically associated events. These findings are consistent with the notion that neural oscillations at multiple periodicities may strengthen metrical associations across sequence events during planning. PMID:25628550

  19. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
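
    The core data structure described above, a matrix of residue frequencies at every aligned position, can be sketched as follows. This is not the JProfileGrid code; the function name `residue_frequencies` and the toy alignment are hypothetical, and gap characters are simply counted like any other symbol.

```python
from collections import Counter


def residue_frequencies(alignment):
    """Return one {residue: frequency} dict per column of an alignment.

    alignment: list of equal-length aligned sequences (gaps written as "-").
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({res: n / len(column) for res, n in counts.items()})
    return grid


# Toy alignment (assumed) of four sequences and three columns.
alignment = ["MKV", "MRV", "M-V", "LKV"]
for position, freqs in enumerate(residue_frequencies(alignment), start=1):
    print(position, freqs)
```

    Each dictionary corresponds to one column of the grid; a visualization would map the frequencies to colors.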

  20. A topological approach for protein classification

    DOE PAGES

    Cang, Zixuan; Mu, Lin; Wu, Kedi; ...

    2015-11-04

    Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.

  1. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  2. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fraction of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.

  3. Protein Interaction Profile Sequencing (PIP-seq).

    PubMed

    Foley, Shawn W; Gregory, Brian D

    2016-10-10

    Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.

  4. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    PubMed

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structures of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources. In this paper, we found that the protein sequence could also be one of these factors. We extracted information on the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  5. Development of a Plastic Embedding Method for Large-Volume and Fluorescent-Protein-Expressing Tissues

    PubMed Central

    Yang, Zhongqin; Hu, Bihe; Zhang, Yuhui; Luo, Qingming; Gong, Hui

    2013-01-01

    Fluorescent proteins serve as important biomarkers for visualizing both subcellular organelles in living cells and structural and functional details in large-volume tissues or organs. However, current techniques for plastic embedding are limited in their ability to preserve fluorescence while remaining suitable for micro-optical sectioning tomography of large-volume samples. In this study, we quantitatively evaluated the fluorescence preservation and penetration time of several commonly used resins in a Thy1-eYFP-H transgenic whole mouse brain, including glycol methacrylate (GMA), LR White, hydroxypropyl methacrylate (HPMA) and Unicryl. We found that HPMA embedding doubled the eYFP fluorescence intensity but required long durations of incubation for whole brain penetration. GMA, Unicryl and LR White each penetrated the brain rapidly but also led to variable quenching of eYFP fluorescence. Among the fast-penetrating resins, GMA preserved fluorescence better than LR White and Unicryl. We found that we could optimize the GMA formulation by reducing the polymerization temperature, removing 4-methoxyphenol and adjusting the pH of the resin solution to be alkaline. By optimizing the GMA formulation, we increased the percentage of eYFP fluorescence preservation in GMA-embedded brains nearly two-fold. These results suggest that modified GMA is suitable for embedding large-volume tissues such as whole mouse brain and provide a novel approach for visualizing brain-wide networks. PMID:23577174

  6. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. PMID:3012461

  7. The embedded population around Herbig Ae/Be stars

    NASA Astrophysics Data System (ADS)

    Testi, L.; Stanga, R. M.; Natta, A.; Palla, F.; Prusti, T.; Baffa, C.; Hunt, L. K.; Lisi, F.

    Herbig Ae/Be stars are intermediate mass young stars in the pre-main sequence phase of evolution. Only a few stars of this type are known so far, and all of them seem to be relatively isolated, in contrast to their low mass counterparts, the T Tauri stars. A possible explanation of this fact is that other young stars formed near the known young stellar objects (YSOs) are deeply embedded in the molecular cloud environment and are not detectable at optical wavelengths. We used the new ARcetri Near Infrared CAmera (ARNICA) to survey in the J, H and K bands the regions of sky around Herbig stars. The aim of this work is to identify embedded YSOs and investigate the clustering properties of these young stars.

  8. A multisite blinded study for the detection of BRAF mutations in formalin-fixed, paraffin-embedded malignant melanoma

    PubMed Central

    Richter, Anna; Grieu, Fabienne; Carrello, Amerigo; Amanuel, Benhur; Namdarian, Kateh; Rynska, Aleksandra; Lucas, Amanda; Michael, Victoria; Bell, Anthony; Fox, Stephen B.; Hewitt, Chelsee A.; Do, Hongdo; McArthur, Grant A.; Wong, Stephen Q.; Dobrovic, Alexander; Iacopetta, Barry

    2013-01-01

    Melanoma patients with BRAF mutations respond to treatment with vemurafenib, thus creating a need for accurate testing of BRAF mutation status. We carried out a blinded study to evaluate various BRAF mutation testing methodologies in the clinical setting. Formalin-fixed, paraffin-embedded melanoma samples were macrodissected before screening for mutations using Sanger sequencing, single-strand conformation analysis (SSCA), high resolution melting analysis (HRM) and competitive allele-specific TaqMan® PCR (CAST-PCR). Concordance of 100% was observed between the Sanger sequencing, SSCA and HRM techniques. CAST-PCR gave rapid and accurate results for the common V600E and V600K mutations, however additional assays are required to detect rarer BRAF mutation types found in 3–4% of melanomas. HRM and SSCA followed by Sanger sequencing are effective two-step strategies for the detection of BRAF mutations in the clinical setting. CAST-PCR was useful for samples with low tumour purity and may also be a cost-effective and robust method for routine diagnostics. PMID:23584600

  9. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  10. TIMMDC1/C3orf1 functions as a membrane-embedded mitochondrial complex I assembly factor through association with the MCIA complex.

    PubMed

    Guarani, Virginia; Paulo, Joao; Zhai, Bo; Huttlin, Edward L; Gygi, Steven P; Harper, J Wade

    2014-03-01

    Complex I (CI) of the electron transport chain, a large membrane-embedded NADH dehydrogenase, couples electron transfer to the release of protons into the mitochondrial inner membrane space to promote ATP production through ATP synthase. In addition to being a central conduit for ATP production, CI activity has been linked to neurodegenerative disorders, including Parkinson's disease. CI is built in a stepwise fashion through the actions of several assembly factors. We employed interaction proteomics to interrogate the molecular associations of 15 core subunits and assembly factors previously linked to human CI deficiency, resulting in a network of 101 proteins and 335 interactions (edges). TIMMDC1, a predicted 4-pass membrane protein, reciprocally associated with multiple members of the MCIA CI assembly factor complex and core CI subunits and was localized in the mitochondrial inner membrane, and its depletion resulted in reduced CI activity and cellular respiration. Quantitative proteomics demonstrated a role for TIMMDC1 in assembly of membrane-embedded and soluble arms of the complex. This study defines a new membrane-embedded CI assembly factor and provides a resource for further analysis of CI biology.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cang, Zixuan; Mu, Lin; Wu, Kedi

    Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.

  12. Detection of somatic mutations by high-resolution DNA melting (HRM) analysis in multiple cancers.

    PubMed

    Gonzalez-Bosquet, Jesus; Calcei, Jacob; Wei, Jun S; Garcia-Closas, Montserrat; Sherman, Mark E; Hewitt, Stephen; Vockley, Joseph; Lissowska, Jolanta; Yang, Hannah P; Khan, Javed; Chanock, Stephen

    2011-01-17

    Identification of somatic mutations in cancer is a major goal for understanding and monitoring the events related to cancer initiation and progression. High resolution melting (HRM) curve analysis represents a fast, post-PCR high-throughput method for scanning somatic sequence alterations in target genes. The aim of this study was to assess the sensitivity and specificity of HRM analysis for tumor mutation screening in a range of tumor samples, which included 216 frozen pediatric small rounded blue-cell tumors as well as 180 paraffin-embedded tumors from breast, endometrial and ovarian cancers (60 of each). HRM analysis was performed in exons of the following candidate genes known to harbor established commonly observed mutations: PIK3CA, ERBB2, KRAS, TP53, EGFR, BRAF, GATA3, and FGFR3. Bi-directional sequencing analysis was used to determine the accuracy of the HRM analysis. For the 39 mutations observed in frozen samples, the sensitivity and specificity of HRM analysis were 97% and 87%, respectively. There were 67 mutation/variants in the paraffin-embedded samples, and the sensitivity and specificity for the HRM analysis were 88% and 80%, respectively. Paraffin-embedded samples require higher quantity of purified DNA for high performance. In summary, HRM analysis is a promising moderate-throughput screening test for mutations among known candidate genomic regions. Although the overall accuracy appears to be better in frozen specimens, somatic alterations were detected in DNA extracted from paraffin-embedded samples.
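
    Sensitivity and specificity, as reported above, follow directly from counts of true and false calls against the sequencing reference. The sketch below shows the arithmetic only; the counts are hypothetical and are not the study's data, and the function names are chosen for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference-confirmed mutations that the screen detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference-confirmed wild-type samples the screen called negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the study:
# 10 mutant samples (9 detected) and 10 wild-type samples (8 called negative).
print(f"sensitivity = {sensitivity(9, 1):.0%}")  # sensitivity = 90%
print(f"specificity = {specificity(8, 2):.0%}")  # specificity = 80%
```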

  13. Detection of Somatic Mutations by High-Resolution DNA Melting (HRM) Analysis in Multiple Cancers

    PubMed Central

    Gonzalez-Bosquet, Jesus; Calcei, Jacob; Wei, Jun S.; Garcia-Closas, Montserrat; Sherman, Mark E.; Hewitt, Stephen; Vockley, Joseph; Lissowska, Jolanta; Yang, Hannah P.; Khan, Javed; Chanock, Stephen

    2011-01-01

    Identification of somatic mutations in cancer is a major goal for understanding and monitoring the events related to cancer initiation and progression. High resolution melting (HRM) curve analysis represents a fast, post-PCR high-throughput method for scanning somatic sequence alterations in target genes. The aim of this study was to assess the sensitivity and specificity of HRM analysis for tumor mutation screening in a range of tumor samples, which included 216 frozen pediatric small rounded blue-cell tumors as well as 180 paraffin-embedded tumors from breast, endometrial and ovarian cancers (60 of each). HRM analysis was performed in exons of the following candidate genes known to harbor established commonly observed mutations: PIK3CA, ERBB2, KRAS, TP53, EGFR, BRAF, GATA3, and FGFR3. Bi-directional sequencing analysis was used to determine the accuracy of the HRM analysis. For the 39 mutations observed in frozen samples, the sensitivity and specificity of HRM analysis were 97% and 87%, respectively. There were 67 mutation/variants in the paraffin-embedded samples, and the sensitivity and specificity for the HRM analysis were 88% and 80%, respectively. Paraffin-embedded samples require higher quantity of purified DNA for high performance. In summary, HRM analysis is a promising moderate-throughput screening test for mutations among known candidate genomic regions. Although the overall accuracy appears to be better in frozen specimens, somatic alterations were detected in DNA extracted from paraffin-embedded samples. PMID:21264207
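
    The sensitivity and specificity figures quoted above follow the usual definitions against a reference method (here, bi-directional sequencing). A minimal sketch of the arithmetic, assuming a simple 2x2 confusion table; the counts below are hypothetical and are not taken from the study:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-confirmed mutations that the screen flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative samples that the screen left unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not data from the abstract).
tp, fn, tn, fp = 45, 5, 80, 20
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```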

  14. What baboons can (not) tell us about natural language grammars.

    PubMed

    Poletiek, Fenna H; Fitz, Hartmut; Bocanegra, Bruno R

    2016-06-01

    Rey et al. (2012) present data from a study with baboons that they interpret in support of the idea that center-embedded structures in human language have their origin in low level memory mechanisms and associative learning. Critically, the authors claim that the baboons showed a behavioral preference that is consistent with center-embedded sequences over other types of sequences. We argue that the baboons' response patterns suggest that two mechanisms are involved: first, they can be trained to associate a particular response with a particular stimulus, and, second, when faced with two conditioned stimuli in a row, they respond to the most recent one first, copying behavior they had been rewarded for during training. Although Rey et al.'s (2012) experiment shows that the baboons' behavior is driven by low level mechanisms, it is not clear how the animal behavior reported bears on the phenomenon of Center Embedded structures in human syntax. Hence, (1) natural language syntax may indeed have been shaped by low level mechanisms, and (2) the baboons' behavior is driven by low level stimulus response learning, as Rey et al. propose. But is the second evidence for the first? We will discuss in what ways this study can and cannot give evidential value for explaining the origin of Center Embedded recursion in human grammar. More generally, their study provokes an interesting reflection on the use of animal studies in order to understand features of the human linguistic system. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. Understanding protein evolution: from protein physics to Darwinian selection.

    PubMed

    Zeldovich, Konstantin B; Shakhnovich, Eugene I

    2008-01-01

    Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.

  16. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  17. Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.

    PubMed Central

    Bell, J; Grass, S; Jeanteur, D; Munson, R S

    1994-01-01

    The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390

  18. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).
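
    The percentages of identical residues reported above come from pairwise comparison of aligned sequences. A minimal sketch of one common way to compute such a figure, assuming the two sequences are already aligned and that gapped columns are excluded from the denominator (the abstract does not state which convention was used); the toy sequences are invented for illustration:

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of ungapped aligned columns in which both residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not the S11 sequences).
print(round(percent_identity("MKV-LAT", "MRVQLAS"), 1))
```

    Other conventions divide by the alignment length or by the shorter sequence length, so values are only comparable when the same denominator is used.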

  19. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  20. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

    PubMed

    Odronitz, Florian; Kollmar, Martin

    2006-11-29

    Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.

  1. Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

    PubMed

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-12-27

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.

  2. Foldit Standalone: a video game-derived protein structure manipulation interface using Rosetta.

    PubMed

    Kleffner, Robert; Flatten, Jeff; Leaver-Fay, Andrew; Baker, David; Siegel, Justin B; Khatib, Firas; Cooper, Seth

    2017-09-01

    Foldit Standalone is an interactive graphical interface to the Rosetta molecular modeling package. In contrast to most command-line or batch interactions with Rosetta, Foldit Standalone is designed to allow easy, real-time, direct manipulation of protein structures, while also giving access to the extensive power of Rosetta computations. Derived from the user interface of the scientific discovery game Foldit (itself based on Rosetta), Foldit Standalone has added more advanced features and removed the competitive game elements. Foldit Standalone was built from the ground up with a custom rendering and event engine, configurable visualizations and interactions driven by Rosetta. Foldit Standalone contains, among other features: electron density and contact map visualizations, multiple sequence alignment tools for template-based modeling, rigid body transformation controls, RosettaScripts support and an embedded Lua interpreter. Foldit Standalone is available for download at https://fold.it/standalone, under the Rosetta license, which is free for academic and non-profit users. It is implemented in cross-platform C++ and binary executables are available for Windows, macOS and Linux. © The Author(s) 2017. Published by Oxford University Press.

  3. The Inhibition of Stat5 by a Peptide Aptamer Ligand Specific for the DNA Binding Domain Prevents Target Gene Transactivation and the Growth of Breast and Prostate Tumor Cells

    PubMed Central

    Weber, Axel; Borghouts, Corina; Brendel, Christian; Moriggl, Richard; Delis, Natalia; Brill, Boris; Vafaizadeh, Vida; Groner, Bernd

    2013-01-01

    The signal transducer and activator of transcription Stat5 is transiently activated by growth factor and cytokine signals in normal cells, but its persistent activation has been observed in a wide range of human tumors. Aberrant Stat5 activity was initially observed in leukemias, but subsequently also found in carcinomas. We investigated the importance of Stat5 in human tumor cell lines. shRNA mediated downregulation of Stat5 revealed the dependence of prostate and breast cancer cells on the expression of this transcription factor. We extended these inhibition studies and derived a peptide aptamer (PA) ligand, which directly interacts with the DNA-binding domain of Stat5 in a yeast-two-hybrid screen. The Stat5 specific PA sequence is embedded in a thioredoxin (hTRX) scaffold protein. The resulting recombinant protein S5-DBD-PA was expressed in bacteria, purified and introduced into tumor cells by protein transduction. Alternatively, S5-DBD-PA was expressed in the tumor cells after infection with a S5-DBD-PA encoding gene transfer vector. Both strategies impaired the DNA-binding ability of Stat5, suppressed Stat5 dependent transactivation and caused its intracellular degradation. Our experiments describe a peptide based inhibitor of Stat5 protein activity which can serve as a lead for the development of a clinically useful compound for cancer treatment. PMID:24276378

  4. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
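
    To make the notions of perfect and imperfect repetition in a two-letter (H/P) sequence concrete, the following is a minimal sketch, not the coarse-grained folding model used by the authors. It finds the shortest repeating unit of a string and counts how many positions deviate from a perfect tiling of a given unit; the function names and example strings are hypothetical:

```python
def repeat_unit(seq: str) -> str:
    """Shortest unit whose exact repetition gives seq (seq itself if none)."""
    n = len(seq)
    for k in range(1, n // 2 + 1):
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k]
    return seq


def mismatches_to_perfect_repeat(seq: str, unit: str) -> int:
    """Number of positions where seq differs from unit tiled to len(seq)."""
    if not unit:
        raise ValueError("unit must be non-empty")
    tiled = (unit * (len(seq) // len(unit) + 1))[: len(seq)]
    return sum(1 for a, b in zip(seq, tiled) if a != b)


print(repeat_unit("HPPHPPHPP"))                          # perfect repeat
print(mismatches_to_perfect_repeat("HPPHPHHPP", "HPP"))  # imperfect repeat
```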

  5. Tetramer-organizing polyproline-rich peptides differ in CHO cell-expressed and plasma-derived human butyrylcholinesterase tetramers.

    PubMed

    Schopfer, Lawrence M; Lockridge, Oksana

    2016-06-01

    Tetrameric butyrylcholinesterase (BChE) in human plasma is the product of multiple genes, namely one BCHE gene on chromosome 3q26.1 and multiple genes that encode polyproline-rich peptides. The function of the polyproline-rich peptides is to assemble BChE into tetramers. CHO cells transfected with human BChE cDNA express BChE monomers and dimers, but only low quantities of tetramers. Our goal was to identify the polyproline-rich peptides in CHO-cell derived human BChE tetramers. CHO cell-produced human BChE tetramers were purified from serum-free culture medium. Peptides embedded in the tetramerization domain were released from BChE tetramers by boiling and identified by liquid chromatography-tandem mass spectrometry. A total of 270 proline-rich peptides were sequenced, ranging in size from 6-41 residues. The peptides originated from 60 different proteins that reside in multiple cell compartments including the nucleus, cytoplasm, and endoplasmic reticulum. No single protein was the source of the polyproline-rich peptides in CHO cell-expressed human BChE tetramers. In contrast, 70% of the tetramer-organizing peptides in plasma-derived BChE tetramers originate from lamellipodin. No protein source was identified for polyproline peptides containing up to 41 consecutive proline residues. In conclusion, the use of polyproline-rich peptides as a tetramerization motif is documented only for the cholinesterases, but is expected to serve other tetrameric proteins as well. The CHO cell data suggest that the BChE tetramer-organizing peptide can arise from a variety of proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    PubMed Central

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  7. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    PubMed

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  8. Dynamics of domain coverage of the protein sequence universe.

    PubMed

    Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B

    2012-11-16

    The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". Here we suggest that the true size of "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.

  9. Clinical Actionability of Comprehensive Genomic Profiling for Management of Rare or Refractory Cancers

    PubMed Central

    Hirshfield, Kim M.; Tolkunov, Denis; Zhong, Hua; Ali, Siraj M.; Stein, Mark N.; Murphy, Susan; Vig, Hetal; Vazquez, Alexei; Glod, John; Moss, Rebecca A.; Belyi, Vladimir; Chan, Chang S.; Chen, Suzie; Goodell, Lauri; Foran, David; Yelensky, Roman; Palma, Norma A.; Sun, James X.; Miller, Vincent A.; Stephens, Philip J.; Ross, Jeffrey S.; Kaufman, Howard; Poplin, Elizabeth; Mehnert, Janice; Tan, Antoinette R.; Bertino, Joseph R.; Aisner, Joseph; DiPaola, Robert S.

    2016-01-01

    Background. The frequency with which targeted tumor sequencing results will lead to implemented change in care is unclear. Prospective assessment of the feasibility and limitations of using genomic sequencing is critically important. Methods. A prospective clinical study was conducted on 100 patients with diverse-histology, rare, or poor-prognosis cancers to evaluate the clinical actionability of a Clinical Laboratory Improvement Amendments (CLIA)-certified, comprehensive genomic profiling assay (FoundationOne), using formalin-fixed, paraffin-embedded tumors. The primary objectives were to assess utility, feasibility, and limitations of genomic sequencing for genomically guided therapy or other clinical purpose in the setting of a multidisciplinary molecular tumor board. Results. Of the tumors from the 92 patients with sufficient tissue, 88 (96%) had at least one genomic alteration (average 3.6, range 0–10). Commonly altered pathways included p53 (46%), RAS/RAF/MAPK (rat sarcoma; rapidly accelerated fibrosarcoma; mitogen-activated protein kinase) (45%), receptor tyrosine kinases/ligand (44%), PI3K/AKT/mTOR (phosphatidylinositol-4,5-bisphosphate 3-kinase; protein kinase B; mammalian target of rapamycin) (35%), transcription factors/regulators (31%), and cell cycle regulators (30%). Many low frequency but potentially actionable alterations were identified in diverse histologies. Use of comprehensive profiling led to implementable clinical action in 35% of tumors with genomic alterations, including genomically guided therapy, diagnostic modification, and trigger for germline genetic testing. Conclusion. Use of targeted next-generation sequencing in the setting of an institutional molecular tumor board led to implementable clinical action in more than one third of patients with rare and poor-prognosis cancers. Major barriers to implementation of genomically guided therapy were clinical status of the patient and drug access. Early and serial sequencing in the clinical course and expanded access to genomically guided early-phase clinical trials and targeted agents may increase actionability. Implications for Practice: Identification of key factors that facilitate use of genomic tumor testing results and implementation of genomically guided therapy may lead to enhanced benefit for patients with rare or difficult to treat cancers. Clinical use of a targeted next-generation sequencing assay in the setting of an institutional molecular tumor board led to implementable clinical action in over one third of patients with rare and poor prognosis cancers. The major barriers to implementation of genomically guided therapy were clinical status of the patient and drug access both on trial and off label. Approaches to increase actionability include early and serial sequencing in the clinical course and expanded access to genomically guided early phase clinical trials and targeted agents. PMID:27566247

  10. Recombinant protein secretion in Pseudozyma flocculosa and Pseudozyma antarctica with a novel signal peptide.

    PubMed

    Cheng, Yali; Avis, Tyler J; Bolduc, Sébastien; Zhao, Yingyi; Anguenot, Raphaël; Neveu, Bertrand; Labbé, Caroline; Belzile, François; Bélanger, Richard R

    2008-12-01

    Secretion of recombinant proteins aims to reproduce the correct posttranslational modifications of the expressed protein while simplifying its recovery. In this study, secretion signal sequences from an abundantly secreted 34-kDa protein (P34) from Pseudozyma flocculosa were cloned. The efficiency of these sequences in the secretion of recombinant green fluorescent protein (GFP) was investigated in two Pseudozyma species and compared with other secretion signal sequences, from S. cerevisiae and Pseudozyma spp. The results indicate that various secretion signal sequences were functional and that the P34 signal peptide was the most effective secretion signal sequence in both P. flocculosa and P. antarctica. The cells correctly processed the secretion signal sequences, including P34 signal peptide, and mature GFP was recovered from the culture medium. This is the first report of functional secretion signal sequences in P. flocculosa. These sequences can be used to test the secretion of other recombinant proteins and for studying the secretion pathway in P. flocculosa and P. antarctica.

  11. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

    PubMed

    Nourani, Esmaeil; Khunjush, Farshad; Durmuş, Saliha

    2016-05-24

    Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is gaining increasing demand because of scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvements for accuracy and AUC (area under the receiver operating characteristic curve) results in comparison with state-of-the-art methods.

  12. The Embedding Problem for Markov Models of Nucleotide Substitution

    PubMed Central

    Verbyla, Klara L.; Yap, Von Bing; Pahwa, Anuj; Shao, Yunli; Huttley, Gavin A.

    2013-01-01

    Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability. A process is non-embeddable if it cannot be embedded in a continuous time-homogeneous Markov process. In this study, non-embeddability was demonstrated to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and those with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could affect the correct prediction of phylogenies, but at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time homogeneity assumption and consequently the impact is likely to be minor. PMID:23935949
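
    The embeddability question can be illustrated with a two-state toy case, which is much simpler than the four-state nucleotide models studied in the paper. For a 2x2 transition matrix P = [[1-a, a], [b, 1-b]], a time-homogeneous rate matrix Q with exp(Q) = P exists exactly when det P = 1 - a - b > 0. The sketch below assumes this two-state setting; the function names are hypothetical and the code is not the authors' procedure:

```python
import math


def embed_two_state(a: float, b: float):
    """Return a rate matrix Q with exp(Q) = [[1-a, a], [b, 1-b]], or None.

    None means the matrix is non-embeddable (det P = 1 - a - b <= 0).
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must be probabilities")
    total = a + b
    if total == 0.0:
        return [[0.0, 0.0], [0.0, 0.0]]  # identity matrix: zero rates
    if total >= 1.0:
        return None  # determinant is not positive, so no real generator
    scale = -math.log(1.0 - total) / total
    return [[-a * scale, a * scale], [b * scale, -b * scale]]


def transition_from_rates(q):
    """Closed-form exp(Q) for a two-state rate matrix Q (unit time)."""
    alpha, beta = q[0][1], q[1][0]
    s = alpha + beta
    if s == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    f = (1.0 - math.exp(-s)) / s
    return [[1.0 - alpha * f, alpha * f], [beta * f, 1.0 - beta * f]]


q = embed_two_state(0.2, 0.1)
print(transition_from_rates(q))   # recovers a = 0.2, b = 0.1 up to rounding
print(embed_two_state(0.7, 0.6))  # non-embeddable case
```

    For larger state spaces no such simple determinant test suffices, which is part of why detecting non-embeddability in nucleotide models is a problem in its own right.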

  13. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
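
    To make the "observed correlations" in this abstract concrete, the sketch below computes mutual information between two columns of a multiple sequence alignment. This is an assumption chosen for illustration: mutual information is one common way to measure raw column covariation, and it is not the maximum entropy model the abstract describes, which exists precisely to separate direct couplings from such raw correlations. The function name and the toy alignment in the usage note are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences. Frequencies are
    raw counts without pseudocounts or sequence weighting (a simplification).
    """
    n = len(alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    left = Counter(seq[i] for seq in alignment)
    right = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (x, y), count in pairs.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with counts to avoid extra divisions
        mi += p_xy * math.log2(count * n / (left[x] * right[y]))
    return mi
```

    For the toy alignment `["AKLE", "AKLE", "GRLD", "GRLD"]`, columns 0 and 1 covary perfectly and score one bit, while column 2 is fully conserved and carries no covariation signal with any other column.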

  14. Modeling repetitive, non‐globular proteins

    PubMed Central

    Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun

    2016-01-01

    While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323

  15. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258

  16. On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

    PubMed Central

    Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

    2013-01-01

    The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
    PMID:24146608

  17. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

    PubMed Central

    2014-01-01

    Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
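
    The matrix underlying a ProfileGrid, as described in this abstract, holds the residue frequency at every aligned position. The sketch below shows one way to compute such a matrix. It is an illustration only and does not reproduce the JProfileGrid software; the function name is hypothetical, and counting the gap character as an ordinary symbol is an assumption.

```python
from collections import Counter


def residue_frequencies(alignment):
    """Per-column residue frequencies for equal-length aligned sequences.

    Returns one dict per alignment column mapping each symbol seen in that
    column to its relative frequency. Gap characters are counted like any
    other symbol (an assumption made for this sketch).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({res: count / n for res, count in sorted(counts.items())})
    return grid
```

    Each dict in the result corresponds to one column of the grid; mapping the frequencies to colors would give the color-coded matrix the abstract refers to.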

  18. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).

  19. Use of designed sequences in protein structure recognition.

    PubMed

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  20. Determination of ABO genotypes with DNA extracted from formalin-fixed, paraffin-embedded tissues.

    PubMed

    Yamada, M; Yamamoto, Y; Tanegashima, A; Kane, M; Ikehara, Y; Fukunaga, T; Nishi, K

    1994-01-01

    The gene encoding the specific glycosyltransferases which catalyze the conversion of the H antigen to A or B antigens shows a slight but distinct variation in its allelic nucleotide sequence and can be divided into 6 genotypes when digested with specific restriction enzymes. We extracted DNA from formalin-fixed, paraffin-embedded tissues using SDS/proteinase K treatment followed by phenol/chloroform extraction. The sequence of nucleotides for the A, B and O genes was amplified by the polymerase chain reaction (PCR). DNA fragments of 128 bp and 200 bp could be amplified in the second round of PCR, using an aliquot of the first round PCR product as template. Degraded DNA from paraffin blocks stored for up to 10.7 years could be successfully typed. The ABO genotype was deduced from the digestion patterns with an appropriate combination of restriction enzymes and was compatible with the phenotype obtained from the blood sample.

  1. High-Throughput Amplicon-Based Copy Number Detection of 11 Genes in Formalin-Fixed Paraffin-Embedded Ovarian Tumour Samples by MLPA-Seq

    PubMed Central

    Kondrashova, Olga; Love, Clare J.; Lunke, Sebastian; Hsu, Arthur L.; Waring, Paul M.; Taylor, Graham R.

    2015-01-01

    Whilst next generation sequencing can report point mutations in fixed tissue tumour samples reliably, the accurate determination of copy number is more challenging. The conventional Multiplex Ligation-dependent Probe Amplification (MLPA) assay is an effective tool for measurement of gene dosage, but is restricted to around 50 targets due to size resolution of the MLPA probes. By switching from a size-resolved format to a sequence-resolved format, we developed a scalable, high-throughput, quantitative assay. MLPA-seq is capable of detecting deletions, duplications, and amplifications in as little as 5 ng of genomic DNA, including from formalin-fixed paraffin-embedded (FFPE) tumour samples. We show that this method can detect BRCA1, BRCA2, ERBB2 and CCNE1 copy number changes in DNA extracted from snap-frozen and FFPE tumour tissue, with 100% sensitivity and >99.5% specificity. PMID:26569395

  2. Increasing Sequence Diversity with Flexible Backbone Protein Design: The Complete Redesign of a Protein Hydrophobic Core

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murphy, Grant S.; Mills, Jeffrey L.; Miley, Michael J.

    2015-10-15

    Protein design tests our understanding of protein stability and structure. Successful design methods should allow the exploration of sequence space not found in nature. However, when redesigning naturally occurring protein structures, most fixed backbone design algorithms return amino acid sequences that share strong sequence identity with wild-type sequences, especially in the protein core. This behavior places a restriction on functional space that can be explored and is not consistent with observations from nature, where sequences of low identity have similar structures. Here, we allow backbone flexibility during design to mutate every position in the core (38 residues) of a four-helix bundle protein. Only small perturbations to the backbone, 1-2 Å, were needed to entirely mutate the core. The redesigned protein, DRNN, is exceptionally stable (melting point >140 °C). An NMR and X-ray crystal structure show that the side chains and backbone were accurately modeled (all-atom RMSD = 1.3 Å).

  3. Effective Application of Bicelles for Conformational Analysis of G Protein-Coupled Receptors by Hydrogen/Deuterium Exchange Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Duc, Nguyen Minh; Du, Yang; Thorsen, Thor S.; Lee, Su Youn; Zhang, Cheng; Kato, Hideaki; Kobilka, Brian K.; Chung, Ka Young

    2015-05-01

    G protein-coupled receptors (GPCRs) have important roles in physiology and pathology, and 40% of drugs currently on the market target GPCRs for the treatment of various diseases. Because of their therapeutic importance, the structural mechanism of GPCR signaling is of great interest in the field of drug discovery. Hydrogen/deuterium exchange mass spectrometry (HDX-MS) is a useful tool for analyzing ligand binding sites, the protein-protein interaction interface, and conformational changes of proteins. However, its application to GPCRs has been limited for various reasons, including the hydrophobic nature of GPCRs and the use of detergents in their preparation. In the present study, we tested the application of bicelles as a means of solubilizing GPCRs for HDX-MS studies. GPCRs (e.g., β2-adrenergic receptor [β2AR], μ-opioid receptor, and protease-activated receptor 1) solubilized in bicelles produced better sequence coverage (greater than 90%) than GPCRs solubilized in n-dodecyl-β-D-maltopyranoside (DDM), suggesting that bicelles are a more effective method of solubilization for HDX-MS studies. The HDX-MS profile of β2AR in bicelles showed that transmembrane domains (TMs) undergo lower deuterium uptake than intracellular or extracellular regions, which is consistent with the fact that the TMs are highly ordered and embedded in bicelles. The overall HDX-MS profiles of β2AR solubilized in bicelles and in DDM were similar except for intracellular loop 3. Interestingly, we detected EX1 kinetics, an important phenomenon in protein dynamics, at the C-terminus of TM6 in β2AR. In conclusion, we suggest the application of bicelles as a useful method for solubilizing GPCRs for conformational analysis by HDX-MS.

  4. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Li, Li-Ping; Huang, De-Shuang; Yan, Gui-Ying; Nie, Ru; Huang, Yu-An

    2017-04-04

    Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of the cell and for providing insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming and expensive, and have high false positive and false negative rates. Thus it is highly urgent to develop in silico methods to predict PPIs efficiently and accurately in this post-genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristic of the proposed method lies in introducing an effective feature descriptor, IWLD, which can capture highly discriminative evolutionary information from position-specific scoring matrices (PSSMs) of protein data, and in employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained excellent prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than the previous methods. Extensive experiments were then performed for predicting cross-species PPIs and the predictive results were also promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on a Human data set. The experimental results obtained indicate that our method is highly effective for PPI prediction and can be taken as a supplementary tool for future proteomics research.

  5. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
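
    The two randomness measures mentioned in this abstract, uneven amino acid usage and uneven usage of adjacent amino acid pairs, can be sketched with standard Shannon quantities. The code below is an assumption-laden illustration, not a reconstruction of the report's exact parameters: it uses the textbook redundancy 1 - H/H_max over a 20-letter alphabet and the conditional entropy of a residue given its predecessor. All function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # the 20 standard amino acids (assumed alphabet)


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def composition_redundancy(sequence):
    """Redundancy from uneven residue usage: 1 - H1 / log2(20)."""
    h1 = entropy(Counter(sequence))
    return 1.0 - h1 / math.log2(ALPHABET_SIZE)


def neighbor_conditional_entropy(sequence):
    """H(next | current) in bits, from adjacent residue pairs.

    Computed as H(pairs) - H(first element of each pair). Lower values
    mean the next residue is more predictable from the current one.
    """
    pairs = Counter(zip(sequence, sequence[1:]))
    firsts = Counter(sequence[:-1])
    return entropy(pairs) - entropy(firsts)
```

    A sequence that uses all 20 residues equally often has composition redundancy 0, a homopolymer has redundancy 1, and a strictly alternating sequence such as "AGAGAGAG" has zero conditional entropy because each residue fully determines the next.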

  6. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

    PubMed

    Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig

    2007-03-01

    Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

  7. Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?

    PubMed

    Sridhar, Settu; Guruprasad, Kunchur

    2014-01-01

    We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.

  8. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods.

  9. De novo protein sequencing by combining top-down and bottom-up tandem mass spectra.

    PubMed

    Liu, Xiaowen; Dekker, Lennard J M; Wu, Si; Vanduijn, Martijn M; Luider, Theo M; Tolić, Nikola; Kou, Qiang; Dvorkin, Mikhail; Alexandrova, Sonya; Vyatkina, Kira; Paša-Tolić, Ljiljana; Pevzner, Pavel A

    2014-07-03

    There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra to long ones. However, overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass values, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.

  10. Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

    PubMed Central

    Odronitz, Florian; Kollmar, Martin

    2006-01-01

    Background Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497

  11. Selective excitation for spectral editing and assignment in separated local field experiments of oriented membrane proteins

    NASA Astrophysics Data System (ADS)

    Koroloff, Sophie N.; Nevzorov, Alexander A.

    2017-01-01

    Spectroscopic assignment of NMR spectra for oriented uniformly labeled membrane proteins embedded in their native-like bilayer environment is essential for their structure determination. However, sequence-specific assignment in oriented-sample (OS) NMR is often complicated by insufficient resolution and spectral crowding. Therefore, the assignment process is usually done by a laborious and expensive "shotgun" method involving multiple selective labeling of amino acid residues. Presented here is a strategy to overcome poor spectral resolution in crowded regions of 2D spectra by selecting resolved "seed" residues via soft Gaussian pulses inserted into spin-exchange separated local-field experiments. The Gaussian pulse places the selected polarization along the z-axis while dephasing the other signals before the evolution of the 1H-15N dipolar couplings. The transfer of magnetization is accomplished via mismatched Hartmann-Hahn conditions to the nearest-neighbor peaks via the proton bath. By optimizing the length and amplitude of the Gaussian pulse, one can also achieve a phase inversion of the closest peaks, thus providing an additional phase contrast. From the superposition of the selective spin-exchanged SAMPI4 onto the fully excited SAMPI4 spectrum, the 15N sites that are directly adjacent to the selectively excited residues can be easily identified, thereby providing a straightforward method for initiating the assignment process in oriented membrane proteins.

  12. Confirmation of the "protein-traffic-hypothesis" and the "protein-localization-hypothesis" using the diabetes-mellitus-type-1-knock-in and transgenic-murine-models and the trepitope sequences.

    PubMed

    Arneth, Borros

    2012-10-01

    As possible mechanisms to explain the emergence of autoimmune diseases, the current author has suggested in earlier papers two new pathways: the "protein localization hypothesis" and the "protein traffic hypothesis". The "protein localization hypothesis" states that an autoimmune disease develops if a protein accumulates in a compartment that did not previously contain that protein. Similarly, the "protein traffic hypothesis" states that a sudden error within the transport of a certain protein leads to the emergence of an autoimmune disease. The current article discusses the usefulness of the different commercially available transgenic murine models of diabetes mellitus type 1 to confirm the aforementioned hypotheses. This discussion shows that several transgenic murine models of diabetes mellitus type 1 are in line with and confirm the aforementioned hypotheses. Furthermore, these hypotheses are also in line with the occurrence of several newly discovered protein sequences, the so-called trepitope sequences. These sequences modulate the immune response to certain proteins. The current study analyzed to what extent the hypotheses are supported by the occurrence of these new sequences. The occurrence of the trepitope sequences thereby provides additional evidence supporting the aforementioned hypotheses. Both the "protein localization hypothesis" and the "protein traffic hypothesis" have the potential to lead to new causal therapy concepts. The "protein localization hypothesis" and the "protein traffic hypothesis" provide conceptual explanations for the diabetes mouse models as well as for the newly discovered trepitope sequences. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Dynamics of domain coverage of the protein sequence universe

    PubMed Central

    2012-01-01

    Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit a slow one, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  14. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  15. Chameleon sequences in neurodegenerative diseases.

    PubMed

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt an alpha helix, a beta sheet, or a coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore, it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
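
    The extraction step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the function name `find_chameleons`, the fixed window length `k`, the three-state labels (H = helix, E = strand, C = coil), and the rule that a window counts only when all of its residues share one state.

```python
from collections import defaultdict


def find_chameleons(records, k=6):
    """Return peptides of length k that occur with more than one structural state.

    records: iterable of (sequence, secondary_structure) string pairs of equal
    length, with one H/E/C label per residue (hypothetical input format).
    """
    states_by_peptide = defaultdict(set)
    for sequence, structure in records:
        if len(sequence) != len(structure):
            raise ValueError("sequence and structure must have the same length")
        for start in range(len(sequence) - k + 1):
            window_states = set(structure[start:start + k])
            # Keep only windows whose residues all share a single state.
            if len(window_states) == 1:
                peptide = sequence[start:start + k]
                states_by_peptide[peptide].add(window_states.pop())
    # A chameleon candidate is a peptide observed in at least two states.
    return {p: s for p, s in states_by_peptide.items() if len(s) > 1}


if __name__ == "__main__":
    toy_records = [
        ("MKVLIGAVTT", "CCHHHHHHCC"),
        ("GGVLIGAVPP", "CCEEEEEECC"),
    ]
    print(find_chameleons(toy_records, k=6))
```

    In this toy input the hexapeptide VLIGAV is helical in one record and a strand in the other, so it is reported as a helix-to-strand (HE) candidate. A real analysis would also need the redundancy filtering the abstract mentions, which is not shown here.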

  16. Chameleon sequences in neurodegenerative diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahramali, Golnaz; Goliaei, Bahram, E-mail: goliaei@ut.ac.ir; Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir

    2016-03-25

    Chameleon sequences can adopt an alpha helix, a beta sheet, or a coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore, it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.

  17. Polyester Wax: A New Embedding Medium for the Histopathologic Study of Human Temporal Bones

    PubMed Central

    Merchant, Saumil N.; Burgess, Barbara; O'Malley, Jennifer; Jones, Diane; Adams, Joe C.

    2007-01-01

    Background Celloidin and paraffin are the two common embedding mediums used for histopathologic study of the human temporal bone by light microscopy. Although celloidin embedding permits excellent morphologic assessment, celloidin is difficult to remove, and there are significant restrictions on success with immunostaining. Embedding in paraffin allows immunostaining to be performed, but preservation of cellular detail within the membranous labyrinth is relatively poor. Objectives/Hypothesis Polyester wax is an embedding medium that has a low melting point (37°C), is soluble in most organic solvents, is water tolerant, and sections easily. We hypothesized that embedding in polyester wax would permit good preservation of the morphology of the membranous labyrinth and, at the same time, allow the study of proteins by immunostaining. Methods Nine temporal bones from individuals aged 1 to 94 years removed 2 to 31 hours postmortem, from subjects who had no history of otologic disease, were used. The bones were fixed using 10% formalin, decalcified using EDTA, embedded in polyester wax, and serially sectioned at a thickness of 8 to 12 μm on a rotary microtome. The block and knife were cooled with frozen CO2 (dry ice) held in a funnel above the block. Sections were placed on glass slides coated with a solution of 1% fish gelatin and 1% bovine albumin, followed by staining of selected sections with hematoxylin and eosin (H&E). Immunostaining was also performed on selected sections using antibodies to 200 kD neurofilament and Na-K-ATPase. Results Polyester wax–embedded sections demonstrated good preservation of cellular detail of the organ of Corti and other structures of the membranous labyrinth, as well as the surrounding otic capsule. The protocol described in this paper was reliable and consistently yielded sections of good quality. Immunostaining was successful with both antibodies. Conclusion The use of polyester wax as an embedding medium for human temporal bones offers the advantage of good preservation of morphology and ease of immunostaining. We anticipate that in the future, polyester wax embedding will also permit other molecular biologic assays on temporal bone sections such as the retrieval of nucleic acids and the study of proteins using mass spectrometry–based proteomic analysis. PMID:16467713

  18. Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

    PubMed

    Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

    1991-06-01

    The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al, submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T flavus and T thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T thermophilus gene. Protein L5 sequences from T flavus and T thermophilus are 95% homologous to L5 from T aquaticus and 56.5% homologous to the corresponding E coli sequence. The lowest degrees of homology were found between the T flavus/T thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.

  19. A method for partitioning the information contained in a protein sequence between its structure and function.

    PubMed

    Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

    2018-05-23

    Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  20. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed to generate PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence-based predictors of protein-protein and protein-DNA interfaces.
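
    Drawing a fixed fraction of a sequence database uniformly at random, as this abstract describes, can be sketched as follows. This is not the FastRNABindR implementation: the helper names `parse_fasta` and `sample_reference`, the seed, and the choice to hold all records in memory are assumptions made for illustration.

```python
import random


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence text before the first header is ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_reference(records, fraction=0.01, seed=0):
    """Return a uniform random subset holding roughly `fraction` of the records."""
    records = list(records)
    if not records:
        return []
    # Always keep at least one record so the reference set is never empty.
    count = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, count)


if __name__ == "__main__":
    toy_lines = []
    for i in range(1000):
        toy_lines += [">seq%d" % i, "ACDEFGHIKL"]
    subset = sample_reference(parse_fasta(toy_lines), fraction=0.01, seed=1)
    print(len(subset), "of 1000 records kept")
```

    For a database of tens of millions of sequences, a streaming scheme such as reservoir sampling would avoid loading every record at once; the in-memory version is used here only to keep the sketch short.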

  1. A novel class of plant-specific zinc-dependent DNA-binding protein that binds to A/T-rich DNA sequences

    PubMed Central

    Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko

    2001-01-01

    Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of PLATZ1 repressed expression of reporter constructs containing the coding sequence of the luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 represents a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
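
    The two conserved regions quoted in this abstract are written in a PROSITE-like notation, where `x2` means any two residues and `x(4–5)` means four to five residues. A minimal sketch of turning that notation into regular expressions follows. The helper names `motif_to_regex` and `has_both_regions` are hypothetical, the test sequences are synthetic rather than real PLATZ proteins, and the sketch checks only that both regions occur somewhere in a sequence without enforcing their order or spacing.

```python
import re

# One motif element: a ranged gap x(m–n), a fixed gap xN (or a bare x),
# or a single upper-case residue letter. Separators are skipped.
_ELEMENT = re.compile(r"x\((\d+)[–-](\d+)\)|x(\d*)|([A-Z])")


def motif_to_regex(motif):
    """Translate a PROSITE-like motif string into a Python regex pattern."""
    parts = []
    for match in _ELEMENT.finditer(motif):
        low, high, fixed, residue = match.groups()
        if residue:
            parts.append(residue)
        elif low:
            parts.append(".{%s,%s}" % (low, high))
        else:
            parts.append(".{%s}" % (fixed or "1"))
    return "".join(parts)


REGION_1 = motif_to_regex("C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H")
REGION_2 = motif_to_regex("C-x2-C-x(10–11)-C-x3-C")


def has_both_regions(sequence):
    """Report whether both conserved regions occur somewhere in the sequence."""
    return (re.search(REGION_1, sequence) is not None
            and re.search(REGION_2, sequence) is not None)


if __name__ == "__main__":
    print(REGION_1)
    print(REGION_2)
```

    Note that `motif_to_regex` silently ignores characters it does not recognise, so a stricter parser would be preferable before applying this to motifs from other sources.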

  2. Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences

    PubMed Central

    Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens

    2015-01-01

    Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states, the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we designed “promiscuous”, polyspecific proteins against all binding partners and measured recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100

  3. Healthy older adults demonstrate generalized postural motor learning in response to variable amplitude oscillations of the support surface

    PubMed Central

    Van Ooteghem, Karen; Frank, James S.; Allard, Fran; Horak, Fay B

    2011-01-01

    Postural motor learning for dynamic balance tasks has been demonstrated in healthy older adults (Van Ooteghem et al. 2009). The purpose of this study was to investigate the type of knowledge (general or specific) obtained with balance training in this age group and to examine whether embedding perturbation regularities within a balance task masks specific learning. Two groups of older adults maintained balance on a constant frequency-variable amplitude oscillating platform. One group was trained using an embedded sequence (ES) protocol which contained the same 15-s sequence of variable amplitude oscillations in the middle of each trial. A second group was trained using a looped sequence (LS) protocol which contained a 15-s sequence repeated three times to form each trial. All trials lasted 45 s. Participants were not informed of any repetition. To examine learning, participants performed a retention test following a 24-h delay. LS participants also completed a transfer task. Specificity of learning was examined by comparing performance for repeated versus random sequences (ES) and training versus transfer sequences (LS). Performance was measured by deriving spatial and temporal measures of whole body centre of mass (COM) and trunk orientation. Both groups improved performance with practice as characterized by reduced COM displacement, improved COM-platform phase relationships, and decreased angular trunk motion. Improvements were also characterized by general rather than specific postural motor learning. These findings are similar to those reported for young adults (Van Ooteghem et al. 2008) and indicate that age does not influence the type of learning which occurs for balance control. PMID:20544184

  4. Multiplex picoliter-droplet digital PCR for quantitative assessment of DNA integrity in clinical samples.

    PubMed

    Didelot, Audrey; Kotsopoulos, Steve K; Lupo, Audrey; Pekin, Deniz; Li, Xinyu; Atochin, Ivan; Srinivasan, Preethi; Zhong, Qun; Olson, Jeff; Link, Darren R; Laurent-Puig, Pierre; Blons, Hélène; Hutchison, J Brian; Taly, Valerie

    2013-05-01

    Assessment of DNA integrity and quantity remains a bottleneck for high-throughput molecular genotyping technologies, including next-generation sequencing. In particular, DNA extracted from paraffin-embedded tissues, a major potential source of tumor DNA, varies widely in quality, leading to unpredictable sequencing data. We describe a picoliter droplet-based digital PCR method that enables simultaneous detection of DNA integrity and the quantity of amplifiable DNA. Using a multiplex assay, we detected 4 different target lengths (78, 159, 197, and 550 bp). Assays were validated with human genomic DNA fragmented to sizes of 170 bp to 3000 bp. The technique was validated with DNA quantities as low as 1 ng. We evaluated 12 DNA samples extracted from paraffin-embedded lung adenocarcinoma tissues. One sample contained no amplifiable DNA. The fractions of amplifiable DNA for the 11 other samples were between 0.05% and 10.1% for 78-bp fragments and ≤1% for longer fragments. Four samples were chosen for enrichment and next-generation sequencing. The quality of the sequencing data was in agreement with the results of the DNA-integrity test. Specifically, DNA with low integrity yielded sequencing results with lower levels of coverage and uniformity and had higher levels of false-positive variants. The development of DNA-quality assays will enable researchers to downselect samples or process more DNA to achieve reliable genome sequencing with the highest possible efficiency of cost and effort, as well as minimize the waste of precious samples. © 2013 American Association for Clinical Chemistry.

  5. Quantum Fragment Based ab Initio Molecular Dynamics for Proteins.

    PubMed

    Liu, Jinfeng; Zhu, Tong; Wang, Xianwei; He, Xiao; Zhang, John Z H

    2015-12-08

    Developing ab initio molecular dynamics (AIMD) methods for practical application in protein dynamics is of significant interest. Due to the large size of biomolecules, applying standard quantum chemical methods to compute energies for dynamic simulation is computationally prohibitive. In this work, a fragment based ab initio molecular dynamics approach is presented for practical application in protein dynamics study. In this approach, the energy and forces of the protein are calculated by a recently developed electrostatically embedded generalized molecular fractionation with conjugate caps (EE-GMFCC) method. For simulation in explicit solvent, mechanical embedding is introduced to treat protein interaction with explicit water molecules. This AIMD approach has been applied to MD simulations of a small benchmark protein Trpcage (with 20 residues and 304 atoms) in both the gas phase and in solution. Comparison to the simulation result using the AMBER force field shows that the AIMD gives a more stable protein structure in the simulation, indicating that quantum chemical energy is more reliable. Importantly, the present fragment-based AIMD simulation captures quantum effects including electrostatic polarization and charge transfer that are missing in standard classical MD simulations. The current approach is linear-scaling, trivially parallel, and applicable to performing the AIMD simulation of proteins with a large size.

  6. Genetics Home Reference: CLN7 disease

    MedlinePlus

    ... unknown. The MFSD8 protein is embedded in the membrane of cell compartments called lysosomes , which digest and recycle different types of molecules. Based on the structure of the protein, MFSD8 probably transports molecules across the lysosomal membrane, but the specific molecules it moves have not ...

  7. Overproduction, purification, and ATPase activity of the Escherichia coli RuvB protein involved in DNA repair.

    PubMed Central

    Iwasaki, H; Shiba, T; Makino, K; Nakata, A; Shinagawa, H

    1989-01-01

    The ruvA and ruvB genes of Escherichia coli constitute an operon which belongs to the SOS regulon. Genetic evidence suggests that the products of the ruv operon are involved in DNA repair and recombination. To begin biochemical characterization of these proteins, we developed a plasmid system that overproduced RuvB protein to 20% of total cell protein. Starting from the overproducing system, we purified RuvB protein. The purified RuvB protein behaved like a monomer in gel filtration chromatography and had an apparent relative molecular mass of 38 kilodaltons in sodium dodecyl sulfate-polyacrylamide gel electrophoresis, which agrees with the value predicted from the DNA sequence. The amino acid sequence of the amino-terminal region of the purified protein was analyzed, and the sequence agreed with the one deduced from the DNA sequence. Since the deduced sequence of RuvB protein contained the consensus sequence for ATP-binding proteins, we examined the ATP-binding and ATPase activities of the purified RuvB protein. RuvB protein had a stronger affinity to ADP than to ATP and weak ATPase activity. The results suggest that the weak ATPase activity of RuvB protein is at least partly due to end product inhibition by ADP. PMID:2529252

  8. A combined de novo protein sequencing and cDNA library approach to the venomic analysis of Chinese spider Araneus ventricosus.

    PubMed

    Duan, Zhigui; Cao, Rui; Jiang, Liping; Liang, Songping

    2013-01-14

    In past years, spider venoms have attracted increasing attention due to their extraordinary chemical and pharmacological diversity. The recently popularized proteomic method has greatly improved our ability to analyze the proteins in the venom. However, the lack of sequence information for isolated venom proteins dramatically limits the ability to confidently identify venom proteins. In the present paper, the venom from Araneus ventricosus was analyzed using two complementary approaches: 2-DE/Shotgun-LC-MS/MS coupled to MASCOT search and 2-DE/Shotgun-LC-MS/MS coupled to manual de novo sequencing followed by local venom protein database (LVPD) search. The LVPD was constructed with toxin-like protein sequences obtained from the analysis of a cDNA library from A. ventricosus venom glands. Our results indicate that a total of 130 toxin-like protein sequences were unambiguously identified by manual de novo sequencing coupled to LVPD search, accounting for 86.67% of all toxin-like proteins in the LVPD. Thus, manual de novo sequencing coupled to LVPD search proved to be an extremely effective approach for the analysis of venom proteins. In addition, the approach shows a clear advantage in validating mutant positions of isoforms from the same toxin-like family. Intriguingly, methyl esterification of glutamic acid was discovered for the first time in animal venom proteins by manual de novo sequencing. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.

  9. Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models.

    PubMed

    Zhang, Shaodian; Kang, Tian; Zhang, Xingting; Wen, Dong; Elhadad, Noémie; Lei, Jianbo

    2016-04-01

    Speculations represent uncertainty toward certain facts. In clinical texts, identifying speculations is a critical step of natural language processing (NLP). While it is a nontrivial task in many languages, detecting speculations in Chinese clinical notes can be particularly challenging because word segmentation may be necessary as an upstream operation. The objective of this paper is to construct a state-of-the-art speculation detection system for Chinese clinical notes and to investigate whether embedding features and word segmentations are worth exploiting toward this overall task. We propose a sequence labeling based system for speculation detection, which relies on features from bag of characters, bag of words, character embedding, and word embedding. We experiment on a novel dataset of 36,828 clinical notes with 5103 gold-standard speculation annotations on 2000 notes, and compare the systems in which word embeddings are calculated based on word segmentations given by general and by domain specific segmenters respectively. Our systems are able to reach performance as high as 92.2% measured by F score. We demonstrate that word segmentation is critical to produce high quality word embedding to facilitate downstream information extraction applications, and suggest that a domain dependent word segmenter can be vital to such a clinical NLP task in Chinese language. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  11. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
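
    The phylogenetic-profile idea in this record can be sketched in a few lines. The abstract does not state how profiles are compared, so the matching rule below (link two proteins when their presence/absence vectors differ in at most `max_mismatches` genomes) is an assumption, as are the function names `build_profiles` and `linked_pairs` and the toy data.

```python
from itertools import combinations


def build_profiles(homolog_hits, genomes):
    """Map each protein to a 0/1 tuple marking homolog presence per genome.

    homolog_hits: dict of protein name -> set of genome names with a homolog
    (hypothetical input format; how homologs are detected is out of scope).
    """
    return {
        protein: tuple(int(genome in hits) for genome in genomes)
        for protein, hits in homolog_hits.items()
    }


def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        mismatches = sum(x != y for x, y in zip(prof_a, prof_b))
        if mismatches <= max_mismatches:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]
    hits = {
        "P1": {"g1", "g3"},
        "P2": {"g1", "g3"},
        "P3": {"g2"},
        "P4": {"g1", "g3", "g4"},
    }
    profiles = build_profiles(hits, genomes)
    print(linked_pairs(profiles))
```

    Proteins present in every genome or in none carry no signal under this scheme and would normally be filtered out before pairing; that step is omitted here.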

  12. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of an evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  13. Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry.

    PubMed

    Higgins, Sean A; Savage, David F

    2018-01-09

    A fundamental goal of protein biochemistry is to determine the sequence-function relationship, but the vastness of sequence space makes comprehensive evaluation of this landscape difficult. However, advances in DNA synthesis and sequencing now allow researchers to assess the functional impact of every single mutation in many proteins, but challenges remain in library construction and the development of general assays applicable to a diverse range of protein functions. This Perspective briefly outlines the technical innovations in DNA manipulation that allow massively parallel protein biochemistry and then summarizes the methods currently available for library construction and the functional assays of protein variants. Areas in need of future innovation are highlighted with a particular focus on assay development and the use of computational analysis with machine learning to effectively traverse the sequence-function landscape. Finally, applications in the fundamentals of protein biochemistry, disease prediction, and protein engineering are presented.

  14. Injectable hydrogels embedded with alginate microspheres for controlled delivery of bone morphogenetic protein-2.

    PubMed

    Zhu, Youjia; Wang, Jiulong; Wu, Jingjing; Zhang, Jun; Wan, Ying; Wu, Hua

    2016-03-23

    Some delivery carriers with injectable characteristics were built using the thermosensitive chitosan/dextran-polylactide/glycerophosphate hydrogel and selected alginate microspheres for the controllable release of bone morphogenetic protein-2 (BMP-2). BMP-2 was first loaded into the microspheres with an average size of around 20 μm and the resulting microspheres were then embedded into the gel in order to achieve well-controlled BMP-2 release. The microsphere-embedded gels show their incipient gelation temperature at around 32 °C and pH at about 7.1. Some gels had their elastic modulus close to 1400 Pa and the ratio of elastic modulus to viscous modulus at around 34, revealing that they behaved like mechanically strong gels. Optimized microsphere-embedded gels were found to be able to administer the BMP-2 release without significant initial burst release in an approximately linear manner over a period of time longer than four weeks. The release rate and the released amount of BMP-2 from these gels could be regulated individually or cooperatively by the initial BMP-2 load and the dextran-polylactide content in the gels. Measurements of the BMP-2 induced alkaline phosphatase activity in C2C12 cells confirmed that C2C12 cells responded to BMP-2 in a dose-dependent way and the released BMP-2 from the microsphere-embedded gels well retained their bioactivity. In vivo assessment of some gels revealed that the released BMP-2 maintained its osteogenesis functions.

  15. Investigation of protein adsorption performance of Ni2+-attached diatomite particles embedded in composite monolithic cryogels.

    PubMed

    Ünlü, Nuri; Ceylan, Şeyda; Erzengin, Mahmut; Odabaşı, Mehmet

    2011-08-01

    As a low-cost natural adsorbent, diatomite (DA) (2 μm) has several advantages including high surface area, chemical reactivity, hydrophilicity and lack of toxicity. In this study, the protein adsorption performance of supermacroporous composite cryogels embedded with Ni(2+)-attached DA particles (Ni(2+)-ADAPs) was investigated. Supermacroporous poly(2-hydroxyethyl methacrylate) (PHEMA)-based monolithic composite cryogel column embedded with Ni(2+)-ADAPs was prepared by radical cryo-copolymerization of 2-hydroxyethyl methacrylate (HEMA) with N,N'-methylene-bis-acrylamide (MBAAm) as cross-linker directly in a plastic syringe for affinity purification of human serum albumin (HSA) both from aqueous solutions and human serum. The chemical composition and surface area of DA were determined by XRF and BET method, respectively. The composite cryogel was characterized by SEM. The effects of pH, embedded Ni(2+)-ADAPs amount, initial HSA concentration, temperature and flow rate on adsorption were studied. The maximum amount of HSA adsorption from aqueous solution at pH 8.0 phosphate buffer was very high (485.15 mg/g DA). It was observed that HSA could be repeatedly adsorbed and desorbed to the embedded Ni(2+)-ADAPs in poly(2-hydroxyethyl methacrylate) composite cryogel without significant loss of adsorption capacity. The efficiency of albumin adsorption from human serum before and after albumin adsorption was also investigated with SDS-PAGE analyses. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Direct Formalin Fixation Induces Widespread Genomic Effects in Archival Tissues

    EPA Science Inventory

    Recent advances in next generation sequencing have dramatically improved transcriptional analysis of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. However, little is known about potential genomic artifacts induced by formalin fixation, which could affect toxi...

  17. Computational analysis of sequence selection mechanisms.

    PubMed

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  18. Metallic-nanoparticles-enhanced fluorescence from individual micron-sized aerosol particles on-the-fly.

    PubMed

    Sivaprakasam, Vasanthi; Hart, Matthew B; Jain, Vaibhav; Eversole, Jay D

    2014-08-11

    Fluorescence spectra from individual aerosol particles that were either coated or embedded with metallic nanoparticles (MNPs) were acquired on-the-fly using 266 nm and 355 nm excitation. Using aqueous suspensions of MNPs with either polystyrene latex (PSL) spheres or dissolved proteins (tryptophan or ovalbumin), we generated PSL spheres coated with MNPs, or protein clusters embedded with MNPs as aerosols. Both enhanced and quenched fluorescence intensities were observed as a function of MNP concentration. Optimizing MNP material, size and spacing should yield enhanced sensitivity for specific aerosol materials that could be exploited to improve detection limits of single-particle, on-the-fly fluorescence or Raman based spectroscopic sensors.

  19. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

  20. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

    PubMed

    Goonesekere, Nalin Cw

    2009-01-01

    The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the Structural Classification of Proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.

  1. Sequencing proteins with transverse ionic transport in nanochannels.

    PubMed

    Boynton, Paul; Di Ventra, Massimiliano

    2016-05-03

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  2. miRNA-embedded shRNAs for Lineage-specific BCL11A Knockdown and Hemoglobin F Induction

    PubMed Central

    Guda, Swaroopa; Brendel, Christian; Renella, Raffaele; Du, Peng; Bauer, Daniel E; Canver, Matthew C; Grenier, Jennifer K; Grimson, Andrew W; Kamran, Sophia C; Thornton, James; de Boer, Helen; Root, David E; Milsom, Michael D; Orkin, Stuart H; Gregory, Richard I; Williams, David A

    2015-01-01

    RNA interference (RNAi) technology using short hairpin RNAs (shRNAs) expressed via RNA polymerase (pol) III promoters has been widely exploited to modulate gene expression in a variety of mammalian cell types. For certain applications, such as lineage-specific knockdown, embedding targeting sequences into pol II-driven microRNA (miRNA) architecture is required. Here, using the potential therapeutic target BCL11A, we demonstrate that pol III-driven shRNAs lead to significantly increased knockdown but also increased cytotoxicity in comparison to pol II-driven miRNA adapted shRNAs (shRNAmiR) in multiple hematopoietic cell lines. We show that the two expression systems yield mature guide strand sequences that differ by a 4 bp shift. This results in alternate seed sequences and consequently influences the efficacy of target gene knockdown. Incorporating a corresponding 4 bp shift into the guide strand of shRNAmiRs resulted in improved knockdown efficiency of BCL11A. This was associated with a significant de-repression of the hemoglobin target of BCL11A, human γ-globin or the murine homolog Hbb-y. Our results suggest the requirement for optimization of shRNA sequences upon incorporation into a miRNA backbone. These findings have important implications in future design of shRNAmiRs for RNAi-based therapy in hemoglobinopathies and other diseases requiring lineage-specific expression of gene silencing sequences. PMID:26080908

  3. Evolution of EF-hand calcium-modulated proteins. III. Exon sequences confirm most dendrograms based on protein sequences: calmodulin dendrograms show significant lack of parallelism

    NASA Technical Reports Server (NTRS)

    Nakayama, S.; Kretsinger, R. H.

    1993-01-01

    In the first report in this series we presented dendrograms based on 152 individual proteins of the EF-hand family. In the second we used sequences from 228 proteins, containing 835 domains, and showed that eight of the 29 subfamilies are congruent and that the EF-hand domains of the remaining 21 subfamilies have diverse evolutionary histories. In this study we have computed dendrograms within and among the EF-hand subfamilies using the encoding DNA sequences. In most instances the dendrograms based on protein and on DNA sequences are very similar. Significant differences between protein and DNA trees for calmodulin remain unexplained. In our fourth report we evaluate the sequences and the distribution of introns within the EF-hand family and conclude that exon shuffling did not play a significant role in its evolution.

  4. Application of 2D graphic representation of protein sequence based on Huffman tree method.

    PubMed

    Qi, Zhao-Hui; Feng, Jun; Qi, Xiao-Qin; Li, Ling

    2012-05-01

    Based on the Huffman tree method, we propose a new 2D graphic representation of protein sequences. This representation can completely avoid loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. The first assigns 0-1 codes to the 20 amino acids using a Huffman tree built from amino acid frequencies, where the frequency of an amino acid is defined as its count in the analyzed protein sequences. The second builds the 2D graphic representation of a protein sequence from these 0-1 codes. The applications of the method to ten ND5 genes and seven Escherichia coli strains are then presented in detail. The results show that the proposed model may provide new insights into the evolution patterns determined from protein sequences and complete genomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
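
    The first part of the method, assigning 0-1 codes to amino acids with a Huffman tree built from residue counts, can be sketched as follows. This is a generic Huffman construction under stated assumptions: ties are broken by insertion order, the example sequences are made up, and the abstract does not specify how the codes are drawn in 2D, so no plotting rule is attempted here. The round trip through encode and decode illustrates why such a coding loses no information.

```python
import heapq
from collections import Counter


def huffman_codes(sequences):
    """Build 0-1 codes for residues from their counts in the given sequences."""
    freq = Counter("".join(sequences))
    # Heap entries: (count, unique tie-breaker, {residue: code built so far}).
    heap = [(n, i, {aa: ""}) for i, (aa, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct residue still needs a one-bit code.
        return {aa: "0" for aa in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n1, _, right = heapq.heappop(heap)
        merged = {aa: "0" + code for aa, code in left.items()}
        merged.update({aa: "1" + code for aa, code in right.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(sequence, codes):
    """Concatenate the 0-1 codes of the residues in order."""
    return "".join(codes[aa] for aa in sequence)


def decode(bits, codes):
    """Invert encode(); works because Huffman codes are prefix-free."""
    inverse = {code: aa for aa, code in codes.items()}
    residues, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            residues.append(inverse[buffer])
            buffer = ""
    return "".join(residues)


# Hypothetical toy sequences, not data from the paper.
toy_sequences = ["MKVLAAGIVALLLA", "MKLLV"]
codes = huffman_codes(toy_sequences)
bits = encode(toy_sequences[0], codes)
restored = decode(bits, codes)
```

    Frequent residues receive shorter codes, so the bit string is compact while remaining exactly invertible.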

  5. The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

    PubMed

    Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

    1999-11-10

    Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.

  6. The eukaryotic signal sequence, YGRL, targets the chlamydial inclusion

    PubMed Central

    Kabeiseman, Emily J.; Cichos, Kyle H.; Moore, Elizabeth R.

    2014-01-01

    Understanding how host proteins are targeted to pathogen-specified organelles, like the chlamydial inclusion, is fundamentally important to understanding the biogenesis of these unique subcellular compartments and how they maintain autonomy within the cell. Syntaxin 6, which localizes to the chlamydial inclusion, contains an YGRL signal sequence. The YGRL functions to return syntaxin 6 to the trans-Golgi from the plasma membrane, and deletion of the YGRL signal sequence from syntaxin 6 also prevents the protein from localizing to the chlamydial inclusion. YGRL is one of three YXXL (YGRL, YQRL, and YKGL) signal sequences which target proteins to the trans-Golgi. We designed various constructs of eukaryotic proteins to test the specificity and propensity of YXXL sequences to target the inclusion. The YGRL signal sequence redirects proteins (e.g., Tgn38, furin, syntaxin 4) that normally do not localize to the chlamydial inclusion. Further, the requirement of the YGRL signal sequence for syntaxin 6 localization to inclusions formed by different species of Chlamydia is conserved. These data indicate that there is an inherent property of the chlamydial inclusion, which allows it to recognize the YGRL signal sequence. To examine whether this “inherent property” was protein or lipid in nature, we asked if deletion of the YGRL signal sequence from syntaxin 6 altered the ability of the protein to interact with proteins or lipids. Deletion or alteration of the YGRL from syntaxin 6 does not appreciably impact syntaxin 6-protein interactions, but does decrease syntaxin 6-lipid interactions. Intriguingly, data also demonstrate that YKGL or YQRL can successfully substitute for YGRL in localization of syntaxin 6 to the chlamydial inclusion. Importantly and for the first time, we are establishing that a eukaryotic signal sequence targets the chlamydial inclusion. PMID:25309881

  7. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.

  8. A Method to Evaluate Genome-Wide Methylation in Archival Formalin-Fixed, Paraffin-Embedded Ovarian Epithelial Cells

    PubMed Central

    Li, Qiling; Li, Min; Ma, Li; Li, Wenzhi; Wu, Xuehong; Richards, Jendai; Fu, Guoxing; Xu, Wei; Bythwood, Tameka; Li, Xu; Wang, Jianxin; Song, Qing

    2014-01-01

    Background: The use of DNA from archival formalin and paraffin embedded (FFPE) tissue for genetic and epigenetic analyses may be problematic, since the DNA is often degraded and only limited amounts may be available. Thus, it is currently not known whether genome-wide methylation can be reliably assessed in DNA from archival FFPE tissue. Methodology/Principal Findings: Ovarian tissues, which were obtained and formalin-fixed and paraffin-embedded in either 1999 or 2011, were sectioned and stained with hematoxylin-eosin (H&E). Epithelial cells were captured by laser micro dissection, and their DNA subjected to whole genomic bisulfite conversion, whole genomic polymerase chain reaction (PCR) amplification, and purification. Sequencing and software analyses were performed to identify the extent of genomic methylation. We observed that 31.7% of sequence reads from the DNA in the 1999 archival FFPE tissue, and 70.6% of the reads from the 2011 sample, could be matched with the genome. Methylation rates of CpG on the Watson and Crick strands were 32.2% and 45.5%, respectively, in the 1999 sample, and 65.1% and 42.7% in the 2011 sample. Conclusions/Significance: We have developed an efficient method that allows DNA methylation to be assessed in archival FFPE tissue samples. PMID:25133528

  9. A Data Hiding Technique to Synchronously Embed Physiological Signals in H.264/AVC Encoded Video for Medicine Healthcare.

    PubMed

    Peña, Raul; Ávila, Alfonso; Muñoz, David; Lavariega, Juan

    2015-01-01

    The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improve the safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. The VW observations can also reduce the number of false positive incidents and expand the recognition coverage to abnormal health conditions. The synchronization between the video images and the physiological-signal waveforms is fundamental for the successful recognition of the clinical manifestations. The use of conventional equipment to synchronously acquire and display the video-waveform information involves complex tasks such as the video capture/compression, the acquisition/compression of each physiological signal, and the video-waveform synchronization based on timestamps. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals into encoded video sequences. Our data hiding technique offers large data capacity and simplifies the complexity of the video-waveform acquisition and reproduction. The experimental results revealed successful embedding and full restoration of signal's samples. Our results also demonstrated a small distortion in the video objective quality, a small increment in bit-rate, and embedded cost savings of -2.6196% for high and medium motion video sequences.

  10. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  11. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in FASTA format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
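
    The first step of the HP mode, turning a multiple sequence alignment into a family-averaged hydrophobicity profile, can be sketched as below. This is not AlignMe's implementation: the choice of the Kyte-Doolittle scale, the rule of averaging each column over non-gap residues only, the absence of window smoothing, and the function name are all assumptions made for illustration. The subsequent alignment of two such profiles is not shown.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_hydropathy_profile(msa):
    """Per-column mean hydropathy of an alignment (list of equal-length rows).

    Gaps and unknown symbols are skipped (an assumption); a column with no
    scorable residue yields None.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    profile = []
    for column in zip(*msa):
        values = [KYTE_DOOLITTLE[aa] for aa in column if aa in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Hypothetical three-sequence alignment; '-' marks a gap.
toy_msa = ["AIL-", "AVLK", "GIL-"]
toy_profile = averaged_hydropathy_profile(toy_msa)
```

    Because the profile depends only on column-wise hydrophobicity, two families with little residue-level identity can still yield comparable profiles, which is the motivation the abstract gives for the HP mode.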

  12. Evolutionary Dynamics on Protein Bi-stability Landscapes can Potentially Resolve Adaptive Conflicts

    PubMed Central

    Sikosek, Tobias; Bornberg-Bauer, Erich; Chan, Hue Sun

    2012-01-01

    Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. PMID:23028272

  13. Specific minor groove solvation is a crucial determinant of DNA binding site recognition

    PubMed Central

    Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.

    2014-01-01

    The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976

  14. MIPS: a database for protein sequences and complete genomes.

    PubMed Central

    Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

    1998-01-01

    The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795

  15. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

    PubMed

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-03

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  16. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

    NASA Astrophysics Data System (ADS)

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-01

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  17. New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

    PubMed

    Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

    2009-05-01

    The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.

  18. The generation of meaningful information in molecular systems.

    PubMed

    Wills, Peter R

    2016-03-13

    The physico-chemical processes occurring inside cells are under the computational control of genetic (DNA) and epigenetic (internal structural) programming. The origin and evolution of genetic information (nucleic acid sequences) is reasonably well understood, but scant attention has been paid to the origin and evolution of the molecular biological interpreters that give phenotypic meaning to the sequence information that is quite faithfully replicated during cellular reproduction. The near universality and age of the mapping from nucleotide triplets to amino acids embedded in the functionality of the protein synthetic machinery speaks to the early development of a system of coding which is still extant in every living organism. We take the origin of genetic coding as a paradigm of the emergence of computation in natural systems, focusing on the requirement that the molecular components of an interpreter be synthesized autocatalytically. Within this context, it is seen that interpreters of increasing complexity are generated by series of transitions through stepped dynamic instabilities (non-equilibrium phase transitions). The early phylogeny of the amino acyl-tRNA synthetase enzymes is discussed in such terms, leading to the conclusion that the observed optimality of the genetic code is a natural outcome of the processes of self-organization that produced it. © 2016 The Author(s).

  19. Quantitative Analysis of Signaling Networks across Differentially Embedded Tumors Highlights Interpatient Heterogeneity in Human Glioblastoma

    PubMed Central

    2015-01-01

    Glioblastoma multiforme (GBM) is the most aggressive malignant primary brain tumor, with a dismal mean survival even with the current standard of care. Although in vitro cell systems can provide mechanistic insight into the regulatory networks governing GBM cell proliferation and migration, clinical samples provide a more physiologically relevant view of oncogenic signaling networks. However, clinical samples are not widely available and may be embedded for histopathologic analysis. With the goal of accurately identifying activated signaling networks in GBM tumor samples, we investigated the impact of embedding in optimal cutting temperature (OCT) compound followed by flash freezing in LN2 vs immediate flash freezing (iFF) in LN2 on protein expression and phosphorylation-mediated signaling networks. Quantitative proteomic and phosphoproteomic analysis of 8 pairs of tumor specimens revealed minimal impact of the different sample processing strategies and highlighted the large interpatient heterogeneity present in these tumors. Correlation analyses of the differentially processed tumor sections identified activated signaling networks present in selected tumors and revealed the differential expression of transcription, translation, and degradation associated proteins. This study demonstrates the capability of quantitative mass spectrometry for identification of in vivo oncogenic signaling networks from human tumor specimens that were either OCT-embedded or immediately flash-frozen. PMID:24927040

  20. The first complete chloroplast genome of the Genistoid legume Lupinus luteus: evidence for a novel major lineage-specific rearrangement and new insights regarding plastome evolution in the legume family

    PubMed Central

    Martin, Guillaume E.; Rousseau-Gueutin, Mathieu; Cordonnier, Solenn; Lima, Oscar; Michon-Coudouel, Sophie; Naquin, Delphine; de Carvalho, Julie Ferreira; Aïnouche, Malika; Salmon, Armel; Aïnouche, Abdelkader

    2014-01-01

    Background and Aims To date chloroplast genomes are available only for members of the non-protein amino acid-accumulating clade (NPAAA) Papilionoid lineages in the legume family (i.e. Millettioids, Robinoids and the ‘inverted repeat-lacking clade’, IRLC). It is thus very important to sequence plastomes from other lineages in order to better understand the unusual evolution observed in this model flowering plant family. To this end, the plastome of a lupine species, Lupinus luteus, was sequenced to represent the Genistoid lineage, a noteworthy but poorly studied legume group. Methods The plastome of L. luteus was reconstructed using Roche-454 and Illumina next-generation sequencing. Its structure, repetitive sequences, gene content and sequence divergence were compared with those of other Fabaceae plastomes. PCR screening and sequencing were performed in other allied legumes in order to determine the origin of a large inversion identified in L. luteus. Key Results The first sequenced Genistoid plastome (L. luteus: 155 894 bp) resulted in the discovery of a 36-kb inversion, embedded within the already known 50-kb inversion in the large single-copy (LSC) region of the Papilionoideae. This inversion occurs at the base or soon after the Genistoid emergence, and most probably resulted from a flip–flop recombination between identical 29-bp inverted repeats within two trnS genes. Comparative analyses of the chloroplast gene content of L. luteus vs. Fabaceae and extra-Fabales plastomes revealed the loss of the plastid rpl22 gene, and its functional relocation to the nucleus was verified using lupine transcriptomic data. An investigation into the evolutionary rate of coding and non-coding sequences among legume plastomes resulted in the identification of remarkably variable regions. Conclusions This study resulted in the discovery of a novel, major 36-kb inversion, specific to the Genistoids. Chloroplast mutational hotspots were also identified, which contain novel and potentially informative regions for molecular evolutionary studies at various taxonomic levels in the legumes. Taken together, the results provide new insights into the evolutionary landscape of the legume plastome. PMID:24769537

  1. CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

    PubMed

    Zhou, Carol L Ecale

    2015-01-01

    In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
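
    The idea of merging pairwise alignments into a one-to-many alignment with gaps in the reference can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not CombAlign's source: the function name `combine_pairwise` is hypothetical, and each input is assumed to be a pair of gapped strings (reference, other) in which the ungapped reference is identical across all pairs.

```python
def combine_pairwise(pairs):
    """Merge pairwise alignments that share a reference into one gapped alignment.

    `pairs` is a list of (ref_aligned, other_aligned) strings using '-' for gaps.
    Returns a list of rows: the gapped reference first, then one row per pair.
    """
    ref = pairs[0][0].replace("-", "")
    n = len(ref)

    parsed = []
    for ref_aln, oth_aln in pairs:
        if len(ref_aln) != len(oth_aln):
            raise ValueError("aligned strings in a pair must have equal length")
        if ref_aln.replace("-", "") != ref:
            raise ValueError("all pairs must share the same reference sequence")
        inserts = [""] * (n + 1)   # residues inserted before reference position i
        matched = []               # character aligned to each reference residue
        i = 0
        for r, o in zip(ref_aln, oth_aln):
            if r == "-":
                inserts[i] += o
            else:
                matched.append(o)
                i += 1
        parsed.append((inserts, matched))

    # Each insertion slot is as wide as the longest insertion found at that slot.
    width = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    rows = [[] for _ in range(len(parsed) + 1)]
    for i in range(n + 1):
        rows[0].append("-" * width[i])                 # gap in the reference
        for k, (ins, _) in enumerate(parsed, start=1):
            rows[k].append(ins[i].ljust(width[i], "-"))
        if i < n:
            rows[0].append(ref[i])
            for k, (_, matched) in enumerate(parsed, start=1):
                rows[k].append(matched[i])
    return ["".join(row) for row in rows]


rows = combine_pairwise([("AC-DE", "ACXDE"), ("ACDE-", "A-DEY")])
```

    Insertions from different proteins at the same slot are simply left-justified in shared columns here; they are not claimed to be structurally equivalent to one another.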

  2. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences.

    PubMed

    Yu, Jia-Feng; Dou, Xiang-Hua; Wang, Hong-Bo; Sun, Xiao; Zhao, Hui-Ying; Wang, Ji-Hua

    2015-06-22

    The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp.
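
    A minimal sketch of the coordinate mapping behind such a cylindrical representation is given below. The abstract does not state the order of residue types around the circle, the radius, or the spacing along z, so the alphabetical one-letter order, unit radius and unit rise used here are assumptions, and `cylindrical_coordinates` is a hypothetical name. The numerical descriptors of the paper are not reproduced.

```python
import math

# Assumed ordering of the 20 residue types around the circle (alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cylindrical_coordinates(sequence, radius=1.0, rise=1.0):
    """Map each residue to (x, y, z) on a cylinder.

    The residue type selects the angle on the circle and the sequence
    position selects the height along the z axis.
    """
    coords = []
    for position, residue in enumerate(sequence):
        k = AMINO_ACIDS.index(residue)  # raises ValueError for non-standard letters
        angle = 2.0 * math.pi * k / len(AMINO_ACIDS)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       position * rise))
    return coords


points = cylindrical_coordinates("AAC")
```

    Identical residue types share the same (x, y) and differ only in z, which is how composition and order become visible in a single plot.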

  3. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.

  4. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979

  5. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, with a CC of 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
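
    To make the quantity being predicted concrete, the following sketch computes an observed RWCO profile from residue coordinates. The abstract gives only the verbal definition, so the specifics here are assumptions: a hard distance cutoff of 8 Å between representative atoms, exclusion of pairs closer than 3 positions in sequence, and normalization by chain length. The function name `residue_wise_contact_order` is hypothetical, and the SVR predictor itself is not reproduced.

```python
import math


def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Sum of sequence separations |i - j| over residues j in contact with i.

    `coords` holds one (x, y, z) tuple per residue, for example C-alpha or
    C-beta atoms. Two residues are taken to be in contact when they are
    closer than `cutoff`; near neighbours in sequence are skipped. The sum
    is divided by the chain length (an assumed normalization).
    """
    n = len(coords)
    profile = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            if separation < min_separation:
                continue
            if math.dist(coords[i], coords[j]) < cutoff:
                total += separation
        profile.append(total / n)
    return profile


# Four residues on a straight line, 2 units apart.
line = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
rwco = residue_wise_contact_order(line)
```

    Only the two end residues are at least three positions apart, so they alone receive a non-zero value in this toy chain.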

  6. Cloning and expression of recombinant adhesive protein Mefp-1 of the blue mussel, Mytilus edulis

    DOEpatents

    Silverman, Heather G.; Roberto, Francisco F.

    2006-01-17

    The present invention comprises a Mytilus edulis cDNA sequence having a nucleotide sequence that encodes the Mytilus edulis foot protein-1 (Mefp-1), an example of a mollusk foot protein. Mefp-1 is an integral component of the blue mussel's adhesive protein complex, which allows the mussel to attach to objects underwater. The isolation, purification and sequencing of the Mefp-1 gene will allow researchers to produce Mefp-1 protein using genetic engineering techniques. The discovery of the Mefp-1 gene sequence will also allow scientists to better understand how the blue mussel creates its waterproof adhesive protein complex.

  7. Determination of the sequences of protein-derived peptides and peptide mixtures by mass spectrometry

    PubMed Central

    Morris, Howard R.; Williams, Dudley H.; Ambler, Richard P.

    1971-01-01

    Micro-quantities of protein-derived peptides have been converted into N-acetylated permethyl derivatives, and their sequences determined by low-resolution mass spectrometry without prior knowledge of their amino acid compositions or lengths. A new strategy is suggested for the mass spectrometric sequencing of oligopeptides or proteins, involving gel filtration of protein hydrolysates and subsequent sequence analysis of peptide mixtures. Finally, results are given that demonstrate for the first time the use of mass spectrometry for the analysis of a protein-derived peptide mixture, again without prior knowledge of the protein or components within the mixture. PMID:5158904

  8. Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

    PubMed

    Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

    2016-03-02

    Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Primary structures of ribosomal proteins from the archaebacterium Halobacterium marismortui and the eubacterium Bacillus stearothermophilus.

    PubMed

    Arndt, E; Scholzen, T; Krömer, W; Hatakeyama, T; Kimura, M

    1991-06-01

    Approximately 40 ribosomal proteins from each of Halobacterium marismortui and Bacillus stearothermophilus have been sequenced either by direct protein sequence analysis or by DNA sequence analysis of the appropriate genes. The comparison of the amino acid sequences from the archaebacterium H. marismortui with the available ribosomal proteins from the eubacterial and eukaryotic kingdoms revealed four different groups of proteins: 24 proteins are related to both eubacterial and eukaryotic proteins; eleven proteins are exclusively related to eukaryotic counterparts; for three proteins only eubacterial relatives could be found; and for another three proteins no counterpart could be found. The similarities of the halobacterial ribosomal proteins are in general somewhat higher to their eukaryotic than to their eubacterial counterparts. The comparison of B. stearothermophilus proteins with their E. coli homologues showed that the proteins evolved at different rates. Some proteins are highly conserved with 64-76% identity, others are poorly conserved with only 25-34% identical amino acid residues.

  10. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895

  11. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  12. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829

  13. Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis).

    PubMed

    Darville, Lancia N F; Merchant, Mark E; Maccha, Venkata; Siddavarapu, Vivekananda Reddy; Hasan, Azeem; Murray, Kermit K

    2012-02-01

    Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The protein N-terminal sequence was determined using Edman degradation and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as monomer and dimer in vitro. The isolated 35kDa protein was ~98% sequenced and found to have 313 amino acids and nine cysteine residues and was identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to both human and mouse intelectin-1. The alligator lectin exhibited strong binding affinities toward mannan and mannose as compared to other tested carbohydrates. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Lipidic cubic phase injector facilitates membrane protein serial femtosecond crystallography.

    PubMed

    Weierstall, Uwe; James, Daniel; Wang, Chong; White, Thomas A; Wang, Dingjie; Liu, Wei; Spence, John C H; Bruce Doak, R; Nelson, Garrett; Fromme, Petra; Fromme, Raimund; Grotjohann, Ingo; Kupitz, Christopher; Zatsepin, Nadia A; Liu, Haiguang; Basu, Shibom; Wacker, Daniel; Han, Gye Won; Katritch, Vsevolod; Boutet, Sébastien; Messerschmidt, Marc; Williams, Garth J; Koglin, Jason E; Marvin Seibert, M; Klinker, Markus; Gati, Cornelius; Shoeman, Robert L; Barty, Anton; Chapman, Henry N; Kirian, Richard A; Beyerlein, Kenneth R; Stevens, Raymond C; Li, Dianfan; Shah, Syed T A; Howe, Nicole; Caffrey, Martin; Cherezov, Vadim

    2014-01-01

    Lipidic cubic phase (LCP) crystallization has proven successful for high-resolution structure determination of challenging membrane proteins. Here we present a technique for extruding gel-like LCP with embedded membrane protein microcrystals, providing a continuously renewed source of material for serial femtosecond crystallography. Data collected from sub-10-μm-sized crystals produced with less than 0.5 mg of purified protein yield structural insights regarding cyclopamine binding to the Smoothened receptor.

  15. Evolution of EF-hand calcium-modulated proteins. IV. Exon shuffling did not determine the domain compositions of EF-hand proteins

    NASA Technical Reports Server (NTRS)

    Kretsinger, R. H.; Nakayama, S.

    1993-01-01

    In the previous three reports in this series we demonstrated that the EF-hand family of proteins evolved by a complex pattern of gene duplication, transposition, and splicing. The dendrograms based on exon sequences are nearly identical to those based on protein sequences for troponin C, the essential light chain myosin, the regulatory light chain, and calpain. This validates both the computational methods and the dendrograms for these subfamilies. The proposal of congruence for calmodulin, troponin C, essential light chain, and regulatory light chain was confirmed. There are, however, significant differences in the calmodulin dendrograms computed from DNA and from protein sequences. In this study we find that introns are distributed throughout the EF-hand domain and the interdomain regions. Further, dendrograms based on intron type and distribution bear little resemblance to those based on protein or on DNA sequences. We conclude that introns are inserted, and probably deleted, with relatively high frequency. Further, in the EF-hand family exons do not correspond to structural domains and exon shuffling played little if any role in the evolution of this widely distributed homolog family. Calmodulin has had a turbulent evolution. Its dendrograms based on protein sequence, exon sequence, 3'-tail sequence, intron sequences, and intron positions all show significant differences.

  16. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  17. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of African lion (Panthera leo) was first cloned and polymorphisms screened. The results suggest that the prion protein gene of eight African lions is highly homogenous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) in the prion protein gene (Prnp) of African lion were found, but no amino acid substitutions. Sequence analysis showed that the highest homology was to Felis catus (GenBank accession AF003087; 96.7%) and to sheep (GenBank accession M31313.1; 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross species transmission of prion disease.

  18. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    PubMed

    Smith, Colin A; Kortemme, Tanja

    2011-01-01

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  19. Spectroscopic and electrochemical characterization of cytochrome c encapsulated in a bio sol-gel matrix.

    PubMed

    Deriu, Daniela; Pagnotta, Sara Emanuela; Santucci, Roberto; Rosato, Nicola

    2008-08-01

    The sol-gel technique represents a remarkably versatile method for protein encapsulation. To enhance sol-gel biocompatibility, systems envisaging the presence of calcium and phosphates in the sol-gel composition were recently prepared and investigated. Unfortunately, the low pH at which solutions were prepared (pH < 2.5) dramatically limited their application to proteins, because the acidic environment induces protein denaturation. In this paper we apply a new protocol based on the introduction of calcium nitrate to the inorganic phase, with formation of a binary bioactive system. In this case protein encapsulation proves versatile and secure, being achieved at a pH close to neutrality (pH 6.0); also, the presence of calcium is expected to enhance system biocompatibility. To determine the properties of the salt-doped sol-gel and the influence exerted on entrapped biosystems, the structural and functional properties of embedded cytochrome c have been investigated. Data obtained indicate that the salt-doped sol-gel induces no significant change in the structure and the redox properties of the embedded protein; also, the matrix increases protein stability. Interestingly, the presence of calcium nitrate appears to be a determining factor for refolding of the acid-denatured protein. This is of interest in view of future applications in the biosensor area.

  20. A protein block based fold recognition method for the annotation of twilight zone sequences.

    PubMed

    Suresh, V; Ganesan, K; Parthasarathy, S

    2013-03-01

    The description of protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three states (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of protein backbone including the coil. According to de Brevern (2000) the Protein Blocks has 16 structural fragments and each one has 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and it has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using the local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at Protein Block Export server.
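    The local pair-wise alignment step described in this record can be illustrated with a minimal Smith-Waterman sketch over Protein Block strings (letters a to p). The scoring values below (match, mismatch, and a linear gap penalty) are placeholder assumptions chosen for illustration; the record does not state which substitution scores or gap model the authors used, and the z-value and P-value filtering is not reproduced here.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two strings.

    Scoring parameters are illustrative assumptions, not the values
    used in the cited method. A linear gap penalty is assumed.
    """
    prev = [0] * (len(target) + 1)   # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]                   # first column is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: scores are floored at zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical Protein Block strings sharing the helical run "mmmm".
print(local_alignment_score("mmmmnop", "ddmmmmdd"))
```

    In a fold-recognition setting, such a score would be computed against every entry of the fold library and then converted into a significance measure before thresholding.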

  1. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  2. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large scale back-propagation neural networks on a 544 node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins from CM-5 time) to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.

  3. Protein Translocation into the Intermembrane Space and Matrix of Mitochondria: Mechanisms and Driving Forces.

    PubMed

    Backes, Sandra; Herrmann, Johannes M

    2017-01-01

    Mitochondria contain two aqueous subcompartments, the matrix and the intermembrane space (IMS). The matrix is enclosed by both the inner and outer mitochondrial membranes, whilst the IMS is sandwiched between the two. Proteins of the matrix are synthesized in the cytosol as preproteins, which contain amino-terminal matrix targeting sequences that mediate their translocation through translocases embedded in the outer and inner membrane. For these proteins, the translocation reaction is driven by the import motor which is part of the inner membrane translocase. The import motor employs matrix Hsp70 molecules and ATP hydrolysis to ratchet proteins into the mitochondrial matrix. Most IMS proteins lack presequences and instead utilize the IMS receptor Mia40, which facilitates their translocation across the outer membrane in a reaction that is coupled to the formation of disulfide bonds within the protein. This process requires neither ATP nor the mitochondrial membrane potential. Mia40 fulfills two roles: First, it acts as a holdase, which is crucial in the import of IMS proteins and second, it functions as a foldase, introducing disulfide bonds into newly imported proteins, which induces and stabilizes their natively folded state. For several Mia40 substrates, oxidative folding is an essential prerequisite for their assembly into oligomeric complexes. Interestingly, recent studies have shown that the two functions of Mia40 can be experimentally separated from each other by the use of specific mutants, hence providing a powerful new way to dissect the different physiological roles of Mia40. In this review we summarize the current knowledge relating to the mitochondrial matrix-targeting and the IMS-targeting/Mia40 pathway. Moreover, we discuss the mechanistic properties by which the mitochondrial import motor on the one hand and Mia40 on the other, drive the translocation of their substrates into the organelle. 
    We propose that the lateral diffusion of Mia40 in the inner membrane and the oxidation-mediated folding of incoming polypeptides support IMS import.

  4. Double demonstration of oncogenic high risk human papilloma virus DNA and HPV-E7 protein in oral cancers.

    PubMed

    Pannone, G; Santoro, A; Carinci, F; Bufo, P; Papagerakis, S M; Rubini, C; Campisi, G; Giovannelli, L; Contaldo, M; Serpico, R; Mazzotta, M; Lo Muzio, L

    2011-01-01

    Oncogenic HPVs are necessarily involved in cervical cancer but their role in oral carcinogenesis is debated. To detect HPV in oral cancer, 38 cases of formalin fixed-paraffin embedded OSCC were studied by both DNA genotyping (MY09/11 L1 consensus primers in combination with GP5-GP6 primer pair followed by sequencing) and immunohistochemistry (monoclonal Abs against capsid protein and HPV-E7 protein, K1H8 DAKO and clone 8C9 INVITROGEN, respectively). HPV-16 tonsil cancer was used as positive control. The overall prevalence of HPV infection in OSCCs was 10.5%. Amplification of DNA samples showed single HPV DNA infection in 3 cases (HPV16; HPV53; HPV70) and double infection in one case of cheek cancer (HPV31/HPV44). The overall HR-HPV prevalence was 7.5%. E-7 antigen was immunohistochemically detected in all HPV-positive cases. HPV+ OSCC cases showed an overall better outcome than HPV negative oral cancers, as evaluated by Kaplan-Meier curves. HPVs exert their oncogenic role after DNA integration, gene expression of E5, E6 and E7 loci and p53/pRb host proteins suppression. This study showed that HPV-E7 protein inactivating pRb is expressed in oral cancer cells infected by oncogenic HPV other than classical HR-HPV-16/18. Interestingly, HPV-70, considered a low risk virus with no definite placement in an oncogenic type category, gives rise to the expression of HPV-E7 protein and inactivates pRb in oral cancer. HPV-70, as shown in the current literature, is also able to inactivate p53 protein, promoting cell immortalization. HPV-53, classified as a possible high risk virus, expresses E7 protein in OSCC, contributing to oral carcinogenesis. We have identified, among OSCCs, a subgroup characterized by HPV infection (10.5%). Finally, we have shown the oncogenic potential of some HPV types that are not well known in the literature.

  5. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: Chaos game representation walk model for the protein sequences

    NASA Astrophysics Data System (ADS)

    Gao, Jie; Jiang, Li-Li; Xu, Zhen-Yuan

    2009-10-01

    A new chaos game representation of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). A CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes in the present paper. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into the protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
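    A chaos game representation maps a sequence to points in the unit square by repeatedly moving halfway from the current point toward a corner chosen by the current symbol. The sketch below illustrates that general idea for protein sequences under a four-class hydrophobic-hydrophilic grouping. The residue grouping and the assignment of classes to corners are assumptions made for illustration and may differ from those of Yu et al.; the conversion to a CGR-walk time series and the ARFIMA fit are not reproduced.

```python
# Assumed four-class grouping of the 20 amino acids (illustrative only).
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}
# Assumed corner of the unit square for each class (illustrative only).
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}
RESIDUE_TO_CORNER = {
    aa: CORNERS[name] for name, group in CLASSES.items() for aa in group
}


def cgr_points(sequence):
    """Return the CGR point for each residue, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for aa in sequence.upper():
        cx, cy = RESIDUE_TO_CORNER[aa]   # KeyError for non-standard residues
        # Move halfway from the current point toward the residue's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


print(cgr_points("AD"))
```

    Because each step halves the distance to a corner, the position of a point encodes the recent history of residue classes, which is what makes the coordinates usable as a time series.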

  6. Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

    PubMed Central

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-01-01

    Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389

  7. Purification, characterization and sequence analysis of Omp50, a new porin isolated from Campylobacter jejuni.

    PubMed Central

    Bolla, J M; Dé, E; Dorez, A; Pagès, J M

    2000-01-01

    A novel pore-forming protein identified in Campylobacter was purified by ion-exchange chromatography and named Omp50 according to both its molecular mass and its outer membrane localization. We observed a pore-forming ability of Omp50 after re-incorporation into artificial membranes. The protein induced cation-selective channels with major conductance values of 50-60 pS in 1 M NaCl. N-terminal sequencing allowed us to identify the predicted coding sequence Cj1170c from the Campylobacter jejuni genome database as the corresponding gene in the NCTC 11168 genome sequence. The gene, designated omp50, consists of a 1425 bp open reading frame encoding a deduced 453-amino acid protein with a calculated pI of 5.81 and a molecular mass of 51169.2 Da. The protein possessed a 20-amino acid leader sequence. No significant similarity was found between Omp50 and porin protein sequences already determined. Moreover, the protein showed only weak sequence identity with the major outer-membrane protein (MOMP) of Campylobacter, correlating with the absence of antigenic cross-reactivity between these two proteins. Omp50 is expressed in C. jejuni and Campylobacter lari but not in Campylobacter coli. The gene, however, was detected in all three species by PCR. According to its conformation and functional properties, the protein would belong to the family of outer-membrane monomeric porins. PMID:11104668

  8. Immunological recognition of different forms of the neurotensin receptor in transfected cells and rat brain.

    PubMed Central

    Boudin, H; Grauz-Guyon, A; Faure, M P; Forgez, P; Lhiaubet, A M; Dennis, M; Beaudet, A; Rostene, W; Pelaprat, D

    1995-01-01

    In this work, the molecular forms of the rat neurotensin receptor (NTR) expressed in transfected Chinese hamster ovary (CHO) cells, in infected Sf9 insect cells and in rat cerebral cortex were immunologically detected by means of an anti-peptide antibody raised against a fragment of the third intracellular loop of the receptor. Immunoblot experiments against a fusion protein indicated that the anti-peptide antibody recognized, under denaturing conditions, the corresponding amino acid sequence within the NTR. In immunoblot analysis of membranes from NTR-transfected CHO cells, high levels of immunoreactivity were observed between 60 and 72 kDa, while only a faint labelling was observed at 47 kDa, the molecular mass deduced for the rat NTR cDNA. The bands of high molecular mass were no longer observed after deglycosylation of membrane proteins by peptide N-glycosidase F, indicating that they represented glycosylated forms of the receptor. Extracts of membranes derived from baculovirus-infected Sf9 insect-cells expressing the NTR provided a quite different immunoblot pattern, since the major band detected in that case was at 47 kDa, the molecular size of the non-glycosylated receptor. Taken together, these data show that, while most of the NTR protein was glycosylated in CHO cells, it was unglycosylated in Sf9 insect-cells. In addition, molecular sizes of the receptor proteins observed in these two cell lines differed from those obtained for the NTR endogenously expressed in the rat cerebral cortex of 7 day-old rats, where bands at 56 and 54 kDa were detected. Binding experiments carried out on membrane preparations obtained from baculovirus-infected Sf9 cells demonstrated that the immunogenic sequence was still accessible to the antibody when the receptor was embedded in the cell membrane. 
    Immunohistochemical studies carried out on both transfected CHO cells and infected Sf9 cells confirmed this interpretation and further indicated that the antibody could be applied in the visualization of the receptor. PMID:7826341

  9. Differentials on graph complexes II: hairy graphs

    NASA Astrophysics Data System (ADS)

    Khoroshkin, Anton; Willwacher, Thomas; Živković, Marko

    2017-10-01

    We study the cohomology of the hairy graph complexes which compute the rational homotopy of embedding spaces, generalizing the Vassiliev invariants of knot theory. We provide spectral sequences converging to zero whose first pages contain the hairy graph cohomology. Our results yield a way to construct many nonzero hairy graph cohomology classes out of (known) non-hairy classes by studying the cancellations in those sequences. This provides a first glimpse at the tentative global structure of the hairy graph cohomology.

  10. Cysteine-containing peptide tag for site-specific conjugation of proteins

    DOEpatents

    Backer, Marina V.; Backer, Joseph M.

    2008-04-08

    The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.

  11. Cysteine-containing peptide tag for site-specific conjugation of proteins

    DOEpatents

    Backer, Marina V.; Backer, Joseph M.

    2010-10-05

    The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety bound to the targeting moiety; the biological conjugate having a covalent bond between the thiol group of SEQ ID NO:2 and a functional group in the binding moiety. The present invention is directed to a biological conjugate, comprising: (a) a targeting moiety comprising a polypeptide having an amino acid sequence comprising the polypeptide sequence of SEQ ID NO:2 and the polypeptide sequence of a selected targeting protein; and (b) a binding moiety that comprises an adapter protein, the adapter protein having a thiol group; the biological conjugate having a disulfide bond between the thiol group of SEQ ID NO:2 and the thiol group of the adapter protein. The present invention is also directed to biological sequences employed in the above biological conjugates, as well as pharmaceutical preparations and methods using the above biological conjugates.

  12. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background: Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  13. Graph pyramids for protein function prediction.

    PubMed

    Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

    2015-01-01

    Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
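
    The misclassification counts quoted above can be turned into accuracies with simple arithmetic. The sketch below uses only the three numbers stated in the abstract (14,086 test sequences, 21.1 and 362.9 average misclassifications); the function name is illustrative and not taken from the paper.

```python
def error_rate(misclassified: float, total: int) -> float:
    """Fraction of test sequences that were misclassified."""
    return misclassified / total

TOTAL_TEST_SEQUENCES = 14086  # figure quoted in the abstract

proposed = error_rate(21.1, TOTAL_TEST_SEQUENCES)   # graph-pyramid method
baseline = error_rate(362.9, TOTAL_TEST_SEQUENCES)  # BLAST-score baseline

proposed_accuracy = round(100 * (1 - proposed), 2)
baseline_accuracy = round(100 * (1 - baseline), 2)
error_reduction = round(baseline / proposed, 1)

print(f"proposed accuracy: {proposed_accuracy}%")
print(f"baseline accuracy: {baseline_accuracy}%")
print(f"baseline makes {error_reduction}x as many errors")
```

    On these figures the proposed method corresponds to roughly 99.85% accuracy against 97.42% for the baseline. The separate "more than 96%" figure in the abstract refers to training with only 20% of the data per class, which is a different setting.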

  14. Protein Crystal Eco R1 Endonuclease-DNA Complex

    NASA Technical Reports Server (NTRS)

    1998-01-01

    Type II restriction enzymes, such as Eco R1 endonuclease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence-specific protein at sites of differing base sequence is the essence of sequence specificity; the core question is how these proteins discriminate between different DNA sequences, especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals

  15. A new method of preparing embeddment-free sections for transmission electron microscopy: applications to the cytoskeletal framework and other three-dimensional networks.

    PubMed

    Capco, D G; Krochmalnic, G; Penman, S

    1984-05-01

    Diethylene glycol distearate is used as a removable embedding medium to produce embeddment-free sections for transmission electron microscopy. The easily cut sections of this material float and form ribbons in a water-filled knife trough and exhibit interference colors that aid in the selection of sections of equal thickness. The images obtained with embeddment-free sections are compared with those from the more conventional epoxy-embedded sections, and illustrate that embedding medium can obscure important biological structures, especially protein filament networks. The embeddment-free section methodology is well suited for morphological studies of cytoskeletal preparations obtained by extraction of cells with nonionic detergent in cytoskeletal stabilizing medium. The embeddment-free section also serves to bridge the very different images afforded by embedded sections and unembedded whole mounts.

  16. Genomewide Function Conservation and Phylogeny in the Herpesviridae

    PubMed Central

    Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

    2001-01-01

    The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614

  17. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
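
    The phylogenetic-profile idea described above can be sketched in a few lines. This is a minimal illustration, not the patented method: the protein names, genome names, presence/absence data, and the Hamming-distance cutoff are all hypothetical.

```python
from itertools import combinations

# Hypothetical input: protein -> set of genomes in which a homolog is found.
homologs = {
    "P1": {"g1", "g3", "g4"},
    "P2": {"g1", "g3", "g4"},
    "P3": {"g2", "g3"},
    "P4": {"g1", "g2", "g3", "g4"},
}
genomes = ["g1", "g2", "g3", "g4"]

def phylogenetic_profile(present_in, genomes):
    """Presence (1) or absence (0) of a protein across an ordered genome list."""
    return tuple(1 if g in present_in else 0 for g in genomes)

def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def linked_pairs(profiles, max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]

profiles = {p: phylogenetic_profile(s, genomes) for p, s in homologs.items()}
print(linked_pairs(profiles))                  # identical profiles only
print(linked_pairs(profiles, max_distance=1))  # allow one mismatch
```

    Proteins with matching profiles (here P1 and P2) would be candidates for a functional link. In practice a profile that is present in every genome carries little information, so a real analysis would likely weight or filter such cases.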

  18. An approach to large scale identification of non-obvious structural similarities between proteins

    PubMed Central

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
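
    The core step described above, correlating the threading scores of two sequences against the same fold library, can be sketched as follows. The abstract does not say which correlation measure was used, so the Pearson coefficient here is an assumption, and the score vectors are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

# Hypothetical threading scores of three proteins against the same
# ordered library of five standard folds.
scores_a = [12.0, 3.5, 8.1, 0.4, 6.6]
scores_b = [11.2, 4.0, 7.5, 1.0, 6.9]  # similar fold preferences to A
scores_c = [1.0, 9.8, 2.2, 7.7, 3.1]   # different fold preferences

print(f"A vs B: {pearson(scores_a, scores_b):.3f}")
print(f"A vs C: {pearson(scores_a, scores_c):.3f}")
```

    A high correlation between two score vectors would flag the pair as potentially structurally similar even when their sequence identity is low; the cutoff for "high" would need to be calibrated on proteins of known structure.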

  19. Sequence and structural implications of a bovine corneal keratan sulfate proteoglycan core protein. Protein 37B represents bovine lumican and proteins 37A and 25 are unique

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. Location of 6 cysteine and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.

  20. New Structural and Functional Contexts of the Dx[DN]xDG Linear Motif: Insights into Evolution of Calcium-Binding Proteins

    PubMed Central

    Rigden, Daniel J.; Woodhead, Duncan D.; Wong, Prudence W. H.; Galperin, Michael Y.

    2011-01-01

    Binding of calcium ions (Ca2+) to proteins can have profound effects on their structure and function. Common roles of calcium binding include structure stabilization and regulation of activity. It is known that diverse families – EF-hands being one of at least twelve – use a Dx[DN]xDG linear motif to bind calcium in near-identical fashion. Here, four novel structural contexts for the motif are described. Existing experimental data for one of them, a thermophilic archaeal subtilisin, demonstrate for the first time a role for Dx[DN]xDG-bound calcium in protein folding. An integrin-like embedding of the motif in the blade of a β-propeller fold – here named the calcium blade – is discovered in structures of bacterial and fungal proteins. Furthermore, sensitive database searches suggest a common origin for the calcium blade in β-propeller structures of different sizes and a pan-kingdom distribution of these proteins. Factors favouring the multiple convergent evolution of the motif appear to include its general Asp-richness, the regular spacing of the Asp residues and the fact that change of Asp into Gly and vice versa can occur through a single nucleotide change. Among the known structural contexts for the Dx[DN]xDG motif, only the calcium blade and the EF-hand are currently found intracellularly in large numbers, perhaps because the higher extracellular concentration of Ca2+ allows for easier fixing of newly evolved motifs that have acquired useful functions. The analysis presented here will inform ongoing efforts toward prediction of similar calcium-binding motifs from sequence information alone. PMID:21720552
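
    The Dx[DN]xDG pattern maps directly onto a regular expression, where x stands for any residue. The sketch below scans a one-letter protein sequence for the motif; the example sequence is invented, and a pattern match alone says nothing about whether calcium is actually bound.

```python
import re

# Dx[DN]xDG: Asp, any residue, Asp or Asn, any residue, Asp, Gly.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(D.[DN].DG))")

def find_motif(sequence):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]

# Hypothetical sequence used only to exercise the pattern.
example = "MKDADGDGKLTDKNADGSW"
for position, residues in find_motif(example):
    print(position, residues)
```

    Positions are reported 1-based to match the usual residue numbering convention.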

  1. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes.
Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.

  2. Atom probe tomographic mapping directly reveals the atomic distribution of phosphorus in resin embedded ferritin

    DOE PAGES

    Perea, Daniel E.; Liu, Jia; Bartrand, Jonah A. G.; ...

    2016-02-29

    In this study, we report the atomic-scale analysis of biological interfaces using atom probe tomography. Embedding the protein ferritin in an organic polymer resin lacking nitrogen provided chemical contrast to visualize atomic distributions and distinguish organic-organic and organic-inorganic interfaces. The sample preparation method can be directly extended to further enhance the study of biological, organic and inorganic nanomaterials relevant to health, energy or the environment.

  3. Nucleotide sequence of the Saccharomyces cerevisiae PUT4 proline-permease-encoding gene: similarities between CAN1, HIP1 and PUT4 permeases.

    PubMed

    Vandenbol, M; Jauniaux, J C; Grenson, M

    1989-11-15

    The complete nucleotide (nt) sequence of the PUT4 gene, whose product is required for high-affinity proline active transport in the yeast Saccharomyces cerevisiae, is presented. The sequence contains a single long open reading frame of 1881 nt, encoding a polypeptide with a calculated Mr of 68,795. The predicted protein is strongly hydrophobic and exhibits six potential glycosylation sites. Its hydropathy profile suggests the presence of twelve membrane-spanning regions flanked by hydrophilic N- and C-terminal domains. The N terminus does not resemble signal sequences found in secreted proteins. These features are characteristic of integral membrane proteins catalyzing translocation of ligands across cellular membranes. Protein sequence comparisons indicate strong resemblance to the arginine and histidine permeases of S. cerevisiae, but no marked sequence similarity to the proline permease of Escherichia coli or to other known prokaryotic or eukaryotic transport proteins. The strong similarity between the three yeast amino acid permeases suggests a common ancestor for the three proteins.

  4. Distinguishing between Protein Dynamics and Dye Photophysics in Single-Molecule FRET Experiments

    PubMed Central

    Chung, Hoi Sung; Louis, John M.; Eaton, William A.

    2010-01-01

    Förster resonance energy transfer (FRET) efficiency distributions in single-molecule experiments contain both structural and dynamical information. Extraction of this information from these distributions requires a careful analysis of contributions from dye photophysics. To investigate how mechanisms other than FRET affect the distributions obtained by counting donor and acceptor photons, we have measured single-molecule fluorescence trajectories of a small α/β protein, i.e., protein GB1, undergoing two-state, folding/unfolding transitions. Alexa 488 donor and Alexa 594 acceptor dyes were attached to cysteines at positions 10 and 57 to yield two isomers (donor10/acceptor57 and donor57/acceptor10) which could not be separated in the purification. The protein was immobilized via binding of a histidine tag added to a linker sequence at the N-terminus to cupric ions embedded in a polyethylene-glycol–coated glass surface. The distribution of FRET efficiencies assembled from the trajectories is complex with widths for the individual peaks in large excess of that caused by shot noise. Most of this complexity can be explained by two interfering photophysical effects: a photoinduced red shift of the donor dye and differences in the quantum yield of the acceptor dye for the two isomers resulting from differences in quenching rate by the cupric ion. Measurements of steady-state polarization, calculation of the donor-acceptor cross-correlation function from photon trajectories, and comparison of the single molecule and ensemble kinetics all indicate that conformational distributions and dynamics do not contribute to the complexity. PMID:20159166

  5. Distinguishing between protein dynamics and dye photophysics in single-molecule FRET experiments.

    PubMed

    Chung, Hoi Sung; Louis, John M; Eaton, William A

    2010-02-17

    Förster resonance energy transfer (FRET) efficiency distributions in single-molecule experiments contain both structural and dynamical information. Extraction of this information from these distributions requires a careful analysis of contributions from dye photophysics. To investigate how mechanisms other than FRET affect the distributions obtained by counting donor and acceptor photons, we have measured single-molecule fluorescence trajectories of a small alpha/beta protein, i.e., protein GB1, undergoing two-state, folding/unfolding transitions. Alexa 488 donor and Alexa 594 acceptor dyes were attached to cysteines at positions 10 and 57 to yield two isomers, donor(10)/acceptor(57) and donor(57)/acceptor(10), which could not be separated in the purification. The protein was immobilized via binding of a histidine tag added to a linker sequence at the N-terminus to cupric ions embedded in a polyethylene-glycol-coated glass surface. The distribution of FRET efficiencies assembled from the trajectories is complex with widths for the individual peaks in large excess of that caused by shot noise. Most of this complexity can be explained by two interfering photophysical effects: a photoinduced red shift of the donor dye and differences in the quantum yield of the acceptor dye for the two isomers resulting from differences in quenching rate by the cupric ion. Measurements of steady-state polarization, calculation of the donor-acceptor cross-correlation function from photon trajectories, and comparison of the single molecule and ensemble kinetics all indicate that conformational distributions and dynamics do not contribute to the complexity.
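
    The comparison of peak widths with shot noise can be illustrated with two standard relations that the abstract itself does not spell out: the apparent efficiency from photon counts, E = nA / (nA + nD), and the binomial shot-noise standard deviation sqrt(E(1 - E) / N) for a bin of N photons. The sketch below assumes uncorrected counts (no background, crosstalk, or quantum-yield correction), and the photon numbers are invented.

```python
from math import sqrt

def fret_efficiency(n_acceptor, n_donor):
    """Apparent FRET efficiency from raw acceptor and donor photon counts."""
    total = n_acceptor + n_donor
    if total <= 0:
        raise ValueError("at least one photon is required")
    return n_acceptor / total

def shot_noise_sd(mean_efficiency, n_photons):
    """Binomial standard deviation of E for bins of n_photons photons."""
    if n_photons <= 0:
        raise ValueError("n_photons must be positive")
    return sqrt(mean_efficiency * (1.0 - mean_efficiency) / n_photons)

# Hypothetical bin: 60 acceptor and 40 donor photons.
efficiency = fret_efficiency(60, 40)
expected_width = shot_noise_sd(efficiency, 100)
print(f"E = {efficiency:.2f}, shot-noise SD = {expected_width:.3f}")
```

    A measured peak whose standard deviation is well above this value is broader than photon statistics alone can explain, which is the situation the abstract attributes to dye photophysics.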

  6. Run-length encoding graphic rules, biochemically editable designs and steganographical numeric data embedment for DNA-based cryptographical coding system.

    PubMed

    Kawano, Tomonori

    2013-03-01

    There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed.
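
    For readers unfamiliar with run-length encoding, the sketch below shows the generic textbook form applied to one row of a black-and-white font image. It is not the article's "novel run-length encoding rule", which the abstract does not specify, and it does not attempt any mapping of runs onto nucleotides.

```python
from itertools import groupby

def rle_encode(pixels: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into (symbol, run length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(pixels)]

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)

# Hypothetical bitmap row: '0' is background, '1' is ink.
row = "0001111000"
runs = rle_encode(row)
print(runs)
print(rle_decode(runs) == row)
```

    Compression is only effective when the image contains long uniform runs, which is typical of simple letter bitmaps.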

  7. Sequence of the amino-terminal region of rat liver ribosomal proteins S4, S6, S8, L6, L7a, L18, L27, L30, L37, L37a, and L39.

    PubMed

    Wittmann-Liebold, B; Geissler, A W; Lin, A; Wool, I G

    1979-01-01

    The sequence of the amino-terminal region of eleven rat liver ribosomal proteins (S4, S6, S8, L6, L7a, L18, L27, L30, L37, L37a, and L39) was determined. The analysis confirmed the homogeneity of the proteins and suggests that they are unique, since no extensive common sequences were found. The N-terminal regions of the rat liver proteins were compared with amino acid sequences in Saccharomyces cerevisiae and in Escherichia coli ribosomal proteins. It seems likely that the proteins L37 from rat liver and Y55 from yeast ribosomes are homologous. It is possible that rat liver L7a or L37a or both are related to S. cerevisiae Y44, although the similar sequences are at the amino-terminus of the rat liver proteins and in an internal region of Y44. A number of similarities in the sequences of rat liver and E. coli ribosomal proteins have been found; however, it is not yet possible to say whether they connote a common ancestry.

  8. Variability of the protein sequences of lcrV between epidemic and atypical rhamnose-positive strains of Yersinia pestis.

    PubMed

    Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V

    2007-01-01

    Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys --> Asn), 72 (Lys --> Arg), 273 (Cys --> Ser), and 324-326 (Ser-Gly-Lys --> Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.

  9. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.

    PubMed

    Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W

    2017-02-01

    Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR): phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity.
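
    The two ways of incorporating uncertainty described above can be sketched on a toy reconstruction. The per-site posterior probabilities are invented, and the 0.20 cutoff for calling an alternative state "plausible" is an assumption made for this illustration; the abstract does not state the criterion used.

```python
import random

# Hypothetical posterior probabilities of each amino acid state per site.
posterior = [
    {"A": 0.98, "S": 0.02},
    {"L": 0.55, "I": 0.40, "V": 0.05},
    {"K": 0.70, "R": 0.30},
    {"G": 1.00},
]

AMBIGUITY_THRESHOLD = 0.20  # assumed cutoff, not taken from the paper

def most_probable_sequence(posterior):
    """Take the highest-posterior state at every site."""
    return "".join(max(site, key=site.get) for site in posterior)

def worst_plausible_case(posterior, threshold=AMBIGUITY_THRESHOLD):
    """Use the second-best state wherever it is itself plausible."""
    residues = []
    for site in posterior:
        ranked = sorted(site, key=site.get, reverse=True)
        ambiguous = len(ranked) > 1 and site[ranked[1]] >= threshold
        residues.append(ranked[1] if ambiguous else ranked[0])
    return "".join(residues)

def sample_sequence(posterior, rng):
    """Draw one state per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in posterior
    )

print(most_probable_sequence(posterior))
print(worst_plausible_case(posterior))
print(sample_sequence(posterior, random.Random(0)))
```

    Note that sampling can occasionally pick very low-probability states (such as V at the second site here), which is consistent with the abstract's observation that sampled sequences were sometimes artifactually nonfunctional.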

  10. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2008-12-01

    Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the number of occurrences of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
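
    The conversion from a frequency profile to a fixed-dimension count vector can be sketched as follows. The abstract does not define a Top-n-gram precisely, so this sketch assumes it is the n most frequent residues at a profile position, written in descending order of frequency, with ties broken alphabetically. The profile values and helper names are hypothetical, and the SVM and LSA steps are omitted.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column(**freqs):
    """Build one profile column; residues not listed get frequency 0."""
    return {aa: freqs.get(aa, 0.0) for aa in AMINO_ACIDS}

def top_n_gram(col, n):
    """The n most frequent residues of a column, most frequent first."""
    ranked = sorted(col, key=lambda aa: (-col[aa], aa))
    return "".join(ranked[:n])

def vocabulary(n):
    """Every ordered selection of n distinct residues, in a fixed order."""
    return ["".join(p) for p in permutations(AMINO_ACIDS, n)]

def feature_vector(profile, n):
    """Count how often each possible Top-n-gram occurs along the profile."""
    counts = Counter(top_n_gram(col, n) for col in profile)
    return [counts[gram] for gram in vocabulary(n)]

# Hypothetical three-position frequency profile.
profile = [
    column(A=0.6, G=0.3, S=0.1),
    column(L=0.5, I=0.4, V=0.1),
    column(A=0.7, G=0.2, T=0.1),
]
vector = feature_vector(profile, 2)
print(len(vector), sum(vector))
```

    Because the vocabulary is fixed in advance, every protein maps to a vector of the same length regardless of sequence length, which is what a classifier such as an SVM requires.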

  11. Aging does not affect generalized postural motor learning in response to variable amplitude oscillations of the support surface.

    PubMed

    Van Ooteghem, Karen; Frank, James S; Allard, Fran; Horak, Fay B

    2010-08-01

    Postural motor learning for dynamic balance tasks has been demonstrated in healthy older adults (Van Ooteghem et al. in Exp Brain Res 199(2):185-193, 2009). The purpose of this study was to investigate the type of knowledge (general or specific) obtained with balance training in this age group and to examine whether embedding perturbation regularities within a balance task masks specific learning. Two groups of older adults maintained balance on a translating platform that oscillated with variable amplitude and constant frequency. One group was trained using an embedded-sequence (ES) protocol which contained the same 15-s sequence of variable amplitude oscillations in the middle of each trial. A second group was trained using a looped-sequence (LS) protocol which contained a 15-s sequence repeated three times to form each trial. All trials were 45 s. Participants were not informed of any repetition. To examine learning, participants performed a retention test following a 24-h delay. LS participants also completed a transfer task. Specificity of learning was examined by comparing performance for repeated versus random sequences (ES) and training versus transfer sequences (LS). Performance was measured by deriving spatial and temporal measures of whole body center of mass (COM) and trunk orientation. Both groups improved performance with practice as characterized by reduced COM displacement, improved COM-platform phase relationships, and decreased angular trunk motion. Furthermore, improvements reflected general rather than specific postural motor learning regardless of training protocol (ES or LS). This finding is similar to young adults (Van Ooteghem et al. in Exp Brain Res 187(4):603-611, 2008) and indicates that age does not influence the type of learning which occurs for balance control.

  12. A peek into tropomyosin binding and unfolding on the actin filament.

    PubMed

    Singh, Abhishek; Hitchcock-Degregori, Sarah E

    2009-07-24

    Tropomyosin is a prototypical coiled coil along its length with subtle variations in structure that allow interactions with actin and other proteins. Actin binding globally stabilizes tropomyosin. Tropomyosin-actin interaction occurs periodically along the length of tropomyosin. However, it is not well understood how tropomyosin binds actin. Tropomyosin's periodic binding sites make differential contributions to two components of actin binding, cooperativity and affinity, and can be classified as primary or secondary sites. We show through mutagenesis and analysis of recombinant striated muscle alpha-tropomyosins that primary actin binding sites have a destabilizing coiled-coil interface, typically alanine-rich, embedded within a non-interface recognition sequence. Introduction of an Ala cluster in place of the native, more stable interface in period 2 and/or period 3 sites (of seven) increased the affinity or cooperativity of actin binding, analysed by cosedimentation and differential scanning calorimetry. Replacement of period 3 with period 5 sequence, an unstable region of known importance for cooperative actin binding, increased the cooperativity of binding. Introduction of the fluorescent probe, pyrene, near the mutation sites in periods 2 and 3 reported local instability, stabilization by actin binding, and local unfolding before or coincident with dissociation from actin (measured using light scattering), and chain dissociation (analyzed using circular dichroism). This, and previous work, suggests that regions of tropomyosin involved in binding actin have non-interface residues specific for interaction with actin and an unstable interface that is locally stabilized upon binding. The destabilized interface allows residues on the coiled-coil surface to obtain an optimal conformation for interaction with actin by increasing the number of local substates that the side chains can sample. 
We suggest that local disorder is a property typical of coiled coil binding sites and proteins that have multiple binding partners, of which tropomyosin is one type.

  13. Aberrant expression of epithelial leucine-rich repeat containing G protein-coupled receptor 5-positive cells in the eutopic endometrium in endometriosis and implications in deep-infiltrating endometriosis.

    PubMed

    Vallvé-Juanico, Júlia; Suárez-Salvador, Elena; Castellví, Josep; Ballesteros, Agustín; Taylor, Hugh S; Gil-Moreno, Antonio; Santamaria, Xavier

    2017-11-01

    To characterize leucine-rich repeat containing G protein-coupled receptor 5-positive (LGR5+) cells from the endometrium of women with endometriosis. Prospective experimental study. University hospital/fertility clinic. Twenty-seven women with endometriosis who underwent surgery and 12 healthy egg donors, together comprising 39 endometrial samples. Obtaining of uterine aspirates by using a Cornier Pipelle. Immunofluorescence in formalin-fixed paraffin-embedded tissue from mice and healthy and pathologic human endometrium using antibodies against LGR5, E-cadherin, and cytokeratin, and epithelial and stromal LGR5+ cells isolated from healthy and pathologic human eutopic endometrium by fluorescence-activated cell sorting and transcriptomic characterization by RNA high sequencing. Immunofluorescence showed that LGR5+ cells colocalized with epithelial markers in the stroma of the endometrium only in endometriotic patients. The results from RNA high sequencing of LGR5+ cells from epithelium and stroma did not show any statistically significant differences between them. The LGR5+ versus LGR5- cells in pathologic endometrium showed 394 differentially expressed genes. The LGR5+ cells in deep-infiltrating endometriosis expressed inflammatory markers not present in the other types of the disease. Our results revealed the presence of aberrantly located LGR5+ cells coexpressing epithelial markers in the stromal compartment of women with endometriosis. These cells have a statistically significantly different expression profile in deep-infiltrating endometriosis in comparison with other types of endometriosis, independent of the menstrual cycle phase. Further studies are needed to elucidate their role and influence in reproductive outcomes. Copyright © 2017. Published by Elsevier Inc.

  14. Host Cell Virus Entry Mediated by Australian Bat Lyssavirus Envelope G glycoprotein

    DTIC Science & Technology

    2013-10-24

    39 Figure 7. Comparison of the amino acid sequences of Saccolaimus and Pteropus ABLV G mature protein... sequence analysis revealed that the PCR products were identical. Sequence comparisons of the ABLV N and other lyssavirus N proteins showed that ABLV...Saccolaimus flaviventris) (129). Nucleoprotein sequence comparisons revealed that the Saccolaimus N protein shared 96% amino acid homology with the Pteropus

  15. Extreme Sensory Complexity Encoded in the 10-Megabase Draft Genome Sequence of the Chromatically Acclimating Cyanobacterium Tolypothrix sp. PCC 7601

    PubMed Central

    Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K.; Fryszczyn, Bartlomiej G.; Fox, George E.; Tirumalai, Madhan R.; Liu, Yamei; Kim, Sun

    2015-01-01

    Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. PMID:25953173

  16. Neutrality and evolvability of designed protein sequences

    NASA Astrophysics Data System (ADS)

    Bhattacherjee, Arnab; Biswas, Parbati

    2010-07-01

    The effect of foldability on a protein’s evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high-temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. The robustness, a protein’s ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than the Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.

  17. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. 
Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .

  18. Embedding the Ni-SOD mimetic Ni-NCC within a polypeptide sequence alters the specificity of the reaction pathway.

    PubMed

    Krause, Mary E; Glass, Amanda M; Jackson, Timothy A; Laurence, Jennifer S

    2013-01-07

    The unique metal abstracting peptide asparagine-cysteine-cysteine (NCC) binds nickel in a square planar 2N:2S geometry and acts as a mimic of the enzyme nickel superoxide dismutase (Ni-SOD). The Ni-NCC tripeptide complex undergoes rapid, site-specific chiral inversion to dld-NCC in the presence of oxygen. Superoxide scavenging activity increases proportionally with the degree of chiral inversion. Characterization of the NCC sequence within longer peptides with absorption, circular dichroism (CD), and magnetic CD (MCD) spectroscopies and mass spectrometry (MS) shows that the geometry of metal coordination is maintained, though the electronic properties of the complex are varied to a small extent because of bis-amide, rather than amine/amide, coordination. In addition, both Ni-tripeptide and Ni-pentapeptide complexes have charges of -2. This study demonstrates that the chiral inversion chemistry does not occur when NCC is embedded in a longer polypeptide sequence. Nonetheless, the superoxide scavenging reactivity of the embedded Ni-NCC module is similar to that of the chirally inverted tripeptide complex, which is consistent with a minor change in the reduction potential for the Ni-pentapeptide complex. Together, this suggests that the charge of the complex could affect the SOD activity as much as a change in the primary coordination sphere. In Ni-NCC and other Ni-SOD mimics, changes in chirality, superoxide scavenging activity, and oxidation of the peptide itself all depend on the presence of dioxygen or its reduced derivatives (e.g., superoxide), and the extent to which each of these distinct reactions occurs is ruled by electronic and steric effects that emanate from the organization of ligands around the metal center.

  19. On the relationship between residue structural environment and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

    2017-09-01

    Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that the sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for a residue's structural environment, calculated based only on the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To determine whether the Cα atomic positions are adequate to show the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to the sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding similar structure-conservation relationships. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all-atom substructures. These results indicate that the Cα atoms of a protein structure alone could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
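
    The abstract above describes the WCN only as a packing-density measure computed from Cα positions. As a minimal sketch, assuming the commonly used inverse-square form (each residue's WCN is the sum of 1/r² over the distances r to every other Cα atom), it could be computed as follows. The function name and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def weighted_contact_numbers(ca_coords):
    """Return one WCN value per residue.

    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    Assumed definition: WCN_i = sum over j != i of 1 / r_ij**2.
    Coordinates must be distinct, otherwise r_ij is zero.
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(ca_coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(ca_coords):
            if i == j:
                continue
            squared_distance = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += 1.0 / squared_distance
        wcn.append(total)
    return wcn

# Toy example: three residues on a line, 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
values = weighted_contact_numbers(coords)
print(values)
```

    In this toy chain the middle residue has two close neighbours and therefore the largest value, which is the sense in which WCN tracks local packing density.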

  20. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints

    PubMed Central

    Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert

    2017-01-01

    Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665

  1. Protein Sequencing with Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.

  2. A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

    PubMed

    Overvoorde, P J; Chao, W S; Grimes, H D

    1997-06-20

    Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.

  3. Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

    PubMed

    Sim, Mikang; Kim, Jaebum

    2015-02-01

    The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Strong-field tidal distortions of rotating black holes. III. Embeddings in hyperbolic three-space

    NASA Astrophysics Data System (ADS)

    Penna, Robert F.; Hughes, Scott A.; O'Sullivan, Stephen

    2017-09-01

    In previous work, we developed tools for quantifying the tidal distortion of a black hole's event horizon due to an orbiting companion. These tools use techniques which require large mass ratios (companion mass μ much smaller than black hole mass M), but can be used for arbitrary bound orbits and for any black hole spin. We also showed how to visualize these distorted black holes by embedding their horizons in a global Euclidean three-space, E^3. Such visualizations illustrate interesting and important information about horizon dynamics. Unfortunately, we could not visualize black holes with spin parameter a* > √3/2 ≈ 0.866: such holes cannot be globally embedded into E^3. In this paper, we overcome this difficulty by showing how to embed the horizons of tidally distorted Kerr black holes in a hyperbolic three-space, H^3. We use black hole perturbation theory to compute the Gaussian curvatures of tidally distorted event horizons, from which we build a two-dimensional metric of their distorted horizons. We develop a numerical method for embedding the tidally distorted horizons in H^3. As an application, we give a sequence of embeddings into H^3 of a tidally interacting black hole with spin a* = 0.9999. A small-amplitude, high-frequency oscillation seen in previous work shows up particularly clearly in these embeddings.

  5. Helicase-dependent amplification of nucleic acids.

    PubMed

    Cao, Yun; Kim, Hyun-Jin; Li, Ying; Kong, Huimin; Lemieux, Bertrand

    2013-10-11

    Helicase-dependent amplification (HDA) is a novel method for the isothermal in vitro amplification of nucleic acids. The HDA reaction selectively amplifies a target sequence by extension of two oligonucleotide primers. Unlike the polymerase chain reaction (PCR), HDA uses a helicase enzyme to separate the deoxyribonucleic acid (DNA) strands, rather than heat denaturation. This allows DNA amplification without the need for thermal cycling. The helicase used in HDA is a helicase super family II protein obtained from a thermophilic organism, Thermoanaerobacter tengcongensis (TteUvrD). This thermostable helicase is capable of unwinding blunt-end nucleic acid substrates at elevated temperatures (60° to 65°C). The HDA reaction can also be coupled with reverse transcription for ribonucleic acid (RNA) amplification. The products of this reaction can be detected during the reaction using fluorescent probes when incubations are conducted in a fluorimeter. Alternatively, products can be detected after amplification using a disposable amplicon containment device that contains an embedded lateral flow strip. Copyright © 2013 John Wiley & Sons, Inc.

  6. A Score of the Ability of a Three-Dimensional Protein Model to Retrieve Its Own Sequence as a Quantitative Measure of Its Quality and Appropriateness

    PubMed Central

    Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio

    2010-01-01

    Background Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. 
However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209

  7. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

    PubMed Central

    Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  8. Coarse-grained sequences for protein folding and design.

    PubMed

    Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa

    2003-09-16

    We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.

  9. Coarse-grained sequences for protein folding and design

    PubMed Central

    Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa

    2003-01-01

    We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815

  10. Characterization of DNA-protein interactions using high-throughput sequencing data from pulldown experiments

    NASA Astrophysics Data System (ADS)

    Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf

    Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.

  11. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    PubMed

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-04

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

    PubMed Central

    Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

    2016-01-01

    Background Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. 
Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences. PMID:27552220
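
    The abstract above says that NADDA exploits the abundance of exact-matching k-mers to flag conserved regions before a machine-learning step. The following is only an illustrative sketch of that first idea, not the NADDA implementation: it counts in how many sequences each k-mer occurs and then reports, for each position of a query, the count of the k-mer starting there. The function names, the choice k = 3, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter

def kmer_sequence_counts(sequences, k):
    """Count, for every k-mer, the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts

def abundance_profile(query, counts, k):
    """Per-position abundance of the k-mer starting at that position."""
    return [counts[query[i:i + k]] for i in range(len(query) - k + 1)]

# Toy data: three sequences sharing the block "TAYIAK".
seqs = ["MKTAYIAKQR", "GGTAYIAKLL", "PPTAYIAKWW"]
counts = kmer_sequence_counts(seqs, 3)
profile = abundance_profile(seqs[0], counts, 3)
print(profile)  # [1, 1, 3, 3, 3, 3, 1, 1]
```

    Runs of high values in the profile mark candidate conserved regions; in the method described above, a trained classifier rather than a fixed threshold would make the final call.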

  13. The cDNA sequence of mouse Pgp-1 and homology to human CD44 cell surface antigen and proteoglycan core/link proteins.

    PubMed

    Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T

    1990-01-05

    We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.

  14. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data, and thus makes it possible to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion: DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  15. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.

  16. Sequence Determines Degree of Knottedness in a Coarse-Grained Protein Model

    NASA Astrophysics Data System (ADS)

    Wüst, Thomas; Reith, Daniel; Virnau, Peter

    2015-01-01

    Knots are abundant in globular homopolymers but rare in globular proteins. To shed new light on this long-standing conundrum, we study the influence of sequence on the formation of knots in proteins under native conditions within the framework of the hydrophobic-polar lattice protein model. By employing large-scale Wang-Landau simulations combined with suitable Monte Carlo trial moves we show that even though knots are still abundant on average, sequence introduces large variability in the degree of self-entanglements. Moreover, we are able to design sequences which are either almost always or almost never knotted. Our findings serve as proof of concept that the introduction of just one additional degree of freedom per monomer (in our case sequence) facilitates evolution towards a protein universe in which knots are rare.
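
The hydrophobic-polar (HP) lattice model named in this abstract is commonly defined by an energy that rewards contacts between hydrophobic (H) monomers. The following is a minimal sketch of that standard energy function on a cubic lattice, not the authors' simulation code; the function name `hp_energy` and the coordinate representation are assumptions made for illustration.

```python
from itertools import combinations


def hp_energy(sequence, coords):
    """Energy of one HP-model conformation (hypothetical helper).

    sequence: string of 'H' (hydrophobic) and 'P' (polar) monomers.
    coords:   list of integer (x, y, z) tuples, one lattice site per monomer.
    Each pair of H monomers on adjacent lattice sites that are not
    neighbours along the chain contributes -1 to the energy.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours do not count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            # Manhattan distance 1 means the two sites are lattice neighbours.
            distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if distance == 1:
                energy -= 1
    return energy


# A four-monomer chain folded into a square brings its two H ends together.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

Because the energy depends only on which positions carry an H, changing the sequence on a fixed conformation changes which folds are favourable, which is the extra degree of freedom per monomer the abstract refers to.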

  17. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724

  18. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  19. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  20. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
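
CSDTW builds on dynamic time warping (DTW), which aligns two sequences of different lengths by allowing elements to be stretched or repeated. The sketch below shows only the classic deterministic DTW distance on scalar sequences; it is a deliberately simpler technique than the statistical, HMM-based CSDTW described in the abstract, and the name `dtw_distance` and the absolute-difference local cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    elements of a with the first j elements of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used b[j-1]
                cost[i][j - 1],      # b[j-1] is matched to an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

For pen trajectories, each element would typically be a feature vector rather than a scalar, and the local cost would be replaced by a suitable vector distance.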

  1. Using infrared HOG-based pedestrian detection for outdoor autonomous searching UAV with embedded system

    NASA Astrophysics Data System (ADS)

    Shao, Yanhua; Mei, Yanying; Chu, Hongyu; Chang, Zhiyuan; He, Yuxuan; Zhan, Huayi

    2018-04-01

    Pedestrian detection (PD) is an important application domain in computer vision and pattern recognition. Unmanned Aerial Vehicles (UAVs) have become a major field of research in recent years. In this paper, a robust pedestrian detection method based on the combination of the infrared HOG (IR-HOG) feature and SVM is proposed for highly complex outdoor scenarios on the basis of airborne IR image sequences from a UAV. The basic flow of our application is as follows. Firstly, the thermal infrared imager (TAU2-336), which was installed on our Outdoor Autonomous Searching (OAS) UAV, is used for taking pictures of the designated outdoor area. Secondly, image sequences are collected and processed on a high-performance embedded system with the Samsung ODROID-XU4 as the core and Ubuntu as the operating system, and IR-HOG features are extracted. Finally, the SVM is used to train the pedestrian classifier. Experiments show that our method yields promising results under complex conditions including strong noise corruption and partial occlusion.

  2. Analysis of secreted proteins from Aspergillus flavus.

    PubMed

    Medina, Martha L; Haynes, Paul A; Breci, Linda; Francisco, Wilson A

    2005-08-01

    MS/MS techniques in proteomics make possible the identification of proteins from organisms with little or no genome sequence information available. Peptide sequences are obtained from tandem mass spectra by matching peptide mass and fragmentation information to protein sequence information from related organisms, including unannotated genome sequence data. This peptide identification data can then be grouped and reconstructed into protein data. In this study, we have used this approach to study protein secretion by Aspergillus flavus, a filamentous fungus for which very little genome sequence information is available. A. flavus is capable of degrading the flavonoid rutin (quercetin 3-O-glycoside), as the only source of carbon via an extracellular enzyme system. In this continuing study, a proteomic analysis was used to identify secreted proteins from A. flavus when grown on rutin. The growth media glucose and potato dextrose were used to identify differentially expressed secreted proteins. The secreted proteins were analyzed by 1- and 2-DE and MS/MS. A total of 51 unique A. flavus secreted proteins were identified from the three growth conditions. Ten proteins were unique to rutin-, five to glucose- and one to potato dextrose-grown A. flavus. Sixteen secreted proteins were common to all three media. Fourteen identifications were of hypothetical proteins or proteins of unknown functions. To our knowledge, this is the first extensive proteomic study conducted to identify the secreted proteins from a filamentous fungus.

  3. High-throughput analysis of the protein sequence-stability landscape using a quantitative "yeast surface two-hybrid" system and fragment reconstitution

    PubMed Central

    Dutta, Sanjib; Koide, Akiko; Koide, Shohei

    2008-01-01

    Stability evaluation of many mutants can lead to a better understanding of the sequence determinants of a structural motif and of factors governing protein stability and protein evolution. The traditional biophysical analysis of protein stability is low throughput, limiting our ability to widely explore the sequence space in a quantitative manner. In this study, we have developed a high-throughput library screening method for quantifying stability changes, which is based on protein fragment reconstitution and yeast surface display. Our method exploits the thermodynamic linkage between protein stability and fragment reconstitution and the ability of the yeast surface display technique to quantitatively evaluate protein-protein interactions. The method was applied to a fibronectin type III (FN3) domain. Characterization of fragment reconstitution was facilitated by the co-expression of two FN3 fragments, thus establishing a "yeast surface two-hybrid" method. Importantly, our method does not rely on competition between clones and thus eliminates a common limitation of high-throughput selection methods in which the most stable variants are predominantly recovered. Thus, it allows for the isolation of sequences that exhibit a desired level of stability. We identified over one hundred unique sequences for a β-bulge motif, which was significantly more informative than natural sequences of the FN3 family in revealing the sequence determinants for the β-bulge. Our method provides a powerful means to rapidly assess stability of many variants, to systematically assess contribution of different factors to protein stability and to enhance protein stability. PMID:18674545

  4. Extreme Sensory Complexity Encoded in the 10-Megabase Draft Genome Sequence of the Chromatically Acclimating Cyanobacterium Tolypothrix sp. PCC 7601.

    PubMed

    Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K; Fryszczyn, Bartlomiej G; Fox, George E; Tirumalai, Madhan R; Liu, Yamei; Kim, Sun; Kehoe, David M; Weinstock, George M

    2015-05-07

    Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. Copyright © 2015 Yerrapragada et al.

  5. 'Reverse Chemical Evolution': A New Method to Search for Thermally Stable Biopolymers

    NASA Astrophysics Data System (ADS)

    Mitsuzawa, Shigenobu; Yukawa, Tetsuyuki

    2003-04-01

    The primitive sea on Earth may have had high-temperature and high-pressure conditions similar to those in present-day hydrothermal environments. If life originated in the hot sea, thermal stability of the constituent molecules would have been necessary. Thus far, however, it has been reported that biopolymers hydrolyze too rapidly to support life at temperatures of more than 200 °C. We herein propose a novel approach, called reverse chemical evolution, to search for biopolymers notably more stable against thermal decomposition than previously reported. The essence of the approach is that hydrolysis of a protein or functional RNA (m-, t-, r-RNA) at high temperature and high pressure simulating the ancient sea environment may yield thermally stable peptides or RNAs at higher concentrations than other peptides or RNAs. An experimental test hydrolyzing bovine ribonuclease A in aqueous solution at 205 °C and 25 MPa yielded three prominently stable molecules weighing 859, 1030 and 695 Da. They are tens to hundreds of times more thermally stable than a polyglycine of comparable mass. Sequence analyses of the 859- and 1030-Da molecules revealed that they are, respectively, a heptapeptide and its homologue elongated by two amino acids at the N-terminal region, originally embedded as residues 112-120 in the protein. They consist mainly of hydrophobic amino acids.

  6. A lipid-binding loop of botulinum neurotoxin serotypes B, DC and G is an essential feature to confer their exquisite potency

    PubMed Central

    Le Blanc, Alexander; Mahrhold, Stefan; Piesker, Janett; Luppa, Peter B.

    2018-01-01

    The exceptional toxicity of botulinum neurotoxins (BoNTs) is mediated by high avidity binding to complex polysialogangliosides and intraluminal segments of synaptic vesicle proteins embedded in the presynaptic membrane. One peculiarity is an exposed hydrophobic loop in the toxin’s cell binding domain HC, which is located between the ganglioside- and protein receptor-binding sites, and that is particularly pronounced in the serotypes BoNT/B, DC, and G sharing synaptotagmin as protein receptor. Here, we provide evidence that this HC loop is a critical component of their tripartite receptor recognition complex. Binding to nanodisc-embedded receptors and toxicity were virtually abolished in BoNT mutants lacking residues at the tip of the HC loop. Surface plasmon resonance experiments revealed that only insertion of the HC loop into the lipid-bilayer compensates for the entropic penalty inflicted by the dual-receptor binding. Our results represent a new paradigm of how BoNT/B, DC, and G employ ternary interactions with a protein, ganglioside, and lipids to mediate their extraordinary neurotoxicity. PMID:29718991

  7. Quantized phase coding and connected region labeling for absolute phase retrieval.

    PubMed

    Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian

    2016-12-12

    This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase coding and connected region labeling. A specific code sequence is embedded into the quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned 3-digit codes combining the current period and its neighbors. A wrapped phase spanning more than 36 periods can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.

  8. A family of cellular proteins related to snake venom disintegrins.

    PubMed

    Weskamp, G; Blobel, C P

    1994-03-29

    Disintegrins are short soluble integrin ligands that were initially identified in snake venom. A previously recognized cellular protein with a disintegrin domain was the guinea pig sperm protein PH-30, a protein implicated in sperm-egg membrane binding and fusion. Here we present peptide sequences that are characteristic for several cellular disintegrin-domain proteins. These peptide sequences were deduced from cDNA sequence tags that were generated by polymerase chain reaction from various mouse tissues and a mouse muscle cell line. Northern blot analysis with four sequence tags revealed distinct mRNA expression patterns. Evidently, cellular proteins containing a disintegrin domain define a superfamily of potential integrin ligands that are likely to function in important cell-cell and cell-matrix interactions.

  9. Identification and application of self-binding zipper-like sequences in SARS-CoV spike protein.

    PubMed

    Zhang, Si Min; Liao, Ying; Neo, Tuan Ling; Lu, Yanning; Liu, Ding Xiang; Vahlne, Anders; Tam, James P

    2018-05-22

    Self-binding peptides containing zipper-like sequences, such as the Leu/Ile zipper sequence within the coiled coil regions of proteins and the cross-β spine steric zippers within the amyloid-like fibrils, could bind to the protein-of-origin through homophilic sequence-specific zipper motifs. These self-binding sequences represent opportunities for the development of biochemical tools and/or therapeutics. Here, we report on the identification of a putative self-binding β-zipper-forming peptide within the severe acute respiratory syndrome-associated coronavirus spike (S) protein and its application in viral detection. Peptide array scanning of overlapping peptides covering the entire length of S protein identified 34 putative self-binding peptides of six clusters, five of which contained octapeptide core consensus sequences. The Cluster I consensus octapeptide sequence GINITNFR was predicted by Eisenberg's 3D profile method to have high amyloid-like fibrillation potential through steric β-zipper formation. Peptide C6 containing the Cluster I consensus sequence was shown to oligomerize and form amyloid-like fibrils. Taking advantage of this, C6 was further applied to detect the S protein expression in vitro by fluorescence staining. Meanwhile, the coiled-coil-forming Leu/Ile heptad repeat sequences within the S protein were under-represented during peptide array scanning, consistent with the long peptide lengths required to attain high helix-mediated interaction avidity. The data suggest that short β-zipper-like self-binding peptides within the S protein could be identified through combining the peptide scanning and predictive methods, and could be exploited as biochemical detection reagents for viral infection. Copyright © 2018. Published by Elsevier Ltd.

  10. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts.

    PubMed

    Cocos, Anne; Fiks, Alexander G; Masino, Aaron J

    2017-07-01

    Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity, thus natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Our best-performing RNN model used pretrained word embeddings created from a large, non-domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
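
The F-measure used to compare the models above combines precision and recall into a single score. As a minimal sketch, the general F-beta formula can be computed from counts of true positives, false positives and false negatives; the function name `f_measure` is hypothetical, and how the study counts an "approximate match" as a true positive is not specified in the abstract, so that step is left to the caller.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the harmonic mean (F1).

    tp: predicted spans that count as correct
    fp: predicted spans that do not match any reference span
    fn: reference spans that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not taken from the study).
print(round(f_measure(tp=8, fp=2, fn=2), 3))
```

With beta greater than 1 the score weights recall (sensitivity) more heavily, which can matter in pharmacovigilance settings where missed reactions are costlier than false alarms.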

  11. ATLASGAL - towards a complete sample of massive star forming clumps

    NASA Astrophysics Data System (ADS)

    Urquhart, J. S.; Moore, T. J. T.; Csengeri, T.; Wyrowski, F.; Schuller, F.; Hoare, M. G.; Lumsden, S. L.; Mottram, J. C.; Thompson, M. A.; Menten, K. M.; Walmsley, C. M.; Bronfman, L.; Pfalzner, S.; König, C.; Wienen, M.

    2014-09-01

    By matching infrared-selected, massive young stellar objects (MYSOs) and compact H II regions in the Red MSX Source survey to massive clumps found in the submillimetre ATLASGAL (APEX Telescope Large Area Survey of the Galaxy) survey, we have identified ~1000 embedded young massive stars between 280° < ℓ < 350° and 10° < ℓ < 60° with | b | < 1.5°. Combined with an existing sample of radio-selected methanol masers and compact H II regions, the result is a catalogue of ~1700 massive stars embedded within ~1300 clumps located across the inner Galaxy, containing three observationally distinct subsamples, methanol-maser, MYSO and H II-region associations, covering the most important tracers of massive star formation, thought to represent key stages of evolution. We find that massive star formation is strongly correlated with the regions of highest column density in spherical, centrally condensed clumps. We find no significant differences between the three samples in clump structure or the relative location of the embedded stars, which suggests that the structure of a clump is set before the onset of star formation, and changes little as the embedded object evolves towards the main sequence. There is a strong linear correlation between clump mass and bolometric luminosity, with the most massive stars forming in the most massive clumps. We find that the MYSO and H II-region subsamples are likely to cover a similar range of evolutionary stages and that the majority are near the end of their main accretion phase. We find few infrared-bright MYSOs associated with the most massive clumps, probably due to very short pre-main-sequence lifetimes in the most luminous sources.

  12. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy.

    PubMed

    Meier-Kolthoff, Jan P; Hahnke, Richard L; Petersen, Jörn; Scheuner, Carmen; Michael, Victoria; Fiebig, Anne; Rohde, Christine; Rohde, Manfred; Fartmann, Berthold; Goodwin, Lynne A; Chertkov, Olga; Reddy, Tbk; Pati, Amrita; Ivanova, Natalia N; Markowitz, Victor; Kyrpides, Nikos C; Woyke, Tanja; Göker, Markus; Klenk, Hans-Peter

    2014-01-01

    Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain had until now been neglected in microbial genomics. As a part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083(T) together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133-bp genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083(T) in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. However, the phylotypes do not express a uniform level of character divergence as measured using dDDH; thus an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
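
Genomic G+C content, one of the two measures used above to compare strains, is the fraction of bases in a sequence that are guanine or cytosine. The following is a small sketch of that calculation; the helper name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions, not details from the study.

```python
def gc_content(sequence):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def gc_difference(seq_a, seq_b):
    """Absolute difference in G+C content between two sequences, in percent."""
    return abs(gc_content(seq_a) - gc_content(seq_b)) * 100.0


# Toy sequences for illustration; real use would read whole genomes.
print(gc_content("ATGC"))
print(gc_difference("ATGC", "GGCC"))
```

For whole genomes the sequence would normally be streamed from a FASTA file rather than held as a single string, but the arithmetic is the same.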

  13. Graphene Nanopores for Protein Sequencing.

    PubMed

    Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

    2016-07-19

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups), parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single-stranded DNA: the peptides adhere to graphene and exhibit stepwise translocation when subjected to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.

  14. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  15. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  16. Meta-structure correlation in protein space unveils different selection rules for folded and intrinsically disordered proteins.

    PubMed

    Naranjo, Yandi; Pons, Miquel; Konrat, Robert

    2012-01-01

    The number of existing protein sequences spans a very small fraction of sequence space. Natural proteins have overcome a strong negative selective pressure to avoid the formation of insoluble aggregates. Stably folded globular proteins and intrinsically disordered proteins (IDPs) use alternative solutions to the aggregation problem. While in globular proteins folding minimizes the access to aggregation prone regions, IDPs on average display large exposed contact areas. Here, we introduce the concept of average meta-structure correlation maps to analyze sequence space. Using this novel conceptual view we show that representative ensembles of folded and ID proteins show distinct characteristics and respond differently to sequence randomization. By studying the way evolutionary constraints act on IDPs to disable a negative function (aggregation) we might gain insight into the mechanisms by which function-enabling information is encoded in IDPs.

  17. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194

  18. Advances in Understanding Stimulus Responsive Phase Behavior of Intrinsically Disordered Protein Polymers.

    PubMed

    Ruff, Kiersten M; Roberts, Stefan; Chilkoti, Ashutosh; Pappu, Rohit V

    2018-06-24

    Proteins and synthetic polymers can undergo phase transitions in response to changes to intensive solution parameters such as temperature, proton chemical potentials (pH), and hydrostatic pressure. For proteins and protein-based polymers, the information required for stimulus responsive phase transitions is encoded in their amino acid sequence. Here, we review some of the key physical principles that govern the phase transitions of archetypal intrinsically disordered protein polymers (IDPPs). These are disordered proteins with highly repetitive amino acid sequences. Advances in recombinant technologies have enabled the design and synthesis of protein sequences of a variety of sequence complexities and lengths. We summarize insights that have been gleaned from the design and characterization of IDPPs that undergo thermo-responsive phase transitions and build on these insights to present a general framework for IDPPs with pH and pressure responsive phase behavior. In doing so, we connect the stimulus responsive phase behavior of IDPPs with repetitive sequences to the coil-to-globule transitions that these sequences undergo at the single chain level in response to changes in stimuli. The proposed framework and ongoing studies of stimulus responsive phase behavior of designed IDPPs have direct implications in bioengineering, where designing sequences with bespoke material properties broadens the spectrum of applications, and in biology and medicine for understanding the sequence-specific driving forces for the formation of protein-based membraneless organelles as well as biological matrices that act as scaffolds for cells and mediators of cell-to-cell communication. Copyright © 2018. Published by Elsevier Ltd.

  19. Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

    PubMed

    Busk, Peter Kamp; Lange, Lene

    2013-06-01

    Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.

  20. A new method of preparing embeddment-free sections for transmission electron microscopy: applications to the cytoskeletal framework and other three-dimensional networks

    PubMed Central

    1984-01-01

    Diethylene glycol distearate is used as a removable embedding medium to produce embeddment-free sections for transmission electron microscopy. The easily cut sections of this material float and form ribbons in a water-filled knife trough and exhibit interference colors that aid in the selection of sections of equal thickness. The images obtained with embeddment-free sections are compared with those from the more conventional epoxy-embedded sections, and illustrate that embedding medium can obscure important biological structures, especially protein filament networks. The embeddment-free section methodology is well suited for morphological studies of cytoskeletal preparations obtained by extraction of cells with nonionic detergent in cytoskeletal stabilizing medium. The embeddment-free section also serves to bridge the very different images afforded by embedded sections and unembedded whole mounts. PMID:6539336

  1. The primary structure of rat liver ribosomal protein L37. Homology with yeast and bacterial ribosomal proteins.

    PubMed

    Lin, A; McNally, J; Wool, I G

    1983-09-10

    The covalent structure of the rat liver 60 S ribosomal subunit protein L37 was determined. Twenty-four tryptic peptides were purified and the sequence of each was established; they accounted for all 111 residues of L37. The sequence of the first 30 residues of L37, obtained previously by automated Edman degradation of the intact protein, provided the alignment of the first 9 tryptic peptides. Three peptides (CN1, CN2, and CN3) were produced by cleavage of protein L37 with cyanogen bromide. The sequence of CN1 (65 residues) was established from the sequence of secondary peptides resulting from cleavage with trypsin and chymotrypsin. The sequence of CN1 in turn served to order tryptic peptides 1 through 14. The sequence of CN2 (15 residues) was determined entirely by a micromanual procedure and allowed the alignment of tryptic peptides 14 through 18. The sequence of the NH2-terminal 28 amino acids of CN3 (31 residues) was determined; in addition the complete sequences of the secondary tryptic and chymotryptic peptides were done. The sequence of CN3 provided the order of tryptic peptides 18 through 24. Thus the sequence of the three cyanogen bromide peptides also accounted for the 111 residues of protein L37. The carboxyl-terminal amino acids were identified after carboxypeptidase A treatment. There is a disulfide bridge between half-cystinyl residues at positions 40 and 69. Rat liver ribosomal protein L37 is homologous with yeast YP55 and with Escherichia coli L34. Moreover, there is a segment of 17 residues in rat L37 that occurs, albeit with modifications, in yeast YP55 and in E. coli S4, L20, and L34.

  2. LenVarDB: database of length-variant protein domains.

    PubMed

    Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.

  3. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  4. Genetic Characterization of Echinococcus granulosus from a Large Number of Formalin-Fixed, Paraffin-Embedded Tissue Samples of Human Isolates in Iran

    PubMed Central

    Rostami, Sima; Torbaghan, Shams Shariat; Dabiri, Shahriar; Babaei, Zahra; Mohammadi, Mohammad Ali; Sharbatkhori, Mitra; Harandi, Majid Fasihi

    2015-01-01

    Cystic echinococcosis (CE), caused by the larval stage of Echinococcus granulosus, presents an important medical and veterinary problem globally, including that in Iran. Different genotypes of E. granulosus have been reported from human isolates worldwide. This study identifies the genotype of the parasite responsible for human hydatidosis in three provinces of Iran using formalin-fixed paraffin-embedded tissue samples. In this study, 200 formalin-fixed paraffin-embedded tissue samples from human CE cases were collected from Alborz, Tehran, and Kerman provinces. Polymerase chain reaction amplification and sequencing of the partial mitochondrial cytochrome c oxidase subunit 1 gene were performed for genetic characterization of the samples. Phylogenetic analysis of the isolates from this study and reference sequences of different genotypes was done using a maximum likelihood method. In total, 54.4%, 0.8%, 1%, and 40.8% of the samples were identified as the G1, G2, G3, and G6 genotypes, respectively. The findings of the current study confirm the G1 genotype (sheep strain) to be the most prevalent genotype involved in human CE cases in Iran and indicates the high prevalence of the G6 genotype with a high infectivity for humans. Furthermore, this study illustrates the first documented human CE case in Iran infected with the G2 genotype. PMID:25535316

  5. The primary structures of ribosomal proteins L16, L23 and L33 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T; Kimura, M

    1988-11-21

    The complete amino acid sequences of ribosomal proteins L16, L23 and L33 from the archaebacterium Halobacterium marismortui were determined. The sequences were established by manual sequencing of peptides produced with several proteases as well as by cleavage with dilute HCl. Proteins L16, L23 and L33 consist of 119, 154 and 69 amino acid residues, and their molecular masses are 13,538, 16,812 and 7620 Da, respectively. The comparison of their sequences with those of ribosomal proteins from other organisms revealed that L23 and L33 are related to eubacterial ribosomal proteins from Escherichia coli and Bacillus stearothermophilus, while protein L16 was found to be homologous to a eukaryotic ribosomal protein from yeast. These results provide information about the special phylogenetic position of archaebacteria.

  6. [Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences (expressed sequence tags, ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many of the mapped ESTs match sequence database records. To evaluate these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product stringently identified by the sequence match had already been mapped, the gene locus determined by EST was consistent with the previous position, which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2Aβ, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome 10q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single-pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.

  7. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution

    PubMed Central

    Modahl, Cassandra M.; Mackessy, Stephen P.

    2016-01-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639

  8. Predicting protein-binding RNA nucleotides with consideration of binding partners.

    PubMed

    Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook

    2015-06-01

    In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
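
    The performance measures quoted in this record (sensitivity, specificity, PPV, NPV, MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; the function name and the counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes every denominator below is non-zero (each class and each
    prediction outcome occurs at least once); MCC falls back to 0.0 otherwise.
    """
    sensitivity = tp / (tp + fn)   # fraction of true binders recovered
    specificity = tn / (tn + fp)   # fraction of non-binders recovered
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fp=20, tn=160, fn=40)
```

    With imbalanced classes, MCC is often more informative than any single rate because it uses all four cells at once.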

  9. Probing the Proteome on Earth and Beyond

    NASA Astrophysics Data System (ADS)

    Ostrom, P.

    2008-12-01

    Less than a decade ago, protein sequencing was the bane of paleobiology. Since that time researchers have completely sequenced proteins in >50 Ka fossils, been dazzled by reports of collagen peptides in dinosaur bones, and witnessed the development of phylogenetic trees from ancient protein sequences. Enlisting proteomics as a biosignature is now within our grasp. In this talk the pitfalls and challenges of mass spectrometric approaches to protein sequencing will be illustrated and phylogenetic applications will be discussed. Work on extinct organisms at Michigan State University, University of Michigan and York University will provide a vantage point to assess methodologies, explore diagenetic alterations, evaluate mass spectra and illustrate issues associated with database searching. Challenges encountered in the study of paleoproteomics, such as the absence of sequences for extinct organisms in commercially available databases, protein diagenesis and low concentrations of target, are parallel to those that will be encountered when protein sequencing is extended to extreme and extraterrestrial environments. Thus, lessons learned from interrogating the ancient proteome are an important and necessary step in developing proteomics as a biosignature tool.

  10. Basis of altered RNA-binding specificity by PUF proteins revealed by crystal structures of yeast Puf4p

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Matthew T.; Higgin, Joshua J.; Hall, Traci M.Tanaka

    2008-06-06

    Pumilio/FBF (PUF) family proteins are found in eukaryotic organisms and regulate gene expression post-transcriptionally by binding to sequences in the 3' untranslated region of target transcripts. PUF proteins contain an RNA binding domain that typically comprises eight α-helical repeats, each of which recognizes one RNA base. Some PUF proteins, including yeast Puf4p, have altered RNA binding specificity and use their eight repeats to bind to RNA sequences with nine or ten bases. Here we report the crystal structures of Puf4p alone and in complex with a 9-nucleotide (nt) target RNA sequence, revealing that Puf4p accommodates an 'extra' nucleotide by modest adaptations allowing one base to be turned away from the RNA binding surface. Using structural information and sequence comparisons, we created a mutant Puf4p protein that preferentially binds to an 8-nt target RNA sequence over a 9-nt sequence and restores binding of each protein repeat to one RNA base.

  11. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
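
    The quantity this record is built around, mutual information between MSA columns, can be illustrated with the textbook definition. The sketch below is not the SAMMI implementation (the abstract does not describe its scoring in detail); the function name and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns.

    Probabilities are plain observed frequencies; no pseudocounts, gap
    handling or sequence weighting is applied in this simplified sketch.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy alignment (hypothetical): columns 0 and 1 co-vary perfectly,
# column 2 is invariant, column 3 varies independently of column 0.
msa = ["ACDE", "ACDF", "GHDE", "GHDF"]
columns = list(zip(*msa))
mi_covarying = column_mutual_information(columns[0], columns[1])
mi_independent = column_mutual_information(columns[0], columns[3])
mi_invariant = column_mutual_information(columns[0], columns[2])
```

    Perfectly co-varying columns give the maximum value for their entropy, while an invariant column carries no mutual information with any other column.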

  12. Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

    PubMed

    Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

    2018-02-23

    Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid CAAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical CAAX sequence, specifically C(x)3X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C(x)3X sequences. As the search to identify physiologically relevant C(x)3X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
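
    The two C-terminal patterns named in this record differ only in how many residues follow the cysteine: three for the canonical motif, four for C(x)3X. A minimal sketch of scanning sequences for either pattern is given below. It is a deliberate simplification that accepts any of the 20 standard residues after the cysteine; it does not model FTase substrate preference, and the function name and example sequences are hypothetical.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

# Cysteine followed by exactly three residues at the C terminus.
CAAX_PATTERN = re.compile(rf"C[{AA}]{{3}}$")
# Cysteine followed by exactly four residues at the C terminus.
CX3X_PATTERN = re.compile(rf"C[{AA}]{{4}}$")


def c_terminal_motifs(sequence):
    """Return the names of the C-terminal motif patterns a sequence matches."""
    seq = sequence.strip().upper()
    found = []
    if CAAX_PATTERN.search(seq):
        found.append("CAAX")
    if CX3X_PATTERN.search(seq):
        found.append("C(x)3X")
    return found


# Hypothetical sequences, for illustration only.
canonical = c_terminal_motifs("MKCVLS")    # cysteine four residues from the end
extended = c_terminal_motifs("MKCVQIS")    # cysteine five residues from the end
neither = c_terminal_motifs("MKAAAA")
```

    A sequence with cysteines at both positions would match both patterns, which is why the function returns a list rather than a single label.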

  13. Inferences from structural comparison: flexibility, secondary structure wobble and sequence alignment optimization.

    PubMed

    Zhang, Gaihua; Su, Zhen

    2012-01-01

    Work on protein structure prediction is very useful in biological research. To evaluate their accuracy, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility such a standard may be unreliable. To investigate the influence of the structure flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure was not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
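
    The RMSD figure quoted in this abstract follows the standard root-mean-square deviation formula over matched atoms. A minimal sketch is given here, assuming the two Cα coordinate sets are already superimposed and listed in the same residue order; the abstract does not state how the authors superimposed their structures, and the function name is hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched sets of 3D coordinates.

    Each argument is a sequence of (x, y, z) tuples, one per C-alpha atom.
    No superposition is performed here; the inputs are assumed pre-aligned.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared_sum / len(coords_a))
```

    With coordinates in ångströms the result is in ångströms, the unit the abstract uses for its 1.06 Å average.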

  14. Watermarking scheme for authentication of compressed image

    NASA Astrophysics Data System (ADS)

    Hsieh, Tsung-Han; Li, Chang-Tsun; Wang, Shuo

    2003-11-01

    As images are commonly transmitted or stored in compressed form such as JPEG, to extend the applicability of our previous work, a new scheme for embedding watermark in compressed domain without resorting to cryptography is proposed. In this work, a target image is first DCT transformed and quantised. Then, all the coefficients are implicitly watermarked in order to minimize the risk of being attacked on the unwatermarked coefficients. The watermarking is done through registering/blending the zero-valued coefficients with a binary sequence to create the watermark and involving the unembedded coefficients during the process of embedding the selected coefficients. The second-order neighbors and the block itself are considered in the process of the watermark embedding in order to thwart different attacks such as cover-up, vector quantisation, and transplantation. The experiments demonstrate the capability of the proposed scheme in thwarting local tampering, geometric transformation such as cropping, and common signal operations such as lowpass filtering.

  15. A secure steganography for privacy protection in healthcare system.

    PubMed

    Liu, Jing; Tang, Guangming; Sun, Yifeng

    2013-04-01

    Private data in healthcare system require confidentiality protection while transmitting. Steganography is the art of concealing data into a cover media for conveying messages confidentially. In this paper, we propose a steganographic method which can provide private data in medical system with very secure protection. In our method, a cover image is first mapped into a 1D pixels sequence by Hilbert filling curve and then divided into non-overlapping embedding units with three consecutive pixels. We use adaptive pixel pair match (APPM) method to embed digits in the pixel value differences (PVD) of the three pixels and the base of embedded digits is dependent on the differences among the three pixels. By solving an optimization problem, minimal distortion of the pixel ternaries caused by data embedding can be obtained. The experimental results show our method is more suitable to privacy protection of healthcare system than prior steganographic works.

  16. Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein sequences

    USDA-ARS's Scientific Manuscript database

    Coat protein sequences of 33 Potyvirus isolates from legume and Passiflora spp. were sequenced to determine the identity of infecting viruses. Phylogenetic analysis of the sequences revealed the presence of seven distinct virus species....

  17. In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-02-06

    Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server. Digging up all the facts about the proteins it was revealed that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  18. In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-09-01

    Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  19. Pyrin gene and mutants thereof, which cause familial Mediterranean fever

    DOEpatents

    Kastner, Daniel L [Bethesda, MD; Aksentijevich, Ivona [Bethesda, MD; Centola, Michael [Tacoma Park, MD; Deng, Zuoming [Gaithersburg, MD; Sood, Ramen [Rockville, MD; Collins, Francis S [Rockville, MD; Blake, Trevor [Laytonsville, MD; Liu, P Paul [Ellicott City, MD; Fischel-Ghodsian, Nathan [Los Angeles, CA; Gumucio, Deborah L [Ann Arbor, MI; Richards, Robert I [North Adelaide, AU; Ricke, Darrell O [San Diego, CA; Doggett, Norman A [Santa Cruz, NM; Pras, Mordechai [Tel-Hashomer, IL]

    2003-09-30

    The invention provides the nucleic acid sequence encoding the protein associated with familial Mediterranean fever (FMF). The cDNA sequence is designated as MEFV. The invention is also directed towards fragments of the DNA sequence, as well as the corresponding sequence for the RNA transcript and fragments thereof. Another aspect of the invention provides the amino acid sequence for a protein (pyrin) associated with FMF. The invention is directed towards both the full length amino acid sequence, fusion proteins containing the amino acid sequence and fragments thereof. The invention is also directed towards mutants of the nucleic acid and amino acid sequences associated with FMF. In particular, the invention discloses three missense mutations, clustered within about 40 to 50 amino acids, in the highly conserved rfp (B30.2) domain at the C-terminal of the protein. These mutants include M680I, M694V, K695R, and V726A. Additionally, the invention includes methods for diagnosing a patient at risk for having FMF and kits therefor.

  20. The PAXgene® Tissue System Preserves Phosphoproteins in Human Tissue Specimens and Enables Comprehensive Protein Biomarker Research

    PubMed Central

    Gündisch, Sibylle; Schott, Christina; Wolff, Claudia; Tran, Kai; Beese, Christian; Viertler, Christian; Zatloukal, Kurt; Becker, Karl-Friedrich

    2013-01-01

    Precise quantitation of protein biomarkers in clinical tissue specimens is a prerequisite for accurate and effective diagnosis, prognosis, and personalized medicine. Although progress is being made, protein analysis from formalin-fixed and paraffin-embedded tissues is still challenging. In previous reports, we showed that the novel formalin-free tissue preservation technology, the PAXgene Tissue System, allows the extraction of intact and immunoreactive proteins from PAXgene-fixed and paraffin-embedded (PFPE) tissues. In the current study, we focused on the analysis of phosphoproteins and the applicability of two-dimensional gel electrophoresis (2D-PAGE) and enzyme-linked immunosorbent assay (ELISA) to the analysis of a variety of malignant and non-malignant human tissues. Using western blot analysis, we found that phosphoproteins are quantitatively preserved in PFPE tissues, and signal intensities are comparable to that in paired, frozen tissues. Furthermore, proteins extracted from PFPE samples are suitable for 2D-PAGE and can be quantified by ELISA specific for denatured proteins. In summary, the PAXgene Tissue System reliably preserves phosphoproteins in human tissue samples, even after prolonged fixation or stabilization times, and is compatible with methods for protein analysis such as 2D-PAGE and ELISA. We conclude that the PAXgene Tissue System has the potential to serve as a versatile tissue fixative for modern pathology. PMID:23555997

  1. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

    PubMed

    Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

    1997-01-01

    SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.

  2. Seed Storage Proteins as a System for Teaching Protein Identification by Mass Spectrometry in Biochemistry Laboratory

    ERIC Educational Resources Information Center

    Wilson, Karl A.; Tan-Wilson, Anna

    2013-01-01

    Mass spectrometry (MS) has become an important tool in studying biological systems. One application is the identification of proteins and peptides by the matching of peptide and peptide fragment masses to the sequences of proteins in protein sequence databases. Often prior protein separation of complex protein mixtures by 2D-PAGE is needed,…

  3. Cross-Specificities between cII-like Proteins and pRE-like Promoters of Lambdoid Bacteriophages

    PubMed Central

    Wulff, Daniel L.; Mahoney, Michael E.

    1987-01-01

    We have investigated the activation of transcription from the pRE promoters of phages λ, 21 and P22 by the λ and 21 cII proteins and the P22 c1 (cII-like) protein, using an in vivo system in which cII protein from a derepressed prophage activates transcription from a pRE DNA fragment on a multicopy plasmid. We find that each protein is highly specific for its own cognate pRE promoter, although measurable cross-reactions are observed. The primary recognition sequence for cII protein on λ pRE is a pair of TTGC repeat sequences in the sequence 5'-TTGCN6TTGC-3' at the -35 region of the promoter. This same sequence is found in 21 pRE, while P22 pRE has the sequence 5'-TTGCN6TTGT-3', which is the same as that of λctr1, a pRE+ variant of λ. λctr1 pRE is half as active as λ+ pRE when assayed with either the λ cII or the P22 c1 proteins. Therefore, the single base change in the P22 repeat sequence cannot explain why the P22 c1 protein is much more active with P22 pRE than λ pRE. The dya5 mutation, a G→A change at position -43 of pRE, makes pRE a stronger promoter when assayed with either the λ or 21 cII proteins or the P22 c1 protein. We conclude that efficient activation of a cII-dependent promoter by a cII protein requires sequence information in addition to the TTGC repeat sequences. We do not know the characteristics of the proteins which are responsible for the specificity of each protein for its own cognate promoter. However, λdya8, which has a Glu27→Lys alteration in the λ cII protein and a cII+ phenotype, results in a mutant cII protein that is much more highly specific than wild-type cII protein for its own cognate λ pRE promoter. This is especially remarkable because the dya8 amino acid alteration makes the helix-2 region (the region of the protein predicted to make contact with the phosphodiester backbone of the DNA) of λ cII protein conform exactly with the helix-2 region of the P22 c1 protein in both charge and charge distribution. PMID:2953649

  4. Sequence co-evolution gives 3D contacts and structures of protein complexes

    PubMed Central

    Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

    2014-01-01

    Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213

  5. Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry.

    PubMed

    Asara, John M; Schweitzer, Mary H; Freimark, Lisa M; Phillips, Matthew; Cantley, Lewis C

    2007-04-13

    Fossilized bones from extinct taxa harbor the potential for obtaining protein or DNA sequences that could reveal evolutionary links to extant species. We used mass spectrometry to obtain protein sequences from bones of a 160,000- to 600,000-year-old extinct mastodon (Mammut americanum) and a 68-million-year-old dinosaur (Tyrannosaurus rex). The presence of T. rex sequences indicates that their peptide bonds were remarkably stable. Mass spectrometry can thus be used to determine unique sequences from ancient organisms from peptide fragmentation patterns, a valuable tool to study the evolution and adaptation of ancient taxa from which genomic sequences are unlikely to be obtained.

  6. Structural analysis of DNA binding by C.Csp231I, a member of a novel class of R-M controller proteins regulating gene expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shevtsov, M. B.; Streeter, S. D.; Thresh, S.-J.

    2015-02-01

    The structure of the new class of controller proteins (exemplified by C.Csp231I) in complex with its 21 bp DNA-recognition sequence is presented, and the molecular basis of sequence recognition in this class of proteins is discussed. An unusual extended spacer between the dimer binding sites suggests a novel interaction between the two C-protein dimers. In a wide variety of bacterial restriction–modification systems, a regulatory ‘controller’ protein (or C-protein) is required for effective transcription of its own gene and for transcription of the endonuclease gene found on the same operon. We have recently turned our attention to a new class of controller proteins (exemplified by C.Csp231I) that have quite novel features, including a much larger DNA-binding site with an 18 bp (∼60 Å) spacer between the two palindromic DNA-binding sequences and a very different recognition sequence from the canonical GACT/AGTC. Using X-ray crystallography, the structure of the protein in complex with its 21 bp DNA-recognition sequence was solved to 1.8 Å resolution, and the molecular basis of sequence recognition in this class of proteins was elucidated. An unusual aspect of the promoter sequence is the extended spacer between the dimer binding sites, suggesting a novel interaction between the two C-protein dimers when bound to both recognition sites correctly spaced on the DNA. A U-bend model is proposed for this tetrameric complex, based on the results of gel-mobility assays, hydrodynamic analysis and the observation of key contacts at the interface between dimers in the crystal.

  7. Structure and Sequence Search on Aptamer-Protein Docking

    NASA Astrophysics Data System (ADS)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  8. Generate Optimized Genetic Rhythm for Enzyme Expression in Non-native systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2016-11-03

    Most amino acids are represented by more than one codon, resulting in redundancy in the genetic code. Silent codon substitutions that do not alter the amino acid sequence still have an effect on protein expression. We have developed an algorithm, GoGREEN, to enhance the expression of foreign proteins in a host organism. GoGREEN selects codons according to frequency patterns seen in the gene of interest using the codon usage table from the host organism. GoGREEN is also designed to accommodate gaps in the sequence. This software takes for input (1) the aligned protein sequences for genes the user wishes to express, (2) the codon usage table for the host organism, and (3) the DNA sequence for the target protein found in the host organism. The program will select codons based on codon usage patterns for the target DNA sequence. The program will also select codons for “gaps” found in the aligned protein sequences using the codon usage table from the host organism.
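
    As a rough illustration of codon choice driven by a host codon usage table, the sketch below back-translates a protein by picking the most frequent host codon for each residue. This is a much simpler stand-in and not the GoGREEN algorithm, which the abstract says follows frequency patterns in the gene of interest. The function name, the toy usage table and its frequency values are all hypothetical, and alignment gaps are simply skipped here, which is an assumption rather than GoGREEN's gap handling.

```python
# Toy codon usage table: amino acid -> {codon: relative frequency}.
# The values are illustrative placeholders, not measured host data.
TOY_CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13, "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
}


def back_translate(protein, codon_usage):
    """Return a DNA sequence using the most frequent host codon per residue.

    Gap characters ("-") from an aligned sequence are skipped (assumption).
    Raises KeyError if a residue is missing from the usage table.
    """
    codons = []
    for residue in protein:
        if residue == "-":
            continue
        usage = codon_usage[residue]
        codons.append(max(usage, key=usage.get))
    return "".join(codons)
```

    For example, `back_translate("MK-L", TOY_CODON_USAGE)` picks ATG, AAA and CTG from the toy table and ignores the gap.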

  9. How proteins bind to DNA: target discrimination and dynamic sequence search by the telomeric protein TRF1

    PubMed Central

    2017-01-01

    Target search as performed by DNA-binding proteins is a complex process, in which multiple factors contribute to both thermodynamic discrimination of the target sequence from overwhelmingly abundant off-target sites and kinetic acceleration of dynamic sequence interrogation. TRF1, the protein that binds to telomeric tandem repeats, faces an intriguing variant of the search problem where target sites are clustered within short fragments of chromosomal DNA. In this study, we use extensive (>0.5 ms in total) MD simulations to study the dynamical aspects of sequence-specific binding of TRF1 at both telomeric and non-cognate DNA. For the first time, we describe the spontaneous formation of a sequence-specific native protein–DNA complex in atomistic detail, and study the mechanism by which proteins avoid off-target binding while retaining high affinity for target sites. Our calculated free energy landscapes reproduce the thermodynamics of sequence-specific binding, while statistical approaches allow for a comprehensive description of intermediate stages of complex formation. PMID:28633355

  10. Sequence- and Interactome-Based Prediction of Viral Protein Hotspots Targeting Host Proteins: A Case Study for HIV Nef

    PubMed Central

    Sarmady, Mahdi; Dampier, William; Tozeren, Aydin

    2011-01-01

    Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance but is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear crowding the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk. PMID:21738584

  11. Relating protein conformational changes to packing efficiency and disorder

    PubMed Central

    Bhardwaj, Nitin; Gerstein, Mark

    2009-01-01

    Changes in protein conformation play key roles in facilitating various biochemical processes, ranging from signaling and phosphorylation to transport and catalysis. While various factors that drive these motions such as environmental changes and binding of small molecules are well understood, specific causative effects on the structural features of the protein due to these conformational changes have not been studied on a large scale. Here, we study protein conformational changes in relation to two key structural metrics: packing efficiency and disorder. Packing has been shown to be crucial for protein stability and function by many protein design and engineering studies. We study changes in packing efficiency during conformational changes, thus extending the analysis from a static context to a dynamic perspective and report some interesting observations. First, we study various proteins that adopt alternate conformations and find that tendencies to show motion and change in packing efficiency are correlated: residues that change their packing efficiency show larger motions. Second, our results suggest that residues that show higher changes in packing during motion are located on the changing interfaces which are formed during these conformational changes. These changing interfaces are slightly different from shear or static interfaces that have been analyzed in previous studies. Third, analysis of packing efficiency changes in the context of secondary structure shows that, as expected, residues buried in helices show the least change in packing efficiency, whereas those embedded in bends are most likely to change packing. Finally, by relating protein disorder to motions, we show that marginally disordered residues which are ordered enough to be crystallized but have sequence patterns indicative of disorder show higher dislocation and a higher change in packing than ordered ones and are located mostly on the changing interfaces. 
    Overall, our results demonstrate that between the two conformations, the cores of the proteins remain mostly intact, whereas the interfaces display the most elasticity, both in terms of disorder and change in packing efficiency. By doing a variety of tests, we also show that our observations are robust to the solvation state of the proteins. PMID:19472340

  12. A Circular Dichroism Reference Database for Membrane Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wallace,B.; Wien, F.; Stone, T.

    2006-01-01

    Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).

  13. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  14. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  15. Analysis of sequencing data for probing RNA secondary structures and protein-RNA binding in studying posttranscriptional regulations.

    PubMed

    Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y

    2016-11-01

    High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  16. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
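
    To make the idea of estimating amino acid probabilities with a Dirichlet mixture concrete, the following is a minimal sketch of the generic textbook estimator for one alignment column. It is not the SIBIS implementation: the function names, the toy three-letter alphabet, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_marginal(counts, alpha):
    """Log-probability of the column counts under one Dirichlet component.

    The multinomial coefficient is omitted because it is identical for
    every component and cancels when the responsibilities are normalised.
    """
    a_sum = sum(alpha)
    n_sum = sum(counts)
    return (math.lgamma(a_sum) - math.lgamma(n_sum + a_sum)
            + sum(math.lgamma(n + a) - math.lgamma(a)
                  for n, a in zip(counts, alpha)))


def posterior_probs(counts, mix_weights, alphas):
    """Posterior mean residue probabilities under a Dirichlet mixture prior."""
    logs = [math.log(q) + log_marginal(counts, a)
            for q, a in zip(mix_weights, alphas)]
    top = max(logs)                       # subtract the max for stability
    resp = [math.exp(v - top) for v in logs]
    z = sum(resp)
    resp = [r / z for r in resp]          # P(component | observed counts)

    n_sum = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        a_sum = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += r * (n + a) / (n_sum + a_sum)
    return probs


# Hypothetical toy example: 3-letter alphabet, two prior components.
column_counts = [5, 1, 0]
weights = [0.5, 0.5]
components = [[2.0, 0.5, 0.5], [0.5, 0.5, 2.0]]
print(posterior_probs(column_counts, weights, components))
```

    A residue that receives a very low posterior probability in its column would be a candidate inconsistency under this kind of model; how SIBIS turns such probabilities into segment-level calls is not described in the abstract and is not reproduced here.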

  17. Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

    PubMed

    Wood, Henry M; Belvedere, Ornella; Conway, Caroline; Daly, Catherine; Chalkley, Rebecca; Bickerdike, Melissa; McKinley, Claire; Egan, Phil; Ross, Lisa; Hayward, Bruce; Morgan, Joanne; Davidson, Leslie; MacLennan, Ken; Ong, Thian K; Papagiannopoulos, Kostas; Cook, Ian; Adams, David J; Taylor, Graham R; Rabbitts, Pamela

    2010-08-01

    The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.

  18. De novo assembly and characterization of the Trichuris trichiura adult worm transcriptome using Ion Torrent sequencing.

    PubMed

    Santos, Leonardo N; Silva, Eduardo S; Santos, André S; De Sá, Pablo H; Ramos, Rommel T; Silva, Artur; Cooper, Philip J; Barreto, Maurício L; Loureiro, Sebastião; Pinheiro, Carina S; Alcantara-Neves, Neuza M; Pacheco, Luis G C

    2016-07-01

    Infection with helminthic parasites, including the soil-transmitted helminth Trichuris trichiura (human whipworm), has been shown to modulate host immune responses and, consequently, to have an impact on the development and manifestation of chronic human inflammatory diseases. De novo derivation of helminth proteomes from sequencing of transcriptomes will provide valuable data to aid identification of parasite proteins that could be evaluated as potential immunotherapeutic molecules in the near future. Herein, we characterized the transcriptome of the adult stage of the human whipworm T. trichiura, using next-generation sequencing technology and a de novo assembly strategy. Nearly 17.6 million high-quality clean reads were assembled into 6414 contiguous sequences, with an N50 of 1606 bp. In total, 5673 protein-encoding sequences were confidently identified in the T. trichiura adult worm transcriptome; of these, 1013 sequences represent potential newly discovered proteins for the species, most of which present orthologs already annotated in the related species T. suis. A number of transcripts representing probable novel non-coding transcripts for the species T. trichiura were also identified. Among the most abundant transcripts, we found sequences that code for proteins involved in lipid transport, such as vitellogenins, and several chitin-binding proteins. Through a cross-species expression analysis of gene orthologs shared by T. trichiura and the closely related parasites T. suis and T. muris it was possible to find twenty-six protein-encoding genes that are consistently highly expressed in the adult stages of the three helminth species. Additionally, twenty transcripts could be identified that code for proteins previously detected by mass spectrometry analysis of protein fractions of the whipworm somatic extract that present immunomodulatory activities. Five of these transcripts were amongst the most highly expressed protein-encoding sequences in the T. trichiura adult worm. Besides, orthologs of proteins demonstrated to have potent immunomodulatory properties in related parasitic helminths were also predicted from the T. trichiura de novo assembled transcriptome. Copyright © 2016. Published by Elsevier B.V.
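
    The N50 statistic quoted for this assembly can be computed from contig lengths alone. The sketch below uses the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length). The contig lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp.
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))  # prints 400
```

    For the toy input the total is 1500 bp; the two longest contigs (500 + 400) already cover at least half of it, so the N50 is 400.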

  19. Genetic characterization of L-Zagreb mumps vaccine strain.

    PubMed

    Ivancic, Jelena; Gulija, Tanja Kosutic; Forcic, Dubravko; Baricevic, Marijana; Jug, Renata; Mesko-Prejac, Majda; Mazuran, Renata

    2005-04-01

    Eleven mumps vaccine strains, all containing live attenuated virus, have been used throughout the world. Although L-Zagreb mumps vaccine has been licensed since 1972, only its partial nucleotide sequence was previously determined (accession numbers , and ). Therefore, we sequenced the entire genome of L-Zagreb vaccine strain (Institute of Immunology Inc., Zagreb, Croatia). In order to investigate the genetic stability of the vaccine, sequences of both L-Zagreb master seed and currently produced vaccine batch were determined and no difference between them was observed. A phylogenetic analysis based on SH gene sequence has shown that L-Zagreb strain does not belong to any of established mumps genotypes and that it is most similar to old, laboratory preserved European strains (1950s-1970s). L-Zagreb nucleotide and deduced protein sequences were compared with other mumps virus sequences obtained from the GenBank. Emphasis was put on functionally important protein regions and known antigenic epitopes. The extensive comparisons of nucleotide and deduced protein sequences between L-Zagreb vaccine strain and other previously determined mumps virus sequences have shown that while the functional regions of HN, V, and L proteins are well conserved among various mumps strains, there can be a substantial amino acid difference in antigenic epitopes of all proteins and in functional regions of F protein. No molecular pattern was identified that can be used as a distinction marker between virulent and attenuated strains.

  20. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method to the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of 5 other popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587
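
    The 400 + 20 + 20 layout of the feature vector and the Euclidean comparison can be sketched as follows. The abstract does not give the exact formulas, so the three component definitions used here are assumptions made for illustration: adjacent-pair counts normalised per first residue stand in for the Pseudo-Markov transition probabilities, residue frequency for the content ratio, and each residue's share of the summed position indices for the position ratio. The function names and example sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def feature_vector(sequence):
    """Encode a protein sequence as a 440-dimensional vector (assumed layout)."""
    seq = [aa for aa in sequence.upper() if aa in INDEX]
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two standard residues")

    # 400 entries: counts of adjacent pairs (a, b), normalised per row a.
    pair = [[0.0] * 20 for _ in range(20)]
    for a, b in zip(seq, seq[1:]):
        pair[INDEX[a]][INDEX[b]] += 1
    transitions = []
    for row in pair:
        row_sum = sum(row)
        transitions.extend(c / row_sum if row_sum else 0.0 for c in row)

    # 20 entries: fraction of the sequence made up by each residue.
    counts = Counter(seq)
    content = [counts[aa] / n for aa in AMINO_ACIDS]

    # 20 entries: share of the total of 1-based position indices (assumed).
    pos_total = n * (n + 1) / 2
    pos_sums = [0.0] * 20
    for i, aa in enumerate(seq, start=1):
        pos_sums[INDEX[aa]] += i
    position = [p / pos_total for p in pos_sums]

    return transitions + content + position


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical toy sequences, not taken from the ND5, F10 or G11 datasets.
v1 = feature_vector("MKTAYIAKQRQISFVKSHFSRQ")
v2 = feature_vector("MKTAYIAKQRQLSFVKAHFSRQ")
print(len(v1), euclidean(v1, v2))
```

    Because every sequence maps to a vector of the same length regardless of its own length, pairwise distances can be computed without any alignment step, which is the point of an alignment-free comparison.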

  1. DNA sequence analysis of a 10 624 bp fragment of the left arm of chromosome XV from Saccharomyces cerevisiae reveals a RNA binding protein, a mitochondrial protein, two ribosomal proteins and two new open reading frames.

    PubMed

    Lafuente, M J; Gamo, F J; Gancedo, C

    1996-09-01

    We have determined the sequence of a 10624 bp DNA segment located in the left arm of chromosome XV of Saccharomyces cerevisiae. The sequence contains eight open reading frames (ORFs) longer than 100 amino acids. Two of them do not present significant homology with sequences found in the databases. The product of ORF o0553 is identical to the protein encoded by the gene SMF1. Internal to it there is another ORF, o0555 that is apparently expressed. The proteins encoded by ORFs o0559 and o0565 are identical to ribosomal proteins S19.e and L18 respectively. ORF o0550 encodes a protein with an RNA binding signature including RNP motifs and stretches rich in asparagine, glutamine and arginine.

  2. Plastid-targeting peptides from the chlorarachniophyte Bigelowiella natans.

    PubMed

    Rogers, Matthew B; Archibald, John M; Field, Matthew A; Li, Catherine; Striepen, Boris; Keeling, Patrick J

    2004-01-01

    Chlorarachniophytes are marine amoeboflagellate protists that have acquired their plastid (chloroplast) through secondary endosymbiosis with a green alga. Like other algae, most of the proteins necessary for plastid function are encoded in the nuclear genome of the secondary host. These proteins are targeted to the organelle using a bipartite leader sequence consisting of a signal peptide (allowing entry into the endomembrane system) and a chloroplast transit peptide (for transport across the chloroplast envelope membranes). We have examined the leader sequences from 45 full-length predicted plastid-targeted proteins from the chlorarachniophyte Bigelowiella natans with the goal of understanding important features of these sequences and possible conserved motifs. The chemical characteristics of these sequences were compared with a set of 10 B. natans endomembrane-targeted proteins and 38 cytosolic or nuclear proteins, which show that the signal peptides are similar to those of most other eukaryotes, while the transit peptides differ from those of other algae in some characteristics. Consistent with this, the leader sequence from one B. natans protein was tested for function in the apicomplexan parasite, Toxoplasma gondii, and shown to direct the secretion of the protein.

  3. Use of signal sequences as an in situ removable sequence element to stimulate protein synthesis in cell-free extracts

    PubMed Central

    Ahn, Jin-Ho; Hwang, Mi-Yeon; Lee, Kyung-Ho; Choi, Cha-Yong; Kim, Dong-Myung

    2007-01-01

    This study developed a method to boost the expression of recombinant proteins in a cell-free protein synthesis system without leaving additional amino acid residues. It was found that the nucleotide sequences of the signal peptides serve as an efficient downstream box to stimulate protein synthesis when they were fused upstream of the target genes. The extent of stimulation was critically affected by the identity of the second codons of the signal sequences. Moreover, the yield of the synthesized protein was enhanced by as much as 10 times in the presence of an optimal second codon. The signal peptides were in situ cleaved and the target proteins were produced in their native sizes by carrying out the cell-free synthesis reactions in the presence of Triton X-100, most likely through the activation of signal peptidase in the S30 extract. The amplification of the template DNA and the addition of the signal sequences were accomplished by PCR. Hence, elevated levels of recombinant proteins were generated within several hours. PMID:17185295

  4. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    PubMed

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison and will help to better understand the semantic similarity between proteins.

  5. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.

    PubMed

    Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret

    2007-11-01

    LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.

  6. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    PubMed

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. 
Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of PWM (position weight matrix) motif model.

  7. System in biology leading to cell pathology: stable protein-protein interactions after covalent modifications by small molecules or in transgenic cells.

    PubMed

    Malina, Halina Z

    2011-01-19

    The physiological processes in the cell are regulated by reversible, electrostatic protein-protein interactions. Apoptosis is such a regulated process, which is critically important in tissue homeostasis and development and leads to complete disintegration of the cell. Pathological apoptosis, a process similar to apoptosis, is associated with aging and infection. The current study shows that pathological apoptosis is a process caused by the covalent interactions between the signaling proteins, and a characteristic of this pathological network is the covalent binding of calmodulin to regulatory sequences. Small molecules able to bind covalently to the amino group of lysine, histidine, arginine, or glutamine modify the regulatory sequences of the proteins. The present study analyzed the interaction of calmodulin with the BH3 sequence of Bax, and the calmodulin-binding sequence of myristoylated alanine-rich C-kinase substrate in the presence of xanthurenic acid in primary retinal epithelium cell cultures and murine epithelial fibroblast cell lines transformed with SV40 (wild type [WT], Bid knockout [Bid-/-], and Bax-/-/Bak-/- double knockout [DKO]). Cell death was observed to be associated with the covalent binding of calmodulin, in parallel, to the regulatory sequences of proteins. Xanthurenic acid is known to activate caspase-3 in primary cell cultures, and the results showed that this activation is also observed in WT and Bid-/- cells, but not in DKO cells. However, DKO cells were not protected against death, but high rates of cell death occurred by detachment. The results showed that small molecules modify the basic amino acids in the regulatory sequences of proteins leading to covalent interactions between the modified sequences (e.g., calmodulin to calmodulin-binding sites). The formation of these polymers (aggregates) leads to an unregulated and, consequently, pathological protein network. 
The results suggest a mechanism for the involvement of small molecules in disease development. In the knockout cells, incorrect interactions between proteins were observed without the protein modification by small molecules, indicating the abnormality of the protein network in the transgenic system. The irreversible protein-protein interactions lead to protein aggregation and cell degeneration, which are observed in all aging-associated diseases.

  8. System in biology leading to cell pathology: stable protein-protein interactions after covalent modifications by small molecules or in transgenic cells

    PubMed Central

    2011-01-01

    Background: The physiological processes in the cell are regulated by reversible, electrostatic protein-protein interactions. Apoptosis is such a regulated process, which is critically important in tissue homeostasis and development and leads to complete disintegration of the cell. Pathological apoptosis, a process similar to apoptosis, is associated with aging and infection. The current study shows that pathological apoptosis is a process caused by the covalent interactions between the signaling proteins, and a characteristic of this pathological network is the covalent binding of calmodulin to regulatory sequences. Results: Small molecules able to bind covalently to the amino group of lysine, histidine, arginine, or glutamine modify the regulatory sequences of the proteins. The present study analyzed the interaction of calmodulin with the BH3 sequence of Bax, and the calmodulin-binding sequence of myristoylated alanine-rich C-kinase substrate in the presence of xanthurenic acid in primary retinal epithelium cell cultures and murine epithelial fibroblast cell lines transformed with SV40 (wild type [WT], Bid knockout [Bid-/-], and Bax-/-/Bak-/- double knockout [DKO]). Cell death was observed to be associated with the covalent binding of calmodulin, in parallel, to the regulatory sequences of proteins. Xanthurenic acid is known to activate caspase-3 in primary cell cultures, and the results showed that this activation is also observed in WT and Bid-/- cells, but not in DKO cells. However, DKO cells were not protected against death, but high rates of cell death occurred by detachment. Conclusions: The results showed that small molecules modify the basic amino acids in the regulatory sequences of proteins leading to covalent interactions between the modified sequences (e.g., calmodulin to calmodulin-binding sites). The formation of these polymers (aggregates) leads to an unregulated and, consequently, pathological protein network. 
The results suggest a mechanism for the involvement of small molecules in disease development. In the knockout cells, incorrect interactions between proteins were observed without the protein modification by small molecules, indicating the abnormality of the protein network in the transgenic system. The irreversible protein-protein interactions lead to protein aggregation and cell degeneration, which are observed in all aging-associated diseases. PMID:21247434

  9. The complete CDS of the prion protein (PRNP) gene of African lion (Panthera leo).

    PubMed

    Maj, Andrzej; Spellman, Garth M; Sarver, Shane K

    2008-04-01

    We provide the complete PRNP CDS sequence for the African lion, which is different from the previously published sequence and more similar to other carnivore sequences. The newly obtained prion protein sequence differs from the domestic cat sequence at three amino acid positions and contains only four octapeptide repeats. We recommend that this sequence be used as the reference sequence for future studies of the PRNP gene for this species.

  10. Sequence analysis and expression of the M1 and M2 matrix protein genes of hirame rhabdovirus (HIRRV)

    USGS Publications Warehouse

    Nishizawa, T.; Kurath, G.; Winton, J.R.

    1997-01-01

    We have cloned and sequenced a 2318 nucleotide region of the genomic RNA of hirame rhabdovirus (HIRRV), an important viral pathogen of Japanese flounder Paralichthys olivaceus. This region comprises approximately two-thirds of the 3' end of the nucleocapsid protein (N) gene and the complete matrix protein (M1 and M2) genes with the associated intergenic regions. The partial N gene sequence was 812 nucleotides in length with an open reading frame (ORF) that encoded the carboxyl-terminal 250 amino acids of the N protein. The M1 and M2 genes were 771 and 700 nucleotides in length, respectively, with ORFs encoding proteins of 227 and 193 amino acids. The M1 gene sequence contained an additional small ORF that could encode a highly basic, arginine-rich protein of 25 amino acids. Comparisons of the N, M1, and M2 gene sequences of HIRRV with the corresponding sequences of the fish rhabdoviruses, infectious hematopoietic necrosis virus (IHNV) or viral hemorrhagic septicemia virus (VHSV) indicated that HIRRV was more closely related to IHNV than to VHSV, but was clearly distinct from either. The putative consensus gene termination sequence for IHNV and VHSV, AGAYAG(A)(7), was present in the N-M1, M1-M2, and M2-G intergenic regions of HIRRV as were the putative transcription initiation sequences YGGCAC and AACA. An Escherichia coli expression system was used to produce recombinant proteins from the M1 and M2 genes of HIRRV. These were the same size as the authentic M1 and M2 proteins and reacted with anti-HIRRV rabbit serum in western blots. These reagents can be used for further study of the fish immune response and to test novel control methods.

  11. An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis

    PubMed Central

    Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang

    2013-01-01

    Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234

  12. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
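
    As a rough illustration of an autocovariance transformation applied to a PSSM, the sketch below computes, for each PSSM column, the standard lag-g autocovariance along the sequence. This is an assumed textbook form: the segmentation step, the choice of lags, and any normalisation used in CSP-SegPseP-SegACP are not given in the abstract and are not reproduced. The function name and the toy matrix are hypothetical.

```python
def autocovariance(pssm, lag):
    """Per-column autocovariance of a PSSM at a given lag.

    pssm is a list of rows (one per residue position), each row holding
    one score per amino acid column. Returns one value per column.
    """
    n_rows = len(pssm)
    if not 0 < lag < n_rows:
        raise ValueError("lag must satisfy 0 < lag < number of rows")
    n_cols = len(pssm[0])
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]
    return [
        sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
            for i in range(n_rows - lag)) / (n_rows - lag)
        for j in range(n_cols)
    ]


# Hypothetical 4-position, 2-column toy matrix (a real PSSM has 20 columns).
toy_pssm = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
print(autocovariance(toy_pssm, lag=1))
```

    Concatenating such values over several lags gives a fixed-length descriptor for a sequence of any length, which is what makes features of this kind usable as classifier input before a dimensionality reduction step such as PCA.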

  13. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. Previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using the BOW (bag of words) representation of traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority.

  14. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds

    PubMed Central

    Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.

    2008-01-01

    Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506
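
    The 40% sequence identity quoted in this abstract is a pairwise measure over an alignment. A minimal sketch of one common definition follows: identical residues divided by the number of columns in which both sequences have a residue. This denominator is an assumption, since identity can also be normalized by alignment length or by the shorter sequence, and the abstract does not say which convention was used. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

    For two toy aligned strings such as "ACDEF" and "ACDQY", three of five compared columns match, which gives 60% identity under this definition.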

  15. The SUPERFAMILY database in 2004: additions and improvements.

    PubMed

    Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian

    2004-01-01

    The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.

  16. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-01-01

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the base-specifying residue (BSR). However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363

  17. Saccharose solid matrix embedded proteins: a new method for sample preparation for X-ray absorption spectroscopy.

    PubMed

    Ascone, I; Sabatucci, A; Bubacco, L; Di Muro, P; Salvato, B

    2000-01-01

    In this study, solid samples of hemoglobin and hemocyanin have been prepared by embedding the proteins into a saccharose-based matrix. These materials have been developed specifically as specimens for X-ray absorption spectroscopy (XAS). The preservation of protein conformation and active site organization was tested, making comparisons between the solid and the corresponding liquid samples, using resonance Raman, infrared, fluorescence and XAS. The XAS spectra of irradiated solid and liquid samples were then compared, and the preservation of biological activity of the proteins during both the preparation procedure and X-ray irradiation was assessed. In all cases, the measurements clearly demonstrate that protein solid samples are both structurally and functionally quite well preserved, much better than those in the liquid state. The saccharose matrix provides excellent protection against X-ray damage, allowing for longer exposure to the X-ray beam. Moreover, the demonstrated long-term stability of samples permits their preparation and storage in optimal conditions, allowing for the repetition of data collection with the same sample in several experimental sessions. The very high protein concentration that can be reached results in a significantly better signal-to-noise ratio, particularly useful for high molecular weight proteins with a low metal-to-protein ratio. On the basis of the above-mentioned results, we propose the new method as a standard procedure for the preparation of biological samples to be used for XAS.

  18. Molecular characterization of an ependymin precursor from goldfish brain.

    PubMed

    Königstorfer, A; Sterrer, S; Eckerskorn, C; Lottspeich, F; Schmidt, R; Hoffmann, W

    1989-01-01

    Ependymins are thought to be implicated in fundamental processes involved in plasticity of the goldfish CNS. Gas-phase sequencing of purified ependymins beta and gamma revealed that they share the same N-terminal sequence. Each sequence displays microheterogeneities at several positions. Based on the protein sequences obtained, we constructed synthetic oligonucleotides and used them as hybridization probes for screening cDNA libraries of goldfish brain. In this article we describe the full-length sequence of an mRNA encoding a precursor of ependymins. A cleavable signal sequence characteristic of secretory proteins is located at the N-terminal end, followed directly by the ependymin sequence. Also, two potential N-glycosylation sites were detected. A computer search revealed that ependymins form a novel family of unique proteins.

  19. Capturing Attention When Attention "Blinks"

    ERIC Educational Resources Information Center

    Wee, Serena; Chua, Fook K.

    2004-01-01

    Four experiments addressed the question of whether attention may be captured when the visual system is in the midst of an attentional blink (AB). Participants identified 2 target letters embedded among distractor letters in a rapid serial visual presentation sequence. In some trials, a square frame was inserted between the targets; as the only…

  20. Molecular Beam Epitaxial Materials Study for Microwave and Millimeter Wave Devices.

    DTIC Science & Technology

    1978-10-01

    competing for dominance with any given set of system components and deposition sequence. The evidence indicates that BeO substrate heaters contribute... "Single-Transverse-Mode Injection Lasers with Embedded Stripe Layer Grown by Molecular Beam Epitaxy," Appl. Phys. Lett., 29, pp. 164-166 (1976).
