Science.gov

Sample records for acid sequence identified

  1. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  2. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  3. Modulation of anti-endotoxin property of Temporin L by minor amino acid substitution in identified phenylalanine zipper sequence.

    PubMed

    Srivastava, Saurabh; Kumar, Amit; Tripathi, Amit Kumar; Tandon, Anshika; Ghosh, Jimut Kanti

    2016-11-01

    A 13-residue frog antimicrobial peptide Temporin L (TempL) possesses versatile antimicrobial activities and is considered a lead molecule for the development of new antimicrobial agents. To identify the amino acid sequences that influence the antimicrobial property of TempL, a phenylalanine zipper-like sequence was identified in it which was not reported earlier. Several alanine-substituted analogs and a scrambled peptide having the same composition as TempL were designed for evaluating the role of this motif. To investigate whether leucine residues instead of phenylalanine residues at 'a' and/or 'd' position(s) of the heptad repeat sequence could alter its antimicrobial property, several TempL analogs were synthesized after replacing these phenylalanine residues with leucine residues. Replacing phenylalanine residues with alanine residues in the phenylalanine zipper sequence significantly compromised the anti-endotoxin property of TempL. This is evident from the higher production of tumor necrosis factor-α and interleukin-6 in lipopolysaccharide (LPS)-stimulated rat bone-marrow-derived macrophage cells in the presence of its alanine-substituted analogs than in the presence of TempL itself. However, replacement of these phenylalanine residues with leucine residues significantly augmented the anti-endotoxin property of TempL. A single alanine-substituted TempL analog (F8A-TempL) showed significantly reduced cytotoxicity but retained the antibacterial activity of TempL, while the two single leucine-substituted analogs (F5L-TempL and F8L-TempL), although exhibiting lower cytotoxicity, were able to retain the antibacterial activity of the parent peptide. The results demonstrate how minor amino acid substitutions in the identified phenylalanine zipper sequence in TempL could yield analogs with better antibacterial and/or anti-endotoxin properties with their plausible mechanism of action.

  4. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
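
    The read-out described in this abstract can be sketched in a few lines: each identified nucleotide analog is logged with its time of incorporation, the temporal order gives the growing strand, and base-by-base complementation gives the template. The event format, the function names, and the example values below are assumptions made for illustration; they do not come from the patent.

```python
# Hypothetical illustration: deduce a template sequence from the temporal
# order of identified base additions. Event format and names are assumed.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_growing_strand(events):
    """events: iterable of (time_seconds, base), one per identified analog.

    Returns the growing strand in order of incorporation.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return "".join(base for _, base in ordered)


def infer_template(growing_strand):
    """Complement each incorporated base to get the template base it paired with.

    The result lists template bases in the order the polymerase met them.
    """
    return "".join(COMPLEMENT[base] for base in growing_strand)


# Made-up detection events, deliberately out of order.
events = [(0.9, "T"), (0.2, "A"), (1.7, "G"), (0.5, "C")]
strand = read_growing_strand(events)
template = infer_template(strand)
print(strand, template)
```

    Sorting by time stamp is the whole point of the sketch: the sequence is carried by the order of events, not by their position on a gel or array.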

  5. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  6. De novo Sequencing and Transcriptome Analysis of Pinellia ternata Identify the Candidate Genes Involved in the Biosynthesis of Benzoic Acid and Ephedrine

    PubMed Central

    Zhang, Guang-hui; Jiang, Ni-hao; Song, Wan-ling; Ma, Chun-hua; Yang, Sheng-chao; Chen, Jun-wen

    2016-01-01

    Background: The medicinal herb, Pinellia ternata, is purported to be an anti-emetic with analgesic and sedative effects. Alkaloids are the main biologically active compounds in P. ternata, especially ephedrine, a phenylpropylamino alkaloid specifically produced by Ephedra and Catha edulis. However, how ephedrine is synthesized in plants is uncertain. Only the phenylalanine ammonia lyase (PAL) and relevant genes in this pathway have been characterized. Genomic information for P. ternata is also unavailable. Results: We analyzed the transcriptome of the tuber of P. ternata with the Illumina HiSeq™ 2000 sequencing platform. 66,813,052 high-quality reads were generated, and these reads were assembled de novo into 89,068 unigenes. Most known genes involved in benzoic acid biosynthesis were identified in the unigene dataset of P. ternata, and the expression patterns of some ephedrine biosynthesis-related genes were analyzed by reverse transcription quantitative real-time PCR (RT-qPCR). Also, 14,468 simple sequence repeats (SSRs) were identified from 12,000 unigenes. Twenty primer pairs for SSRs were randomly selected for the validation of their amplification effect. Conclusion: RNA-seq data were used for the first time to provide comprehensive gene information on P. ternata at the transcriptional level. These data will advance molecular genetics in this valuable medicinal plant. PMID:27579029
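
    As a rough illustration of how simple sequence repeats can be located in assembled unigenes, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The abstract does not say which tool or thresholds the authors used, so the minimum repeat counts, the function name, and the test sequence are all assumptions.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the paper).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) for perfect tandem repeats."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    hits = []
    for unit_len, min_n in min_repeats.items():
        # Group 2 is the motif; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are just a shorter unit repeated (e.g. "AGAG").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // unit_len))
    return sorted(hits)


# Made-up sequence containing (AG)7 and (CAT)5.
example = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
ssrs = find_ssrs(example)
print(ssrs)
```

    Only perfect repeats are reported here; compound or interrupted repeats would need additional logic.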

  7. RNA sequencing identifies upregulated kyphoscoliosis peptidase and phosphatidic acid signaling pathways in muscle hypertrophy generated by transgenic expression of myostatin propeptide.

    PubMed

    Miao, Yuanxin; Yang, Jinzeng; Xu, Zhong; Jing, Lu; Zhao, Shuhong; Li, Xinyun

    2015-04-09

    Myostatin (MSTN), a member of the transforming growth factor-β superfamily, plays a crucial negative role in muscle growth. MSTN mutations or inhibitions can dramatically increase muscle mass in most mammal species. Previously, we generated a transgenic mouse model of muscle hypertrophy via the transgenic expression of the MSTN N-terminal propeptide cDNA under the control of the skeletal muscle-specific MLC1 promoter. Here, we compare the mRNA profiles between transgenic mice and wild-type littermate controls with a high-throughput RNA sequencing method. The results show that 132 genes were significantly differentially expressed between transgenic mice and wild-type control mice; 97 of these genes were up-regulated, and 35 genes were down-regulated in the skeletal muscle. Several genes that had not been reported to be involved in muscle hypertrophy were identified, including up-regulated myosin binding protein H (mybph), and zinc metallopeptidase STE24 (Zmpste24). In addition, kyphoscoliosis peptidase (Ky), which plays a vital role in muscle growth, was also up-regulated in the transgenic mice. Interestingly, a pathway analysis based on grouping the differentially expressed genes uncovered that cardiomyopathy-related pathways and phosphatidic acid (PA) pathways (Dgki, Dgkz, Plcd4) were up-regulated. Increased PA signaling may increase mTOR signaling, resulting in skeletal muscle growth. The findings of the RNA sequencing analysis help to understand the molecular mechanisms of muscle hypertrophy caused by MSTN inhibition.

  8. RNA Sequencing Identifies Upregulated Kyphoscoliosis Peptidase and Phosphatidic Acid Signaling Pathways in Muscle Hypertrophy Generated by Transgenic Expression of Myostatin Propeptide

    PubMed Central

    Miao, Yuanxin; Yang, Jinzeng; Xu, Zhong; Jing, Lu; Zhao, Shuhong; Li, Xinyun

    2015-01-01

    Myostatin (MSTN), a member of the transforming growth factor-β superfamily, plays a crucial negative role in muscle growth. MSTN mutations or inhibitions can dramatically increase muscle mass in most mammal species. Previously, we generated a transgenic mouse model of muscle hypertrophy via the transgenic expression of the MSTN N-terminal propeptide cDNA under the control of the skeletal muscle-specific MLC1 promoter. Here, we compare the mRNA profiles between transgenic mice and wild-type littermate controls with a high-throughput RNA sequencing method. The results show that 132 genes were significantly differentially expressed between transgenic mice and wild-type control mice; 97 of these genes were up-regulated, and 35 genes were down-regulated in the skeletal muscle. Several genes that had not been reported to be involved in muscle hypertrophy were identified, including up-regulated myosin binding protein H (mybph), and zinc metallopeptidase STE24 (Zmpste24). In addition, kyphoscoliosis peptidase (Ky), which plays a vital role in muscle growth, was also up-regulated in the transgenic mice. Interestingly, a pathway analysis based on grouping the differentially expressed genes uncovered that cardiomyopathy-related pathways and phosphatidic acid (PA) pathways (Dgki, Dgkz, Plcd4) were up-regulated. Increased PA signaling may increase mTOR signaling, resulting in skeletal muscle growth. The findings of the RNA sequencing analysis help to understand the molecular mechanisms of muscle hypertrophy caused by MSTN inhibition. PMID:25860951

  9. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  10. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  11. Identifying acid salts of magnesium

    SciTech Connect

    Plumb, R.; Thivierge, R.F. Jr.; Xu, W.W.

    1987-11-05

    In preliminary work they found that significant quantities of certain nitrogen oxides and of sulfuric acid were absorbed by lower hydrates of magnesium sulfate. It appeared that acid salts were being formed but the known chemistry of group IIA (group 2) sulfates and acid sulfates which was worked out many years ago did not provide an explanation of their observations. They developed a new technique for delineating the solidus boundary of ternary mixtures using friability tests and applied it to the systems of interest. Magnesium acid salt hydrates with compositions on the solidus boundary could be readily identified. X-ray powder patterns confirmed the existence of two previously unknown ternary compounds, Mg₂(HSO₄)₂SO₄·4H₂O and Mg(HSO₄)₂·H₂SO₄·3H₂O. Mixed acid sulfate-nitrate-hydrates could be detected but fuming at room temperatures interfered with quantitative determinations of the solidus boundary and X-ray measurements.

  12. Transcriptome sequencing revealed the transcriptional organization at ribosome-mediated attenuation sites in Corynebacterium glutamicum and identified a novel attenuator involved in aromatic amino acid biosynthesis.

    PubMed

    Neshat, Armin; Mentz, Almut; Rückert, Christian; Kalinowski, Jörn

    2014-11-20

    The Gram-positive bacterium Corynebacterium glutamicum belongs to the order Corynebacteriales and is used as a producer of amino acids at industrial scales. Due to its economic importance, gene expression and particularly the regulation of amino acid biosynthesis has been investigated extensively. The high-resolution technique of transcriptome sequencing (RNA-seq) has recently generated a vast amount of data that was used to comprehensively analyze the C. glutamicum transcriptome. By analyzing RNA-seq data from a small RNA cDNA library of C. glutamicum, short transcripts in the known transcriptional attenuator sites of the trp operon, the ilvBNC operon and the leuA gene were verified. Furthermore, whole transcriptome RNA-seq data were used to elucidate the transcriptional organization of these three amino acid biosynthesis operons. In addition, we discovered and analyzed the novel attenuator aroR, located upstream of the aroF gene (cg1129). The DAHP synthase encoded by aroF catalyzes the first step in aromatic amino acid synthesis. The AroR leader peptide contains the amino acid sequence motif F-Y-F, indicating a regulatory effect by phenylalanine and tyrosine. Analysis by real-time RT-PCR suggests that the attenuator regulates the transcription of aroF depending on the cellular amount of tRNA loaded with phenylalanine when comparing a phenylalanine-auxotrophic C. glutamicum mutant fed with limiting and excess amounts of a phenylalanine-containing dipeptide. Additionally, all analyzed attenuators were found to be leaderless transcripts.

  13. Identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2005-02-08

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  14. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.
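
    The discrimination step in this abstract, longer retention in the observation volume for incorporated nucleotides than for freely diffusing labels, amounts to a threshold on signal duration. The sketch below shows that idea only; the pulse format, the 5 ms cut-off, and the example values are hypothetical and are not specified in the patent.

```python
# Hypothetical illustration: separate incorporation events from freely
# diffusing labels by how long each fluorescent signal lasted.
def classify_pulses(pulses, min_dwell_ms=5.0):
    """pulses: list of (base_label, dwell_ms) in order of detection.

    Returns the labels whose signal lasted at least min_dwell_ms,
    i.e. those treated as incorporated. The cut-off is an assumed value.
    """
    return [label for label, dwell_ms in pulses if dwell_ms >= min_dwell_ms]


# Made-up pulses: short ones model diffusion, long ones model incorporation.
pulses = [("A", 0.3), ("G", 12.0), ("T", 0.8), ("C", 9.5), ("A", 0.2), ("T", 15.1)]
incorporated = classify_pulses(pulses)
print("".join(incorporated))
```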

  15. Identifying subset errors in multiple sequence alignments.

    PubMed

    Roy, Aparna; Taddese, Bruck; Vohra, Shabana; Thimmaraju, Phani K; Illingworth, Christopher J R; Simpson, Lisa M; Mukherjee, Keya; Reynolds, Christopher A; Chintapalli, Sree V

    2014-01-01

    Multiple sequence alignment (MSA) accuracy is important, but there is no widely accepted method of judging the accuracy that different alignment algorithms give. We present a simple approach to detecting two types of error, namely block shifts and the misplacement of residues within a gap. Given a MSA, subsets of very similar sequences are generated through the use of a redundancy filter, typically using a 70-90% sequence identity cut-off. Subsets thus produced are typically small and degenerate, and errors can be easily detected even by manual examination. The errors, albeit minor, are inevitably associated with gaps in the alignment, and so the procedure is particularly relevant to homology modelling of protein loop regions. The usefulness of the approach is illustrated in the context of the universal but little known [K/R]KLH motif that occurs in intracellular loop 1 of G protein coupled receptors (GPCR); other issues relevant to GPCR modelling are also discussed.
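
    The redundancy filter mentioned in this abstract can be pictured as grouping aligned sequences whose pairwise identity exceeds a cut-off in the 70-90% range. The sketch below is a minimal greedy grouping under assumed conventions (identity counted over columns where both sequences have a residue, a default cut-off of 80%); it is not the authors' implementation, and the toy alignment is invented.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similar_subsets(msa, cutoff=80.0):
    """Greedily group sequences; msa maps name -> aligned sequence.

    A sequence joins the first subset containing any member it matches
    at or above the cut-off, otherwise it starts a new subset.
    """
    subsets = []
    for name, seq in msa.items():
        for subset in subsets:
            if any(percent_identity(seq, msa[other]) >= cutoff for other in subset):
                subset.append(name)
                break
        else:
            subsets.append([name])
    return subsets


# Invented toy alignment.
msa = {
    "s1": "MKT-LLVAGK",
    "s2": "MKTALLVAGK",
    "s3": "MRT-LLVAGK",
    "s4": "QWE-RTYPSD",
}
groups = similar_subsets(msa)
print(groups)
```

    Each resulting subset is small enough to inspect by eye for block shifts or residues misplaced within a gap, which is the checking step the abstract describes.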

  16. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  17. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  18. The complete amino acid sequence of prochymosin.

    PubMed Central

    Foltmann, B; Pedersen, V B; Jacobsen, H; Kauffman, D; Wybrandt, G

    1977-01-01

    The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three proteases. The three enzymes belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor. PMID:329280

  19. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  20. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  1. Promoter sequences and algorithmical methods for identifying them.

    PubMed

    Vanet, A; Marsan, L; Sagot, M F

    1999-01-01

    This paper presents a survey of currently available mathematical models and algorithmical methods for trying to identify promoter sequences. The methods concern both searching in a genome for a previously defined consensus and extracting a consensus from a set of sequences. Such methods were often tailored for either eukaryotes or prokaryotes although this does not preclude use of the same method for both types of organisms. The survey therefore covers all methods; however, emphasis is placed on prokaryotic promoter sequence identification. Illustrative applications of the main extracting algorithms are given for three bacteria.
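
    The first family of methods named in this abstract, searching a genome for a previously defined consensus, can be illustrated with a sliding-window scan that tolerates a few mismatches. The consensus TATAAT (the bacterial -10 element) is chosen here only as a familiar example; the function name, the mismatch limit, and the test sequence are assumptions, and the survey itself covers more elaborate models.

```python
def scan_consensus(genome, consensus, max_mismatches=1):
    """Return (position, window, mismatches) for each window close to the consensus."""
    hits = []
    width = len(consensus)
    for i in range(len(genome) - width + 1):
        window = genome[i:i + width]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Made-up sequence with one exact and one single-mismatch occurrence.
genome = "GGCTATAATGCGTACAATCC"
promoter_hits = scan_consensus(genome, "TATAAT", max_mismatches=1)
print(promoter_hits)
```

    The second family, extracting a consensus from a set of sequences, is the inverse problem and is not covered by this sketch.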

  2. Method of Identifying a Base in a Nucleic Acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    1999-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  3. Probe kit for identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  4. Identifying features in biological sequences: Sixth workshop report

    SciTech Connect

    Burks, C.; Myers, E.; Pearson, W.R.

    1995-12-31

    This report covers the sixth of an annual series of workshops held at the Aspen Center for Physics concentrating particularly on the identification of features in DNA sequence, and more broadly on related topics in computational molecular biology. The workshop series originally focused primarily on discussion of current needs and future strategies for identifying and predicting the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians.

  5. A homozygous mutation in PEX16 identified by whole-exome sequencing ending a diagnostic odyssey

    PubMed Central

    Bacino, Carlos A.; Chao, Yu-Hsin; Seto, Elaine; Lotze, Tim; Xia, Fan; Jones, Richard O.; Moser, Ann; Wangler, Michael F.

    2015-01-01

    We present a patient with a unique neurological phenotype with a progressive neurodegenerative course. An 18-year diagnostic odyssey for the patient ended when exome sequencing identified a homozygous PEX16 mutation suggesting an atypical peroxisomal biogenesis disorder (PBD). Interestingly, the patient's peroxisomal biochemical abnormalities were subtle, such that plasma very-long-chain fatty acids initially failed to provide a diagnosis. This case suggests that next-generation sequencing may be diagnostic in some atypical peroxisomal biogenesis disorders. PMID:26644994

  6. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements

    PubMed Central

    Kuntz, Steven G.; Schwarz, Erich M.; DeModena, John A.; De Buysscher, Tristan; Trout, Diane; Shizuya, Hiroaki; Sternberg, Paul W.; Wold, Barbara J.

    2008-01-01

    To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced ∼0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide. PMID:18981268
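
    The idea of ungapped N-way transitive conservation can be sketched as follows: take one fixed-width window from each sequence and keep the combination only if every pair of windows matches at a minimum number of positions. This brute-force version is a simplified illustration and not the MUSSA implementation; the window size, threshold, and toy sequences are assumptions.

```python
from itertools import product


def count_matches(a, b):
    """Number of identical positions between two equal-length windows."""
    return sum(x == y for x, y in zip(a, b))


def nway_conserved(seqs, window=8, threshold=7):
    """Return tuples of start positions, one per sequence, such that all
    pairs of the corresponding ungapped windows match at >= threshold positions.
    """
    starts = [range(len(s) - window + 1) for s in seqs]
    hits = []
    for combo in product(*starts):
        wins = [s[i:i + window] for s, i in zip(seqs, combo)]
        if all(count_matches(wins[i], wins[j]) >= threshold
               for i in range(len(wins)) for j in range(i + 1, len(wins))):
            hits.append(combo)
    return hits


# Invented toy sequences sharing one 8-base element (one copy has a mismatch).
seqs = ["CCGATTACATC", "TGATTGCATTT", "GGGGATTACAT"]
conserved = nway_conserved(seqs, window=8, threshold=7)
print(conserved)
```

    Requiring every pair to pass, rather than only each sequence against a reference, is what makes the match transitive; the exhaustive search shown here scales poorly and is only meant for short inputs.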

  7. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  8. Exome capture sequencing identifies a novel mutation in BBS4

    PubMed Central

    Wang, Hui; Chen, Xianfeng; Dudinsky, Lynn; Patenia, Claire; Chen, Yiyun; Li, Yumei; Wei, Yue; Abboud, Emad B.; Al-Rajhi, Ali A.; Lewis, Richard Alan; Lupski, James R.; Mardon, Graeme; Gibbs, Richard A.; Perkins, Brian D.

    2011-01-01

    Purpose: Leber congenital amaurosis (LCA) is one of the most severe eye dystrophies, characterized by severe vision loss at an early stage, and accounts for approximately 5% of all retinal dystrophies. The purpose of this study was to identify a novel LCA disease allele or gene and to develop an approach combining genetic mapping with whole exome sequencing. Methods: Three patients from King Khaled Eye Specialist Hospital (KKESH205) underwent whole genome single nucleotide polymorphism genotyping, and a single candidate region was identified. Taking advantage of next-generation high-throughput DNA sequencing technologies, whole exome capture sequencing was performed on patient KKESH205#7. Sanger direct sequencing was used during the validation step. The zebrafish model was used to examine the function of the mutant allele. Results: A novel missense mutation in Bardet-Biedl syndrome 4 protein (BBS4) was identified in a consanguineous family from Saudi Arabia. This missense mutation in the fifth exon (c.253G>C;p.E85Q) of BBS4 is likely a disease-causing mutation as it segregates with the disease. The mutation is not found in the single nucleotide polymorphism (SNP) database, the 1000 Genomes Project, or matching normal controls. Functional analysis of this mutation in zebrafish indicates that the G253C allele is pathogenic. Coinjection of the G253C allele cannot rescue the mislocalization of rhodopsin in the retina when BBS4 is knocked down by morpholino injection. Immunofluorescence analysis in cell culture shows that this missense mutation in BBS4 does not cause obvious defects in protein expression or pericentriolar localization. Conclusions: This mutation likely mainly reduces or abolishes BBS4 function in the retina. Further studies of this allele will provide important insights concerning the pleiotropic nature of BBS4 function. PMID:22219648

  9. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
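
    The abstract does not state which statistic the algorithm computes. One common way to quantify correlations of this kind is a "DNA walk": step +1 for a purine and -1 for a pyrimidine, then measure how the fluctuation F(l) of the walk grows with window length l. An exponent near 0.5 in F(l) ~ l^alpha is what an uncorrelated sequence gives, while larger values suggest long-range correlation. The Python sketch below assumes that approach; the function names and the chosen window lengths are hypothetical.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for purines (A, G), -1 for anything else."""
    y, pos = [0], 0
    for base in seq:
        pos += 1 if base in "AG" else -1
        y.append(pos)
    return y


def fluctuation(y, l):
    """Root-mean-square fluctuation of the displacement over windows of length l."""
    deltas = [y[i + l] - y[i] for i in range(len(y) - l)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))


def scaling_exponent(seq, scales=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) against log l."""
    y = dna_walk(seq)
    xs = [math.log(l) for l in scales]
    ys = [math.log(fluctuation(y, l)) for l in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def random_sequence(n, seed=0):
    """Uncorrelated toy sequence used only to sanity-check the estimator."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


print(scaling_exponent(random_sequence(20000)))
```

    Applied in a sliding window along a chromosome, an estimate like this could flag stretches whose exponent stays close to 0.5 as candidate coding regions, which is the direction of the contrast the abstract describes.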

  10. Identifying Affinity Classes of Inorganic Materials Binding Sequences via a Graph-Based Model.

    PubMed

    Du, Nan; Knecht, Marc R; Swihart, Mark T; Tang, Zhenghua; Walsh, Tiffany R; Zhang, Aidong

    2015-01-01

    Rapid advances in bionanotechnology have recently generated growing interest in identifying peptides that bind to inorganic materials and classifying them based on their inorganic material affinities. However, there are some distinct characteristics of inorganic materials binding sequence data that limit the performance of many widely-used classification methods when applied to this problem. In this paper, we propose a novel framework to predict the affinity classes of peptide sequences with respect to an associated inorganic material. We first generate a large set of simulated peptide sequences based on an amino acid transition matrix tailored for the specific inorganic material. Then the probability of test sequences belonging to a specific affinity class is calculated by minimizing an objective function. In addition, the objective function is minimized through iterative propagation of probability estimates among sequences and sequence clusters. Results of computational experiments on two real inorganic material binding sequence data sets show that the proposed framework is highly effective for identifying the affinity classes of inorganic material binding sequences. Moreover, the experiments on the structural classification of proteins (SCOP) data set show that the proposed framework is general and can be applied to traditional protein sequences.

  11. Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    PubMed Central

    Harper, Marc A.; Chen, Zugen; Toy, Traci; Machado, Iara M. P.; Nelson, Stanley F.; Liao, James C.; Lee, Christopher J.

    2011-01-01

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340. PMID:21364744
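
    The intuition behind identifying causal genes from independent mutants can be sketched with a simple null model: if mutations landed at random, how likely is it that a given gene would be hit in at least k of n strains? The Python sketch below uses a binomial tail for that question. It is a simplified stand-in, not the authors' statistical method; the gene lengths, the genome size of about 4.6 Mb, the 75 mutations per strain, and every gene name other than acrB are illustrative assumptions.

```python
from math import comb


def hit_probability(gene_len, genome_len, muts_per_strain):
    """Chance that at least one of a strain's random mutations lands in the gene."""
    return 1.0 - (1.0 - gene_len / genome_len) ** muts_per_strain


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_genes(strain_hits, gene_lengths, genome_len, muts_per_strain):
    """Rank genes by how surprising their hit count across strains is."""
    n = len(strain_hits)
    results = []
    for gene, length in gene_lengths.items():
        k = sum(gene in hits for hits in strain_hits)  # strains hitting this gene
        p = hit_probability(length, genome_len, muts_per_strain)
        results.append((gene, k, binomial_tail(k, n, p)))
    return sorted(results, key=lambda r: r[2])


# Toy input: the set of mutated genes observed in each of four strains.
strains = [{"acrB", "geneX"}, {"acrB"}, {"acrB", "geneY"}, {"geneZ"}]
lengths = {"acrB": 3000, "geneX": 3000, "geneY": 3000, "geneZ": 3000}
for row in rank_genes(strains, lengths, genome_len=4_600_000, muts_per_strain=75):
    print(row)
```

    A gene hit in three of four strains stands out sharply under this model, which is consistent with the abstract's claim that a modest number of mutant genomes can suffice.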

  12. Exome Sequencing Identifies Potentially Druggable Mutations in Nasopharyngeal Carcinoma

    PubMed Central

    Chow, Yock Ping; Tan, Lu Ping; Chai, San Jiun; Abdul Aziz, Norazlin; Choo, Siew Woh; Lim, Paul Vey Hong; Pathmanathan, Rajadurai; Mohd Kornain, Noor Kaslina; Lum, Chee Lun; Pua, Kin Choo; Yap, Yoke Yeow; Tan, Tee Yong; Teo, Soo Hwang; Khoo, Alan Soo-Beng; Patel, Vyomesh

    2017-01-01

    In this study, we first performed whole exome sequencing of DNA from 10 untreated and clinically annotated fresh frozen nasopharyngeal carcinoma (NPC) biopsies and matched bloods to identify somatically mutated genes that may be amenable to targeted therapeutic strategies. We identified a total of 323 mutations which were either non-synonymous (n = 238) or synonymous (n = 85). Furthermore, our analysis revealed genes in key cancer pathways (DNA repair, cell cycle regulation, apoptosis, immune response, lipid signaling) were mutated, of which those in the lipid-signaling pathway were the most enriched. We next extended our analysis on a prioritized sub-set of 37 mutated genes plus top 5 mutated cancer genes listed in COSMIC using a custom designed HaloPlex target enrichment panel with an additional 88 NPC samples. Our analysis identified 160 additional non-synonymous mutations in 37/42 genes in 66/88 samples. Of these, 99/160 mutations within potentially druggable pathways were further selected for validation. Sanger sequencing revealed that 77/99 variants were true positives, giving an accuracy of 78%. Taken together, our study indicated that ~72% (n = 71/98) of NPC samples harbored mutations in one of the four cancer pathways (EGFR-PI3K-Akt-mTOR, NOTCH, NF-κB, DNA repair) which may be potentially useful as predictive biomarkers of response to matched targeted therapies. PMID:28256603

  13. Identifying satellites and periodic repetitions in biological sequences.

    PubMed

    Sagot, M F; Myers, E W

    1998-01-01

    We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may differ up to epsilon = 15-20% from a consensus model of the repeating unit (implying individual units may vary by 2 epsilon from each other). The algorithm is composed of two parts. The first one consists of a filter that basically eliminates all regions whose probability of containing a satellite is less than one in 10^4 when epsilon = 10%. The second part realizes an exhaustive exploration of the space of all possible models for the repeating units present in the sequence. It therefore has the advantage over previous work of being able to report a consensus model, say m, of the repeated unit as well as the span of the satellite. The first phase was designed for efficiency and takes only O(n) time where n is the length of the sequence. The second phase was designed for sensitivity and takes time O(n · N(e, k)) in the worst case where k is the length of the repeating unit m, e = [epsilon k] is the number of differences allowed between each repeat unit and the model m, and N(e, k) is the maximum number of words that are not more than e differences from another word of length k. That is, N(e, k) is the maximum size of an e-neighborhood of a string of length k. Experiments reveal the second phase to be considerably faster in practice than the worst-case complexity bound suggests. Finally, the present algorithm is easily adapted to finding tandem repeats in protein sequences, as well as extended to identifying mixed direct-inverse tandem repeats.
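
    To give a feel for the quantities e and N(e, k), the Python sketch below makes two simplifying assumptions that the abstract does not confirm: differences are counted as substitutions only (Hamming distance, no insertions or deletions), and [epsilon k] is read as rounding down. Under those assumptions the neighborhood size has a closed form, and a candidate model can be checked against consecutive units directly. The function names are hypothetical.

```python
from math import comb, floor


def hamming_neighborhood_size(k, e, alphabet_size=4):
    """Number of words within e substitutions of a fixed word of length k."""
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(e + 1))


def hamming(a, b):
    """Number of mismatching positions between equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def is_tandem_satellite(seq, model, epsilon):
    """True if seq splits into at least two consecutive units of len(model),
    each within e = floor(epsilon * k) substitutions of the model.
    A trailing partial unit is ignored."""
    k = len(model)
    e = floor(epsilon * k)
    units = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    return len(units) >= 2 and all(hamming(u, model) <= e for u in units)


print(hamming_neighborhood_size(10, 2))
print(is_tandem_satellite("ACGTACGAACGT", "ACGT", 0.25))
```

    Even with substitutions only, the neighborhood grows quickly with e and k, which illustrates why the worst-case bound for the exhaustive second phase looks expensive on paper.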

  14. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  15. Identifying and calling insertions, deletions, and single-base mutations efficiently from sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Whole genome sequencing studies can directly identify causative mutations for subsequent use in genomic evaluations, but sequence variant identification is a lengthy and sometimes inaccurate process. The speed and accuracy of identifying small insertions and deletions of sequence, collectively terme...

  16. Whole-exome sequencing identifies variants in invasive pituitary adenomas

    PubMed Central

    Lan, Xiaolei; Gao, Hua; Wang, Fei; Feng, Jie; Bai, Jiwei; Zhao, Peng; Cao, Lei; Gui, Songbai; Gong, Lei; Zhang, Yazhuo

    2016-01-01

    Pituitary adenomas exhibit a wide range of behaviors. The prediction of invasion or malignant behavior in pituitary adenomas remains challenging. The objective of the present study was to identify the genetic abnormalities associated with invasion in sporadic pituitary adenomas. In the present study, the exomes of six invasive pituitary adenomas (IPA) and six non-invasive pituitary adenomas (nIPA) were sequenced by whole-exome sequencing. Variants were confirmed by dideoxynucleotide sequencing, and candidate driver genes were assessed in an additional 28 pituitary adenomas. A total of 15 identified variants were mainly associated with angiogenesis, metabolism, cell cycle phase, cellular component organization, cytoskeleton and biogenesis immune at a cellular level, including 13 variants that occurred as single nucleotide variants and 2 that were insertions. The messenger RNA (mRNA) levels of diffuse panbronchiolitis critical region 1 (DPCR1), KIAA0226, myxovirus (influenza virus) resistance, proline-rich protein BstNI subfamily 3, PR domain containing 2, with ZNF domain, RIZ1 (PRDM2), PR domain containing 8 (PRDM8), SPANX family member N2 (SPANXN2), TRIO and F-actin binding protein and zinc finger protein 717 in IPA specimens were decreased by 50% compared with nIPA specimens. In particular, DPCR1, PRDM2, PRDM8 and SPANXN2 mRNA levels in IPA specimens were approximately four-fold lower compared with nIPA specimens (P=0.003, 0.007, 0.009 and 0.004, respectively). By contrast, the mRNA levels of dentin sialophospho protein, EGF like domain, multiple 7 (EGFL7), low density lipoprotein receptor-related protein 1B and dynein, axonemal, assembly factor 1 (LRRC50) were increased in IPA compared with nIPA specimens (P=0.041, 0.037, 0.022 and 0.013, respectively). Furthermore, decreased PRDM2 expression was associated with tumor recurrence. The findings of the present study indicate that DPCR1, EGFL7, the PRDM family and LRRC50 in pituitary adenomas are modifiers of

  17. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma

    PubMed Central

    Krauthammer, Michael; Kong, Yong; Ha, Byung Hak; Evans, Perry; Bacchiocchi, Antonella; McCusker, James P; Cheng, Elaine; Davis, Matthew J; Goh, Gerald; Choi, Murim; Ariyan, Stephan; Narayan, Deepak; Dutton-Regester, Ken; Capatana, Ana; Holman, Edna C; Bosenberg, Marcus; Sznol, Mario; Kluger, Harriet M; Brash, Douglas E; Stern, David F; Materin, Miguel A; Lo, Roger S; Mane, Shrikant; Ma, Shuangge; Kidd, Kenneth K; Hayward, Nicholas K; Lifton, Richard P; Schlessinger, Joseph; Boggon, Titus J; Halaban, Ruth

    2012-01-01

    We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1^P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1^P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit. PMID:22842228

  18. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma

    SciTech Connect

    Krauthammer, Michael; Kong, Yong; Ha, Byung Hak; Evans, Perry; Bacchiocchi, Antonella; McCusker, James P.; Cheng, Elaine; Davis, Matthew J.; Goh, Gerald; Choi, Murim; Ariyan, Stephan; Narayan, Deepak; Dutton-Regester, Ken; Capatana, Ana; Holman, Edna C.; Bosenberg, Marcus; Sznol, Mario; Kluger, Harriet M.; Brash, Douglas E.; Stern, David F.; Materin, Miguel A.; Lo, Roger S.; Mane, Shrikant; Ma, Shuangge; Kidd, Kenneth K.; Hayward, Nicholas K.; Lifton, Richard P.; Schlessinger, Joseph; Boggon, Titus J.; Halaban, Ruth

    2012-10-11

    We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1^P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1^P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.

  19. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  20. Los Alamos sequence analysis package for nucleic acids and proteins.

    PubMed Central

    Kanehisa, M I

    1982-01-01

    An interactive system for computer analysis of nucleic acid and protein sequences has been developed for the Los Alamos DNA Sequence Database. It provides a convenient way to search or verify various sequence features, e.g., restriction enzyme sites, protein coding frames, and properties of coded proteins. Further, the comprehensive analysis package on a large-scale database can be used for comparative studies on sequence and structural homologies in order to find unnoted information stored in nucleic acid sequences. PMID:6174934

  1. Exome sequencing covers >98% of mutations identified on targeted next generation sequencing panels

    PubMed Central

    LaDuca, Holly; Farwell, Kelly D.; Vuong, Huy; Lu, Hsiao-Mei; Mu, Wenbo; Shahmirzadi, Layla; Tang, Sha; Chen, Jefferey; Bhide, Shruti; Chao, Elizabeth C.

    2017-01-01

    Background: With the expanded availability of next generation sequencing (NGS)-based clinical genetic tests, clinicians seeking to test patients with Mendelian diseases must weigh the superior coverage of targeted gene panels against the greater number of genes included in whole exome sequencing (WES) when considering their first-tier testing approach. Here, we use an in silico analysis to predict the analytic sensitivity of WES using pathogenic variants identified on targeted NGS panels as a reference. Methods: Corresponding nucleotide positions for 1533 different alterations classified as pathogenic or likely pathogenic identified on targeted NGS multi-gene panel tests in our laboratory were interrogated in data from 100 randomly-selected clinical WES samples to quantify the sequence coverage at each position. Pathogenic variants represented 91 genes implicated in hereditary cancer, X-linked intellectual disability, primary ciliary dyskinesia, Marfan syndrome/aortic aneurysms, cardiomyopathies and arrhythmias. Results: When assessing coverage among 100 individual WES samples for each pathogenic variant (153,300 individual assessments), 99.7% (n = 152,798) would likely have been detected on WES. All pathogenic variants had at least some coverage on exome sequencing, with a total of 97.3% (n = 1491) detectable across all 100 individuals. For the remaining 42 pathogenic variants, the number of WES samples with adequate coverage ranged from 35 to 99. Factors such as location in GC-rich, repetitive, or homologous regions likely explain why some of these alterations were not detected across all samples. To validate study findings, a similar analysis was performed against coverage data from 60,706 exomes available through the Exome Aggregation Consortium (ExAC). Results from this validation confirmed that 98.6% (91,743,296/93,062,298) of pathogenic variants demonstrated adequate depth for detection. Conclusions: Results from this in silico analysis suggest that exome

  2. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    PubMed

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
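
    One way to picture "integrating hits to multiple genomes" is to pool the regions of a contig that align to any phage genome and ask how much of the contig they cover together. The Python sketch below merges overlapping hit intervals and reports the covered fraction. It is an illustration of the idea only, not MetaPhinder's actual scoring; the function names and the half-open (start, end) interval convention are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(hits, contig_length):
    """Fraction of the contig covered by at least one hit."""
    covered = sum(end - start for start, end in merge_intervals(hits))
    return covered / contig_length


# Toy hits on a 500-bp contig, imagined as coming from different phage genomes.
hits = [(300, 400), (0, 100), (50, 150)]
print(merge_intervals(hits))
print(covered_fraction(hits, 500))
```

    A single best hit would credit this toy contig with at most 100 covered bases, whereas pooling hits credits 250, which is the kind of gain one would expect for mosaic genomes.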

  3. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  4. A novel HLA-B*51 allele (B*5116) identified by nucleotide sequencing.

    PubMed

    Tamouza, R; Carbonnelle, E; Schaeffer, V; Sadki, K; Abed, Y; Marzais, F; Poirier, J C; Fortier, C; Toubert, A; Raffoux, C; Charron, D

    2000-02-01

    We report here an additional HLA-B*51 variant designated HLA-B*5116. Detected by an abnormal serological reactivity pattern, this variant was identified as a B*51 allele by polymerase chain reaction using sequence-specific primers (PCR-SSP) and characterized by nucleotide sequencing. The new variant sequence matches the classical HLA-B*5101 closely, except for two adjacent nucleotide substitutions at positions 216 and 217 of the third exon and the subsequent Leucine to Glutamic acid change at codon 163 of the alpha2 domain (CTG-->GAG). This new variant was not detected in three different ethnic groups (French, Algerian and Lebanese), suggesting a very rare frequency.

  5. A novel exogenous retrovirus sequence identified in humans.

    PubMed Central

    Griffiths, D J; Venables, P J; Weiss, R A; Boyd, M T

    1997-01-01

    A 932-bp retrovirus sequence was cloned by reverse transcriptase PCR from salivary gland tissue of a patient with Sjögren's syndrome. The sequence is related to that of type B and type D retroviruses and was present in a sucrose density gradient fraction corresponding to that of an enveloped retrovirus particle. Sequences amplified from tissues of eight individuals with or without Sjögren's syndrome had over 90% similarity and were present at a level of less than one copy per 10^3 cells. The sequence was not detectable in human genomic DNA by PCR or by Southern hybridization. These data indicate that the sequence represents an infectiously acquired genome, provisionally called human retrovirus 5. PMID:9060643

  6. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites.

    PubMed

    Lee, Tzong-Yi; Chen, Shu-An; Hung, Hsin-Yi; Ou, Yu-Yen

    2011-03-09

    Ubiquitin (Ub) is a small protein of 76 amino acids (about 8.5 kDa). In ubiquitin conjugation, ubiquitin is mainly conjugated to the lysine residue of a protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation. They are E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work not only investigates the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (-20∼+20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub

  7. Simple sequence repeat markers that identify Claviceps species and strains

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Claviceps purpurea is a pathogen that infects most members of the Pooideae subfamily and causes ergot, a floral disease in which the ovary is replaced with a sclerotium. This study was initiated to develop simple sequence repeat (SSR) markers for rapid identification of C. purpurea. SSRs were desi...

  8. DeNovoID: a web-based tool for identifying peptides from sequence and mass tags deduced from de novo peptide sequencing by mass spectroscopy.

    PubMed

    Halligan, Brian D; Ruotti, Victor; Twigger, Simon N; Greene, Andrew S

    2005-07-01

    One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
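
    The point about searching by composition rather than sequence can be shown with a small Python sketch: residues recovered in any order are reduced to per-residue counts and compared against the counts of each database peptide. This is only an illustration of order-independent matching; it does not reproduce DeNovoID's geometric indexing or its use of mass tags, and the function names and toy peptides are hypothetical.

```python
from collections import Counter


def composition_distance(a, b):
    """Sum of absolute differences in per-residue counts (order is ignored)."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[r] - cb[r]) for r in set(ca) | set(cb))


def best_match(query_residues, database):
    """Return (peptide, distance) for the database peptide whose composition
    is closest to that of the query residues."""
    scored = ((pep, composition_distance(query_residues, pep)) for pep in database)
    return min(scored, key=lambda item: item[1])


# The query lists the right residues in the wrong order.
toy_database = ["SAMPLER", "PEPTIDEK", "PROTEINS"]
print(best_match("EDITPEPK", toy_database))
```

    Because every ordering of the same residues maps to one composition, the search never has to enumerate candidate sequences, which is the saving the abstract attributes to the composition-based approach.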

  9. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022

  10. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  11. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  12. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  13. Antibiotic Resistance Markers in Strain Bp1651 of Burkholderia pseudomallei Identified by Genome Sequence Analysis.

    PubMed

    Bugrysheva, Julia V; Sue, David; Gee, Jay E; Elrod, Mindy G; Hoffmaster, Alex R; Randall, Linnell B; Chirakul, Sunisa; Tuanyok, Apichai; Schweizer, Herbert P; Weigel, Linda M

    2017-04-10

    Burkholderia pseudomallei Bp1651 is resistant to several classes of antibiotics that are usually effective for treatment of melioidosis including β-lactams such as penicillins (amoxicillin/clavulanic acid), cephalosporins (ceftazidime), carbapenems (imipenem and meropenem), as well as tetracyclines and sulfonamides. We sequenced, assembled, and annotated the Bp1651 genome, and analyzed the sequence using comparative genomic analyses with susceptible strains, keyword searches of the annotation, publicly available antimicrobial resistance prediction tools, and published reports. More than 100 genes in the Bp1651 sequence were identified as potentially contributing to antimicrobial resistance. Most notably, we identified three previously uncharacterized point mutations in penA, which codes for a class A β-lactamase and was previously implicated in resistance to β-lactam antibiotics. The mutations result in amino acid changes T147A, D240G, and V261I. When individually introduced into select agent-excluded B. pseudomallei strain Bp82, D240G was found to contribute to ceftazidime resistance, and T147A contributed to amoxicillin/clavulanic acid and imipenem resistance. This study provides the first evidence that mutations in penA may alter susceptibility to carbapenems in B. pseudomallei. Another mutation of interest was a point mutation affecting the dihydrofolate reductase gene folA, which likely explains the trimethoprim resistance of this strain. Bp1651 was susceptible to aminoglycosides likely due to a frame shift in the amrB gene, the transporter subunit of the AmrAB-OprA efflux pump. These findings expand the role of penA to include resistance to carbapenems and may assist in development of molecular diagnostics that predict antimicrobial resistance and provide guidance for treatment of melioidosis.

  14. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  15. Use of HLA peptidomics and whole exome sequencing to identify human immunogenic neo-antigens

    PubMed Central

    Kalaora, Shelly; Qutob, Nouar; Teer, Jamie K.; Shimony, Nilly; Schachter, Jacob; Rosenberg, Steven A.; Samuels, Yardena

    2016-01-01

    The antigenicity of cells is demarcated by the peptides bound by their Human Leucocyte Antigen (HLA) molecules. Through this antigen presentation, T cell specificity response is controlled. As a fraction of the expressed mutated peptides is presented on the HLA, these neo-epitopes could be immunogenic. Such neoantigens have recently been identified through screening for predicted mutated peptides, using synthetic peptides or ones expressed from minigenes, combined with screening of patient tumor-infiltrating lymphocytes (TILs). Here we present a time and cost-effective method that combines whole-exome sequencing analysis with HLA peptidome mass spectrometry, to identify neo-antigens in a melanoma patient. Of the 1,019 amino acid changes identified through exome sequencing, two were confirmed by mass spectrometry to be presented by the cells. We then synthesized peptides and evaluated the two mutated neo-antigens for reactivity with autologous bulk TILs, and found that one yielded mutant-specific T-cell response. Our results demonstrate that this method can be used for immune response prediction and promise to provide an alternative approach for identifying immunogenic neo-epitopes in cancer. PMID:26819371

  16. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  17. A FRET Biosensor for ROCK Based on a Consensus Substrate Sequence Identified by KISS Technology.

    PubMed

    Li, Chunjie; Imanishi, Ayako; Komatsu, Naoki; Terai, Kenta; Amano, Mutsuki; Kaibuchi, Kozo; Matsuda, Michiyuki

    2017-01-11

    Genetically-encoded biosensors based on Förster/fluorescence resonance energy transfer (FRET) are versatile tools for studying the spatio-temporal regulation of signaling molecules within not only the cells but also tissues. Perhaps the hardest task in the development of a FRET biosensor for protein kinases is to identify the kinase-specific substrate peptide to be used in the FRET biosensor. To solve this problem, we took advantage of kinase-interacting substrate screening (KISS) technology, which deduces a consensus substrate sequence for the protein kinase of interest. Here, we show that a consensus substrate sequence for ROCK identified by KISS yielded a FRET biosensor for ROCK, named Eevee-ROCK, with high sensitivity and specificity. By treating HeLa cells with inhibitors or siRNAs against ROCK, we show that a substantial part of the basal FRET signal of Eevee-ROCK was derived from the activities of ROCK1 and ROCK2. Eevee-ROCK readily detected ROCK activation by epidermal growth factor, lysophosphatidic acid, and serum. When cells stably-expressing Eevee-ROCK were time-lapse imaged for three days, ROCK activity was found to increase after the completion of cytokinesis, concomitant with the spreading of cells. Eevee-ROCK also revealed a gradual increase in ROCK activity during apoptosis. Thus, Eevee-ROCK, which was developed from a substrate sequence predicted by the KISS technology, will pave the way to a better understanding of the function of ROCK in a physiological context.

  18. Seven Conformers of Pipecolic Acid Identified in the Gas Phase

    NASA Astrophysics Data System (ADS)

    Cabezas, Carlos; Simao, Alcides; Alonso, José L.

    2016-06-01

    The multiconformational landscape of the non-proteinogenic cyclic amino acid pipecolic acid has been explored in the gas phase. Solid pipecolic acid (m.p. 280°C) was vaporized by laser ablation (LA) and expanded in a supersonic jet where the rotational spectra of seven conformers were obtained by broadband microwave spectroscopy (CP-FTMW). All conformers were conclusively identified by comparison of the experimental spectroscopic constants with those predicted theoretically. The relative stability of the conformers rests on a delicate balance of the different intramolecular hydrogen bonds established between the carboxylic and the amino groups.

  19. Whole Exome Sequencing Identifies Atypical Welander Distal Myopathy in Patient

    PubMed Central

    Blackburn, Patrick; Jackson, Jessica; Harris, Kimberly; Selcen, Duygu; Dimberg, Elliot; Atwal, Paldeep

    2017-01-01

    Welander distal myopathy is a rare autosomal dominant disorder characterized by muscle weakness in the hands and feet. Exome sequencing of affected families discovered a segregating p.Glu384Lys pathogenic variant in TIA-1 as the main genetic cause of Welander distal myopathy. TIA-1 encodes an RNA-binding protein which serves as a key component of stress granules. This protein also regulates splicing and translation of mRNA. Our patient developed progressive weakness in his hands and feet during his late 40s that was misdiagnosed as a neuropathy that caused muscle atrophy. Follow-up genetic testing revealed a p.Glu384Lys pathogenic variant in TIA-1, and he was then diagnosed with Welander distal myopathy. Our case report underlines the importance of electrodiagnostic and genetic testing of patients. PMID:28221306

  20. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence, it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  1. SETG: Nucleic Acid Extraction and Sequencing for In Situ Life Detection on Mars

    NASA Astrophysics Data System (ADS)

    Mojarro, A.; Hachey, J.; Tani, J.; Smith, A.; Bhattaru, S. A.; Pontefract, A.; Doebler, R.; Brown, M.; Ruvkun, G.; Zuber, M. T.; Carr, C. E.

    2016-10-01

    We are developing an integrated nucleic acid extraction and sequencing instrument: the Search for Extra-Terrestrial Genomes (SETG) for in situ life detection on Mars. Our goals are to identify related or unrelated nucleic acid-based life on Mars.

  2. Novel alpha-conotoxins identified by gene sequencing from cone snails native to Hainan, and their sequence diversity.

    PubMed

    Luo, Sulan; Zhangsun, Dongting; Zhang, Ben; Quan, Yaru; Wu, Yong

    2006-11-01

    Conotoxins (CTX) from the venom of marine cone snails (genus Conus) represent large families of proteins, which show a similar precursor organization with surprisingly conserved signal sequence of the precursor peptides, but highly diverse pharmacological activities. By using the conserved sequences found within the genes that encode the alpha-conotoxin precursors, a technique based on RT-PCR was used to identify, respectively, two novel peptides (LiC22, LeD2) from the two worm-hunting Conus species Conus lividus, and Conus litteratus, and one novel peptide (TeA21) from the snail-hunting Conus species Conus textile, all native to Hainan in China. The three peptides share an alpha4/7 subfamily alpha-conotoxins common cysteine pattern (CCX(4)CX(7)C, two disulfide bonds), which are competitive antagonists of nicotinic acetylcholine receptors (nAChRs). The cDNA of LiC22N encodes a precursor of 40 residues, including a propeptide of 19 residues and a mature peptide of 21 residues. The cDNA of LeD2N encodes a precursor of 41 residues, including a propeptide of 21 residues and a mature peptide of 16 residues with three additional Gly residues. The cDNA of TeA21N encodes a precursor of 38 residues, including a propeptide of 20 residues and a mature peptide of 17 residues with an additional residue Gly. The additional residue Gly of LeD2N and TeA21N is a prerequisite for the amidation of the preceding C-terminal Cys. All three sequences are processed at the common signal site -X-Arg- immediately before the mature peptide sequences. The properties of the alpha4/7 conotoxins known so far were discussed in detail. Phylogenetic analysis of the new conotoxins in the present study and the published homologue of alpha4/7 conotoxins from the other Conus species was performed systematically. Patterns of sequence divergence for the three regions of signal, proregion, and mature peptides, both nucleotide acids and residue substitutions in DNA and peptide levels, as well as Cys codon
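    The cysteine framework CCX(4)CX(7)C quoted in this abstract maps naturally onto a regular expression. The sketch below is only an illustration: it assumes X means "any residue except Cys", the function names are hypothetical, and the test peptides are synthetic strings rather than the sequences of LiC22, LeD2 or TeA21, which the abstract does not give.

```python
import re

# CC, then a 4-residue loop, Cys, a 7-residue loop, Cys.
# X is taken to mean "any residue except Cys" (an assumption about the notation).
ALPHA_4_7 = re.compile(r"CC[^C]{4}C[^C]{7}C")

# Same framework with unconstrained loop lengths, used to report the loop sizes.
GENERIC = re.compile(r"CC([^C]+)C([^C]+)C")


def has_alpha47_framework(peptide):
    """True if the peptide contains the CCX(4)CX(7)C cysteine pattern."""
    return ALPHA_4_7.search(peptide) is not None


def loop_sizes(peptide):
    """Return (loop1, loop2) lengths for a CC..C..C framework, or None."""
    m = GENERIC.search(peptide)
    return (len(m.group(1)), len(m.group(2))) if m else None


synthetic_4_7 = "GCCAAAACAAAAAAACG"  # invented 4/7 example
synthetic_4_3 = "GCCAAAACAAAC"       # invented 4/3 example
```

    Reporting the loop sizes rather than a plain yes/no makes it easy to sort candidate mature peptides into subfamilies by their loop lengths.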

  3. Potential of DNA sequences to identify zoanthids (Cnidaria: Zoantharia).

    PubMed

    Sinniger, Frederic; Reimer, James D; Pawlowski, Jan

    2008-12-01

    The order Zoantharia is known for its chaotic taxonomy and difficult morphological identification. One method that potentially could help for examining such troublesome taxa is DNA barcoding, which identifies species using standard molecular markers. The mitochondrial cytochrome oxidase subunit I (COI) has been utilized to great success in groups such as birds and insects; however, its applicability in many other groups is controversial. Recently, some studies have suggested that barcoding is not applicable to anthozoans. Here, we examine the use of COI and mitochondrial 16S ribosomal DNA for zoanthid identification. Despite the absence of a clear barcoding gap, our results show that for most of 54 zoanthid samples, both markers could separate samples to the species, or species group, level, particularly when easily accessible ecological or distributional data were included. Additionally, we have used the short V5 region of mt 16S rDNA to identify eight old (13 to 50 years old) museum samples. We discuss advantages and disadvantages of COI and mt 16S rDNA as barcodes for Zoantharia, and recommend that either one or both of these markers be considered for zoanthid identification in the future.

  4. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions in renin containing two catalytically essential aspartyl residues 32 and 215 were found identical with those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055

  5. Metadata-driven comparative analysis tool for sequences (meta-CATS): an automated process for identifying significant sequence variations that correlate with virus attributes.

    PubMed

    Pickett, B E; Liu, M; Sadat, E L; Squires, R B; Noronha, J M; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, C N; Bosch, I; Gehrke, L; McGee, M; Klem, E B; Scheuermann, R H

    2013-12-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics.
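    The workflow described here (group the sequences by a metadata attribute, test each aligned position, report positions that differ) can be sketched as follows. The abstract does not name the statistical test, so the Pearson chi-square statistic is an assumption, as are the function names, the metadata labels and the toy alignment. This is not the meta-CATS implementation.

```python
from collections import Counter


def chi_square_stat(columns):
    """Pearson chi-square statistic for one aligned position.

    columns holds one list of residues per metadata group.
    """
    counts = [Counter(col) for col in columns]
    residues = sorted(set().union(*counts))
    total = sum(sum(c.values()) for c in counts)
    stat = 0.0
    for c in counts:
        n_group = sum(c.values())
        for r in residues:
            n_res = sum(k[r] for k in counts)
            expected = n_group * n_res / total
            stat += (c[r] - expected) ** 2 / expected
    return stat


def scan_alignment(groups):
    """groups: {label: [aligned sequences]} -> [(1-based position, statistic)]."""
    labels = list(groups)
    length = len(groups[labels[0]][0])
    out = []
    for i in range(length):
        cols = [[s[i] for s in groups[g]] for g in labels]
        out.append((i + 1, chi_square_stat(cols)))
    return out


# Toy protein alignment split by an invented "hemisphere" attribute.
toy = {
    "north": ["MKT", "MKT", "MKS"],
    "south": ["MRT", "MRT", "MRS"],
}
scores = scan_alignment(toy)
```

    In practice each statistic would be converted to a p-value with (residues - 1) x (groups - 1) degrees of freedom and corrected for the number of positions tested; that step is omitted to keep the sketch dependency-free.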

  6. Amino Acid Sequence of Human Cholinesterase

    DTIC Science & Technology

    1985-10-01

    liquid chromatography (HPLC). Activity testing of the aged, DFP-labeled cholinesterase showed that 99.8% of the active sites had been labeled, since...acids were quantitated by ninhydrin at the AAA Labs, or by derivatization with phenylisothiocyanate at the University of Michigan. The latter method

  7. Cell-SELEX Identifies a “Sticky” RNA Aptamer Sequence

    PubMed Central

    2017-01-01

    Cell-SELEX is performed to select for cell binding aptamers. We employed an additional selection pressure by using RNAse to remove surface-binding aptamers and select for cell-internalizing aptamers. A common RNA sequence was identified from independent cell-SELEX procedures against two different pancreatic cancer cell lines, indicating a strong selection pressure towards this sequence from the large pool of other available sequences present in the aptamer library. The aptamer is not specific for the pancreatic cancer cell lines, and a similar sequence motif is present in previously published internalizing aptamers. The identified sequence forms a structural motif that binds to a surface protein, which either is highly abundant or has strong affinity for the selected aptamer sequence. Deselecting (removing) this sequence during cell-SELEX may increase the probability of identifying aptamers against cell type-specific targets on the cell surface. PMID:28194280

  8. Fatal Psychrobacter sp. infection in a pediatric patient with meningitis identified by metagenomic next-generation sequencing in cerebrospinal fluid.

    PubMed

    Ortiz-Alcántara, Joanna María; Segura-Candelas, José Miguel; Garcés-Ayala, Fabiola; Gonzalez-Durán, Elizabeth; Rodríguez-Castillo, Araceli; Alcántara-Pérez, Patricia; Wong-Arámbula, Claudia; González-Villa, Maribel; León-Ávila, Gloria; García-Chéquer, Adda Jeanette; Diaz-Quiñonez, José Alberto; Méndez-Tenorio, Alfonso; Ramírez-González, José Ernesto

    2016-03-01

    The genus Psychrobacter contains environmental, psychrophilic and halotolerant gram-negative bacteria considered rare opportunistic pathogens in humans. Metagenomics was performed on the cerebrospinal fluid (CSF) of a pediatric patient with meningitis. Nucleic acids were extracted, randomly amplified, and sequenced with the 454 GS FLX Titanium next-generation sequencing (NGS) system. Sequencing reads were assembled, and potential virulence genes were predicted. Phylogenomic and phylogenetic studies were performed. Psychrobacter sp. 310 was identified, and several virulence genes characteristic of pathogenic bacteria were found. The phylogenomic study and 16S rRNA gene phylogenetic analysis showed that the closest relative of Psychrobacter sp. 310 was Psychrobacter sanguinis. To our knowledge, this is the first report of a meningitis case associated with Psychrobacter sp. identified by NGS metagenomics in CSF from a pediatric patient. The metagenomic strategy based on NGS was a powerful tool to identify a rare unknown pathogen in a clinical case.

  9. Cystatin. Amino acid sequence and possible secondary structure.

    PubMed Central

    Schwabe, C; Anastasi, A; Crow, H; McDonald, J K; Barrett, A J

    1984-01-01

    The amino acid sequence of cystatin, the protein from chicken egg-white that is a tight-binding inhibitor of many cysteine proteinases, is reported. Cystatin is composed of 116 amino acid residues, and the Mr is calculated to be 13 143. No striking similarity to any other known sequence has been detected. The results of computer analysis of the sequence and c.d. spectrometry indicate that the secondary structure includes relatively little alpha-helix (about 20%) and that the remainder is mainly beta-structure. PMID:6712597

  10. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
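    A minimal sketch of grouping genes into families by a similarity cutoff, as described in this abstract. It assumes the sequences are already aligned to equal length, measures similarity as percent identity, and joins families by single linkage; the abstract specifies only the "greater than 80%" criterion, so the linkage rule, function names and toy sequences are illustrative assumptions.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences match."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def families(seqs, threshold=80.0):
    """Single-linkage grouping: genes sharing > threshold % identity are joined."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented 10-nucleotide toy sequences, not real Vk genes.
toy = {
    "v1": "ACGTACGTAC",
    "v2": "ACGTACGTAA",
    "v3": "TTTTTTTTTT",
    "v4": "TTTTTTTTTA",
}
toy_families = families(toy)
```

    Single linkage is the simplest choice, but it can chain closely related families together, which matches the abstract's observation that members of different Vk families are sometimes very similar over large stretches.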

  11. Newly identified essential amino acid residues affecting Δ8-sphingolipid desaturase activity revealed by site-directed mutagenesis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In order to identify amino acid residues crucial for the enzymatic activity of Δ8-sphingolipid desaturases, a sequence comparison was performed among Δ8-sphingolipid desaturases and Δ6-fatty acid desaturase from various plants. In addition to the known conserved cytb5 (cytochrome b5) HPGG motif and...

  12. Identifiability of PBPK models with applications to dimethylarsinic acid exposure.

    PubMed

    Garcia, Ramon I; Ibrahim, Joseph G; Wambaugh, John F; Kenyon, Elaina M; Setzer, R Woodrow

    2015-12-01

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  13. Whole Exome Sequencing Identifies RAI1 Mutation in a Morbidly Obese Child Diagnosed With ROHHAD Syndrome

    PubMed Central

    Esteves, Kristyn M.; Towne, Meghan C.; Brownstein, Catherine A.; James, Philip M.; Crowley, Laura; Hirschhorn, Joel N.; Elsea, Sarah H.; Beggs, Alan H.; Picker, Jonathan

    2015-01-01

    Context: The current obesity epidemic is attributed to complex interactions between genetic and environmental factors. However, a limited number of cases, especially those with early-onset severe obesity, are linked to single gene defects. Rapid-onset obesity with hypothalamic dysfunction, hypoventilation and autonomic dysregulation (ROHHAD) is one of the syndromes that presents with abrupt-onset extreme weight gain with an unknown genetic basis. Objective: To identify the underlying genetic etiology in a child with morbid early-onset obesity, hypoventilation, and autonomic and behavioral disturbances who was clinically diagnosed with ROHHAD syndrome. Design/Setting/Intervention: The index patient was evaluated at an academic medical center. Whole-exome sequencing was performed on the proband and his parents. Genetic variants were validated by Sanger sequencing. Results: We identified a novel de novo nonsense mutation, c.3265 C>T (p.R1089X), in the retinoic acid-induced 1 (RAI1) gene in the proband. Mutations in the RAI1 gene are known to cause Smith-Magenis syndrome (SMS). On further evaluation, his clinical features were not typical of either SMS or ROHHAD syndrome. Conclusions: This study identifies a de novo RAI1 mutation in a child with morbid obesity and a clinical diagnosis of ROHHAD syndrome. Although extreme early-onset obesity, autonomic disturbances, and hypoventilation are present in ROHHAD, several of the clinical findings are consistent with SMS. This case highlights the challenges in the diagnosis of ROHHAD syndrome and its potential overlap with SMS. We also propose RAI1 as a candidate gene for children with morbid obesity. PMID:25781356
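    The variant notation in this abstract can be checked with simple arithmetic: coding position 3265 falls in codon 1089, which agrees with p.R1089X. The sketch below shows the calculation. The abstract does not give the reference codon, so "CGA" is an assumed arginine codon chosen because a C>T change at its first base yields the stop codon TGA under the standard genetic code; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def codon_of(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, position in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return the codon with one base replaced (pos_in_codon is 1-based)."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, offset = codon_of(3265)    # c.3265 from the abstract
mutant = substitute("CGA", offset, "T")  # "CGA" is an assumed Arg codon
is_nonsense = mutant in STOP_CODONS
```

    The same mapping is a quick sanity check on any reported cDNA and protein position pair before a variant is looked up in a database.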

  14. Forensic Loci Allele Database (FLAD): Automatically generated, permanent identifiers for sequenced forensic alleles.

    PubMed

    Van Neste, Christophe; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2016-01-01

    It is difficult to predict if and when massively parallel sequencing of forensic STR loci will replace capillary electrophoresis as the new standard technology in forensic genetics. The main benefits of sequencing are increased multiplexing scales and SNP detection. There is not yet a consensus on how sequenced profiles should be reported. We present the Forensic Loci Allele Database (FLAD) service, made freely available on http://forensic.ugent.be/FLAD/. It offers permanent identifiers for sequenced forensic alleles (STR or SNP) and their microvariants for use in forensic allele nomenclature. Analogous to Genbank, its aim is to provide permanent identifiers for forensically relevant allele sequences. Researchers who are developing forensic sequencing kits or performing population studies can register on http://forensic.ugent.be/FLAD/ and add loci and allele sequences with a short and simple application interface (API).

  15. Exome sequencing identifies truncating mutations in PRRT2 that cause paroxysmal kinesigenic dyskinesia.

    PubMed

    Chen, Wan-Jin; Lin, Yu; Xiong, Zhi-Qi; Wei, Wei; Ni, Wang; Tan, Guo-He; Guo, Shun-Ling; He, Jin; Chen, Ya-Fang; Zhang, Qi-Jie; Li, Hong-Fu; Lin, Yi; Murong, Shen-Xing; Xu, Jianfeng; Wang, Ning; Wu, Zhi-Ying

    2011-11-20

    Paroxysmal kinesigenic dyskinesia is the most common type of paroxysmal movement disorder and is often misdiagnosed clinically as epilepsy. Using whole-exome sequencing followed by Sanger sequencing, we identified three truncating mutations within PRRT2 (NM_145239.2) in eight Han Chinese families with histories of paroxysmal kinesigenic dyskinesia: c.514_517delTCTG (p.Ser172Argfs*3) in one family, c.649dupC (p.Arg217Profs*8) in six families and c.972delA (p.Val325Serfs*12) in one family. These truncating mutations co-segregated exactly with the disease in these families and were not observed in 1,000 control subjects of matched ancestry. PRRT2 is a newly discovered gene consisting of four exons encoding the proline-rich transmembrane protein 2, which encompasses 340 amino acids and contains two predicted transmembrane domains. PRRT2 is highly expressed in the developing nervous system, and a truncating mutation alters the subcellular localization of the PRRT2 protein. The function of PRRT2 and its role in paroxysmal kinesigenic dyskinesia should be further investigated.

  16. Single-chain structure of human ceruloplasmin: the complete amino acid sequence of the whole molecule.

    PubMed Central

    Takahashi, N; Ortel, T L; Putnam, F W

    1984-01-01

    We have determined the amino acid sequence of the amino-terminal 67,000-dalton (67-kDa) fragment of human ceruloplasmin and have established overlapping sequences between the 67-kDa and 50-kDa fragments and between the 50-kDa and 19-kDa fragments. The 67-kDa fragment contains 480 amino acid residues and three glucosamine oligosaccharides. These results together with our previous sequence data for the 50-kDa and 19-kDa fragments complete the amino acid sequence of human ceruloplasmin. The polypeptide chain has a total of 1,046 amino acid residues (Mr 120,085) and has attachment sites for four glucosamine oligosaccharides; together these account for the total molecular mass of human ceruloplasmin (132 kDa). The sequence analysis of the peptides overlapping the fragments showed that one additional amino acid, arginine, is present between the 67-kDa and 50-kDa fragments, and another, lysine, is between the 50-kDa and 19-kDa fragments. Only two apparent sites of amino acid interchange have been identified in the polypeptide chain. Both involve a single-point interchange of glycine and lysine that would result in a difference in charge. The results of the complete sequence analysis verified that human ceruloplasmin is composed of a single polypeptide chain and that the subunit-like fragments are produced by proteolytic cleavage during purification (and possibly also in vivo). PMID:6582496

  17. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    PubMed Central

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
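    A comparison of observed and expected triplet frequencies, as described above, can be sketched in Python. This is only an illustration under a stated assumption: the expected count of a triplet is taken as the product of its single-residue frequencies (residues treated as independent). The abstract does not specify the null model the authors used, and the function name and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def triplet_enrichment(proteins):
    """Return {triplet: (observed, expected)} for all 8,000 amino acid triplets.

    Assumption: expected counts come from an independence model, i.e. the
    number of triplet windows times the product of single-residue frequencies.
    """
    residue_counts = Counter()
    triplet_counts = Counter()
    n_windows = 0
    for seq in proteins:
        seq = seq.upper()
        residue_counts.update(c for c in seq if c in AMINO_ACIDS)
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            # Skip windows containing non-standard symbols such as X or *.
            if all(c in AMINO_ACIDS for c in tri):
                triplet_counts[tri] += 1
                n_windows += 1
    total = sum(residue_counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    freq = {aa: residue_counts[aa] / total for aa in AMINO_ACIDS}
    result = {}
    for tri in map("".join, product(AMINO_ACIDS, repeat=3)):
        expected = n_windows * freq[tri[0]] * freq[tri[1]] * freq[tri[2]]
        result[tri] = (triplet_counts[tri], expected)
    return result


# Toy input, not real proteome data.
toy = triplet_enrichment(["MKKLLA", "MKKAAL"])
observed, expected = toy["MKK"]
print(observed, round(expected, 3))
```

    Triplets whose observed count falls far below the expected count would be the candidates for underrepresentation; a real analysis would also need a significance test and a whole proteome as input.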

  18. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  19. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  20. Genome sequence, prevalence and quantification of the first iflavirus identified in a phytoplasma insect vector.

    PubMed

    Abbà, Simona; Galetto, Luciana; Vallino, Marta; Rossi, Marika; Turina, Massimo; Sicard, Anne; Marzachì, Cristina

    2017-03-01

    The leafhopper Euscelidius variegatus is a natural vector of chrysanthemum yellows phytoplasma (CY) and an efficient vector of flavescence dorée phytoplasma (FD) under laboratory conditions. During a transcriptome sequencing (RNA-seq) project aimed at investigating the interactions between the insect and the two phytoplasmas, a 10,616-nucleotide-long contig with high sequence similarity to known picorna-like viruses was identified among the assembled insect transcripts. The discovery was entirely unexpected, because insects from the laboratory colony did not show any evident symptom that could be related to the presence of a virus. The amino acid sequence, the shape and size of viral particles, and the results of phylogenetic analysis suggest that this virus, named Euscelidius variegatus virus 1 (EVV-1), can be considered a new member of a new species in the genus Iflavirus. EVV-1 was detected in all of the tested insects from the laboratory colony used for RNA-seq, both in phytoplasma-exposed and in non-exposed insects, but the viral load measured in FD-exposed samples was significantly lower than that in non-exposed insects. This result suggests the possible existence of an intriguing cross-talk among insects, endogenous bacteria, and viruses. The identification of two other E. variegatus laboratory colonies that were free of EVV-1 could represent the key to addressing some basic virological issues, e.g., viral replication and transmission mechanisms, and offer the opportunity to use infectious clones to express heterologous genes in the leafhopper and manipulate the expression of endogenous genes by promoting virus-induced gene silencing.

  1. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 kDa and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A fragment of 24 kDa that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A fragment with a molecular weight of 20 kDa, yielding a sequence of twenty amino acids, was obtained from the 40 kDa protein.

  2. Extensive amino acid sequence homologies between animal lectins

    SciTech Connect

    Paroutaud, P.; Levi, G.; Teichberg, V.I.; Strosberg, A.D.

    1987-09-01

    The authors have established the amino acid sequence of the β-D-galactoside binding lectin from the electric eel and the sequences of several peptides from a similar lectin isolated from human placenta. These sequences were compared with the published sequences of peptides derived from the β-D-galactoside binding lectin from human lung and with sequences deduced from cDNAs assigned to the β-D-galactoside binding lectins from chicken embryo skin and human hepatomas. Significant homologies were observed. One of the highly conserved regions that contains a tryptophan residue and two glutamic acid residues is probably part of the β-D-galactoside binding site, which, on the basis of spectroscopic studies of the electric eel lectin, is expected to contain such residues. The similarity of the hydropathy profiles and the predicted secondary structure of the lectins from chicken skin and electric eel, in spite of differences in their amino acid sequences, strongly suggests that these proteins have maintained structural homologies during evolution and together with the other β-D-galactoside binding lectins were derived from a common ancestor gene.

  3. Amino acid sequence of porcine spleen cathepsin D.

    PubMed Central

    Shewale, J G; Tang, J

    1984-01-01

    The amino acid sequence of porcine spleen cathepsin D heavy chain has been determined and, hence, the complete structure of this enzyme is now known. The sequence of heavy chain was constructed by aligning the structures of peptides generated by cyanogen bromide, trypsin, and endo-proteinase Lys C cleavages. The structure of the light chain has been published previously. The cathepsin D molecule contains 339 amino acid residues in two polypeptide chains: a 97-residue light chain and a 242-residue heavy chain, with a combined Mr of 36,779 (without carbohydrate). There are two carbohydrate units linked to asparagine residues 70 and 192. The disulfide bond arrangement in cathepsin D is probably similar to that of pepsin, because the positions of six half-cystine residues are conserved. The active site aspartyl residues, corresponding to aspartic acid-32 and -215 of pepsin, are located at residues 33 and 224 in the cathepsin D molecule. The amino acid sequence around these aspartyl residues is strongly conserved. Cathepsin D shows a strong homology with other acid proteases. When the sequences of cathepsin D, renin, and pepsin are aligned, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that three-dimensional structures of all three molecules are similar. PMID:6587385
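    A figure such as "32.7% of the residues are identical" is a count of alignment columns in which every sequence carries the same residue. A minimal Python sketch is given below. The denominator used here (all alignment columns, gaps included) is an assumption, since the abstract does not say how the percentage was normalized; the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned, gap="-"):
    """Percentage of alignment columns where all sequences share one residue.

    Assumption: every column counts in the denominator, and a column that
    contains a gap is treated as non-identical.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    identical = 0
    for column in zip(*aligned):
        if gap not in column and len(set(column)) == 1:
            identical += 1
    return 100.0 * identical / length


# Toy three-way alignment, not the cathepsin D / renin / pepsin data.
toy_alignment = ["ACDE-G", "ACDEFG", "ACNEFG"]
print(round(percent_identity(toy_alignment), 1))  # prints: 66.7
```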

  4. Complete genome sequence of a novel potyvirus, callistephus mottle virus, identified in Callistephus chinensis.

    PubMed

    Seo, Eun-Young; Lim, Seungmo; Hammond, John; Moon, Jae Sun; Lim, Hyoun-Sub

    2016-11-01

    The complete genomic sequence of a novel putative member of the genus Potyvirus was determined from Callistephus chinensis (China aster) in South Korea. The genomic RNA consists of 9,859 nucleotides (excluding the 3' poly(A) tail) and contains the typical open reading frame of potyviruses, encoding a putative large polyprotein of 3,154 amino acids. The Callistephus virus is most closely related to plum pox virus and members of the ApVY subgroup, with which it shares 50-52% polyprotein amino acid sequence identity. These results suggest that the Callistephus virus is a novel member of the genus Potyvirus, tentatively named "callistephus mottle virus" (CalMV).

  5. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation

    PubMed Central

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Background: Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. Methodology/Principal Findings: A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various pathways, including the sesquiterpenoid (ko00909) and plant–pathogen interaction (ko03040) pathways. These pathways were related to sesquiterpenoid biosynthesis and defensive responses to chemical stimulation. Conclusions/Significance: The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich source of materials for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. Results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factor play important roles in agarwood formation via chemical induction. The comparative analysis of

  6. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  7. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island.

    PubMed

    Ashton, Philip M; Nair, Satheesh; Dallman, Tim; Rubino, Salvatore; Rabsch, Wolfgang; Mwaigwisya, Solomon; Wain, John; O'Grady, Justin

    2015-03-01

    Short-read, high-throughput sequencing technology cannot identify the chromosomal position of repetitive insertion sequences that typically flank horizontally acquired genes such as bacterial virulence genes and antibiotic resistance genes. The MinION nanopore sequencer can produce long sequencing reads on a device similar in size to a USB memory stick. Here we apply a MinION sequencer to resolve the structure and chromosomal insertion site of a composite antibiotic resistance island in Salmonella Typhi Haplotype 58. Nanopore sequencing data from a single 18-h run was used to create a scaffold for an assembly generated from short-read Illumina data. Our results demonstrate the potential of the MinION device in clinical laboratories to fully characterize the epidemic spread of bacterial pathogens.

  8. The amino acid sequence of iguana (Iguana iguana) pancreatic ribonuclease.

    PubMed

    Zhao, W; Beintema, J J; Hofsteenge, J

    1994-01-15

    The pyrimidine-specific ribonuclease superfamily constitutes a group of homologous proteins so far found only in higher vertebrates. Four separate families are found in mammals, which have resulted from gene duplications in mammalian ancestors. To learn more about the evolutionary history of this superfamily, the primary structure and other characteristics of the pancreatic enzyme from iguana (Iguana iguana), a herbivorous lizard species belonging to the reptiles, have been determined. The polypeptide chain consists of 119 amino acid residues. The positions of insertions and deletions in the sequence are identical to those in the enzyme from snapping turtle. However, the two enzymes differ at 54% of the amino acid positions. Iguana ribonuclease contains no carbohydrate, although the enzyme possesses three recognition sites for carbohydrate attachment, and has a high number of acidic residues in a localized part of the sequence.

  9. A Possible Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    PubMed

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-04-01

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a hypogammaglobulinemic individual.

  10. Functional Brain Activation Differences in Stuttering Identified with a Rapid fMRI Sequence

    ERIC Educational Resources Information Center

    Loucks, Torrey; Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech…

  11. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  12. Amino acid sequence and comparative antigenicity of chicken metallothionein.

    PubMed Central

    McCormick, C C; Fullmer, C S; Garvey, J S

    1988-01-01

    The complete amino acid sequence of metallothionein (MT) from chicken liver is reported. The primary structure was determined by automated sequence analysis of peptides produced by limited acid hydrolysis and by trypsin digestion. The comparative antigenicity of chicken MT was determined by radioimmunoassay using rabbit anti-rat MT polyclonal antibody. Chicken MT consists of 63 amino acids as compared to 61 found in MTs from mammals. One insertion (and two substitutions) occurs in the amino-terminal region, a region considered invariant among mammalian MTs. Eighteen of the 20 cysteines in chicken MT were aligned with cysteines from other mammalian sequences. Two cysteines near the carboxyl terminus are shifted by one residue due to the insertion of proline in that region. Overall, the chicken protein showed approximately 68% sequence identity in a comparison with various mammalian MTs. The affinity of the polyclonal antibody for chicken MT was decreased by 2 orders of magnitude in comparison to that of a mammalian MT (rat MT isoforms). This reduced affinity is attributed to major substitutions in chicken MT in the regions of the principal determinants of mammalian MTs. Theoretical analysis of the primary structure predicted the secondary structure to consist of reverse turns and random coils with no stable beta or helix conformations. There is no evidence that chicken MT differs functionally from mammalian MTs. PMID:2448773

  13. Transitive Homology-Guided Structural Studies Lead to Discovery of Cro Proteins With 40% Sequence Identity But Different Folds

    SciTech Connect

    Roessler, C.G.; Hall, B.M.; Anderson, W.J.; Ingram, W.M.; Roberts, S.A.; Montfort, W.R.; Cordes, M.H.J.

    2009-05-27

    Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a 'stepping-stone' method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning approximately 25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization.

  14. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT: Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE: Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  15. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  16. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

    PubMed Central

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer. PMID:24479672
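    The idea of scoring gene pairs by "statistical patterns of mutual exclusivity" can be illustrated with a small Python sketch. It uses a one-sided hypergeometric (Fisher-type) tail probability, which is only one simple choice and is not attributed to the review or to any specific tool it covers. The function name and the toy counts are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Probability of seeing n_both or fewer co-mutated samples by chance.

    Assumption: mutations in gene A and gene B are independent across
    samples, so the number of co-mutated samples is hypergeometric.
    A small value suggests the two genes are mutated in different samples
    more often than chance would predict.
    """
    lowest = max(0, n_a + n_b - n_samples)  # fewest overlaps that are possible
    if not (0 <= n_a <= n_samples and 0 <= n_b <= n_samples):
        raise ValueError("mutated-sample counts must lie within the cohort size")
    if not (lowest <= n_both <= min(n_a, n_b)):
        raise ValueError("n_both is inconsistent with n_a, n_b and n_samples")
    total = comb(n_samples, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return tail / total


# Toy cohort: 10 samples, 5 with gene A mutated, 5 with gene B, none with both.
print(mutual_exclusivity_pvalue(10, 5, 5, 0))
```

    In the toy cohort the only way to obtain zero overlap is for gene B to be mutated in exactly the five samples that lack a gene A mutation, which is one arrangement out of 252.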

  17. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    PubMed Central

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005
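    The percentages quoted above follow directly from the reported counts, as the short Python check below shows. The denominators for the two lineage figures are not stated in the abstract; 1692 (cases with complete data) is assumed here because it reproduces the reported values.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs taken from the abstract; the 1692
# denominator for the two lineage rows is an assumption.
reported = {
    "complete data": (1692, 1841),
    "East-African Indian lineage": (474, 1692),
    "Beijing lineage": (470, 1692),
    "MIRU-24 clustering, overall": (340, 1692),
    "MIRU-24 clustering, Beijing": (168, 470),
}

for label, (num, den) in reported.items():
    print(f"{label}: {pct(num, den)}%")
```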

  18. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  19. Identifying New Drug Targets for Potent Phospholipase D Inhibitors: Combining Sequence Alignment, Molecular Docking, and Enzyme Activity/Binding Assays.

    PubMed

    Djakpa, Helene; Kulkarni, Aditya; Barrows-Murphy, Scheneque; Miller, Greg; Zhou, Weihong; Cho, Hyejin; Török, Béla; Stieglitz, Kimberly

    2016-05-01

    Phospholipase D enzymes cleave phospholipid substrates generating choline and phosphatidic acid. Phospholipase D from Streptomyces chromofuscus is a non-HKD (histidine, lysine, and aspartic acid) phospholipase D as the enzyme is more similar to members of the diverse family of metallo-phosphodiesterase/phosphatase enzymes than phospholipase D enzymes with active site HKD repeats. A highly efficient library of phospholipase D inhibitors based on 1,3-disubstituted-4-amino-pyrazolopyrimidine core structure was utilized to evaluate the inhibition of purified S. chromofuscus phospholipase D. The molecules exhibited inhibition of phospholipase D activity (IC50) in the nanomolar range with monomeric substrate diC4PC and micromolar range with phospholipid micelles and vesicles. Binding studies with vesicle substrate and phospholipase D strongly indicate that these inhibitors directly block enzyme vesicle binding. Following these compelling results as a starting point, sequence searches and alignments with S. chromofuscus phospholipase D have identified potential new drug targets. Using AutoDock, inhibitors were docked into the enzymes selected from sequence searches and alignments (when 3D co-ordinates were available) and results analyzed to develop next-generation inhibitors for new targets. In vitro enzyme activity assays with several human phosphatases demonstrated that the predictive protocol was accurate. The strategy of combining sequence comparison, docking, and high-throughput screening assays has helped to identify new drug targets and provided some insight into how to make potential inhibitors more specific to desired targets.

  20. New mutations in flagellar motors identified by whole genome sequencing in Chlamydomonas

    PubMed Central

    2013-01-01

    Background The building of a cilium or flagellum requires molecular motors and associated proteins that allow the relocation of proteins from the cell body to the distal end and the return of proteins to the cell body in a process termed intraflagellar transport (IFT). IFT trains are carried out by kinesin and back to the cell body by dynein. Methods We used whole genome sequencing to identify the causative mutations for two temperature-sensitive flagellar assembly mutants in Chlamydomonas and validated the changes using reversion analysis. We examined the effect of these mutations on the localization of IFT81, an IFT complex B protein, the cytoplasmic dynein heavy chain (DHC1b), and the dynein light intermediate chain (D1bLIC). Results The strains, fla18 and fla24, have mutations in kinesin-2 and cytoplasmic dynein, respectively. The fla18 mutation alters the same glutamic acid (E24G) mutated in the fla10-14 allele (E24K). The fla18 strain loses flagella at 32°C more rapidly than the E24K allele but less rapidly than the fla10-1 allele. The fla18 mutant loses its flagella by detachment rather than by shortening. The fla24 mutation falls in cytoplasmic dynein and changes a completely conserved amino acid (L3243P) in an alpha helix in the AAA5 domain. The fla24 mutant loses its flagella by shortening within 6 hours at 32°C. DHC1b protein is reduced by 18-fold and D1bLIC is reduced by 16-fold at 21°C compared to wild-type cells. We identified two pseudorevertants (L3243S and L3243R), which remain flagellated at 32°C. Although fla24 cells assemble full-length flagella at 21°C, IFT81 protein localization is dramatically altered. Instead of localizing at the basal body and along the flagella, IFT81 is concentrated at the proximal end of the flagella. The pseudorevertants show wild-type IFT81 localization at 21°C, but proximal end localization of IFT81 at 32°C. Conclusions The change in the AAA5 domain of the cytoplasmic dynein in fla24 may block the recycling of IFT

  1. Close Sequence Comparisons are Sufficient to Identify Human cis-Regulatory Elements

    SciTech Connect

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Couronne, Olivier; Pennacchio, Len A.

    2005-12-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons, due to the lack of a universal metric for sequence conservation, and also the paucity of empirically defined benchmark sets of cis-regulatory elements. To address this problem, we developed a general-purpose algorithm (Gumby) that detects slowly-evolving regions in primate, mammalian and more distant comparisons without requiring adjustment of parameters, and ranks conserved elements by P-value using Karlin-Altschul statistics. We benchmarked Gumby predictions against previously identified cis-regulatory elements at diverse genomic loci, and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using reporter-gene assays in transgenic mice. Human regulatory elements were identified with acceptable sensitivity and specificity by comparison with 1-5 other eutherian mammals or 6 other simian primates. More distant comparisons (marsupial, avian, amphibian and fish) failed to identify many of the empirically defined functional noncoding elements. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole genome comparative analysis, which explains some of these findings. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for testing at embryonic time points.

  2. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
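
    To make the idea of per-position information content concrete, the following is a minimal sketch of a classic Shannon-style logo calculation with a uniform pseudocount. It is not Seq2Logo's own algorithm (which also offers sequence weighting and other logo types); the function name, the uniform pseudocount scheme, and the toy peptides are assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_information(column, alphabet=AMINO_ACIDS, pseudocount=0.0):
    """Shannon information content (in bits) of one alignment column.

    A uniform pseudocount is added to every symbol of the alphabet, which
    pulls the estimate toward zero information when observations are few.
    Assumes every character of `column` belongs to `alphabet`.
    """
    counts = Counter(column)
    total = len(column) + pseudocount * len(alphabet)
    info = math.log2(len(alphabet))  # maximum possible information
    for symbol in alphabet:
        p = (counts.get(symbol, 0) + pseudocount) / total
        if p > 0:
            info += p * math.log2(p)  # subtract the column entropy
    return info


# Hypothetical aligned peptides, for illustration only.
peptides = ["ALDKV", "ALEKI", "ALDRV", "ALEKL"]
columns = ["".join(p[i] for p in peptides) for i in range(len(peptides[0]))]
profile = [column_information(col) for col in columns]
smoothed = [column_information(col, pseudocount=1.0) for col in columns]
```

    In this toy alignment the first two columns are fully conserved and so carry the maximum log2(20) bits without pseudocounts; with a pseudocount the same columns score lower, reflecting the small number of observations.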

  3. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  4. Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    PubMed Central

    O'Donnell, Kerry; Sutton, Deanna A.; Rinaldi, Michael G.; Sarver, Brice A. J.; Balajee, S. Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C.; Robert, Vincent A. R. G.; Crous, Pedro W.; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M.

    2010-01-01

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the

  5. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common curable disease, generating more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA2 activity. The EDTA-eluted fraction was analyzed by HPLC using on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the eluted EDTA UFF30 fraction; bioinformatic analysis using Protein Link Global Server charged with T. vaginalis protein database suggests that eluted peptides correspond to a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC and lysed erythrocytes from Sprague-Dawley rats in a time- and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor.

  6. Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning

    PubMed Central

    Lemey, Philippe; Lott, Martin; Martin, Darren P; Moulton, Vincent

    2009-01-01

    Background Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specifications of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences. Results Analysis of phylogenetic simulations reveals that identifying the descendants of relatively old recombination events is a challenging task for all methods available, and that quartet scanning performs relatively well compared to the triplet based methods. The use of quartet scanning is further demonstrated by analyzing both well-established and putative HIV-1 recombinant strains. In agreement with recent findings, we provide evidence that the presumed circulating recombinant CRF02_AG is a 'pure' lineage, whereas the presumed parental lineage subtype G has a recombinant origin. We also demonstrate HIV-1 intrasubtype recombination, confirm the hybrid origin of SIV in chimpanzees and further disentangle the recombinant history of SIV lineages in a primate immunodeficiency virus data set. Conclusion Quartet scanning makes a valuable addition to triplet-based methods for identifying recombinant sequences without prior specifications of either query or reference sequences. The new method is available in the VisRD v.3.0 package. PMID:19397803

  7. Complete genome sequence of Lactococcus lactis IO-1, a lactic acid bacterium that utilizes xylose and produces high levels of L-lactic acid.

    PubMed

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-04-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium, produces nisin Z, ferments xylose, and produces predominantly L-lactic acid at high xylose concentrations. From ortholog analysis with five other L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  8. Respiratory syncytial virus fusion glycoprotein: nucleotide sequence of mRNA, identification of cleavage activation site and amino acid sequence of N-terminus of F1 subunit.

    PubMed Central

    Elango, N; Satake, M; Coligan, J E; Norrby, E; Camargo, E; Venkatesan, S

    1985-01-01

    The amino acid sequence of respiratory syncytial virus fusion protein (Fo) was deduced from the sequence of a partial cDNA clone of mRNA and from the 5' mRNA sequence obtained by primer extension and dideoxysequencing. The encoded protein of 574 amino acids is extremely hydrophobic and has a molecular weight of 63371 daltons. The site of proteolytic cleavage within this protein was accurately mapped by determining a partial amino acid sequence of the N-terminus of the larger subunit (F1) purified by radioimmunoprecipitation using monoclonal antibodies. Alignment of the N-terminus of the F1 subunit within the deduced amino acid sequence of Fo permitted us to identify a sequence of lys-lys-arg-lys-arg-arg at the C-terminus of the smaller N-terminal F2 subunit that appears to represent the cleavage/activation domain. Five potential sites of glycosylation, four within the F2 subunit, were also identified. Three extremely hydrophobic domains are present in the protein; a) the N-terminal signal sequence, b) the N-terminus of the F1 subunit that is analogous to the N-terminus of the paramyxovirus F1 subunit and the HA2 subunit of influenza virus hemagglutinin, and c) the putative membrane anchorage domain near the C-terminus of F1. PMID:2987829

  9. Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions.

    PubMed

    Wang, Yu; Li, Wei; Xia, Yingying; Wang, Chongzhi; Tang, Y Tom; Guo, Wenying; Li, Jinliang; Zhao, Xia; Sun, Yepeng; Hu, Juan; Zhen, Hefu; Zhang, Xiandong; Chen, Chao; Shi, Yujian; Li, Lin; Cao, Hongzhi; Du, Hongli; Li, Jian

    2014-01-01

    Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information.

  10. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

    PubMed

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

    2015-11-24

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggest UrsusERV is a recent cross species transmission from an unknown reservoir and place the viral group among the youngest of ERVs identified in mammals.

  11. Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions

    PubMed Central

    Guo, Wenying; Li, Jinliang; Zhao, Xia; Sun, Yepeng; Hu, Juan; Zhen, Hefu; Zhang, Xiandong; Chen, Chao; Shi, Yujian; Li, Lin; Cao, Hongzhi; Du, Hongli; Li, Jian

    2015-01-01

    Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information. PMID:25919136

  12. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

    PubMed Central

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.

    2015-01-01

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggest UrsusERV is a recent cross species transmission from an unknown reservoir and place the viral group among the youngest of ERVs identified in mammals. PMID:26610552

  13. Sequence analysis of expressed sequence tags from an ABA-treated cDNA library identifies stress response genes in the moss Physcomitrella patens.

    PubMed

    Machuka, J; Bashiardes, S; Ruben, E; Spooner, K; Cuming, A; Knight, C; Cove, D

    1999-04-01

    Partial cDNA sequencing was used to obtain 169 expressed sequence tags (ESTs) in the moss, Physcomitrella patens. The source of ESTs was a random cDNA library constructed from 7 day-old protonemata following treatment with 10(-4) M abscisic acid (ABA). Analysis of the ESTs identified 69% with homology to known sequences, 61% of which had significant homology to sequences of plant origin. More importantly, at least 11 ESTs had significant similarities to genes which are implicated in plant stress-responses, including responses which may involve ABA. These included a cDNA associated with desiccation tolerance, two heat shock protein genes, one cold acclimation protein cDNA and five others that may be involved in either oxidative or chemical stress or both, i.e., Zn/Cu-superoxide dismutase, NADPH protochlorophyllide oxidoreductase (PorB), selenium binding protein, glutathione peroxidase and glutathione S transferase. Analysis of codon usage between P. patens and seed plants indicated that although mosses and higher plants are to a large extent similar, minor variations also exist that may represent the distinctiveness of each group.
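
    A codon usage comparison of the kind mentioned here starts from per-codon frequencies. The sketch below shows one simple way to tabulate them; the function name and the toy coding sequence are hypothetical, and the abstract does not describe the authors' actual procedure.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy sequence: ATG GCT GCT GCC TAA
usage = codon_usage("ATGGCTGCTGCCTAA")
for codon, freq in sorted(usage.items()):
    print(codon, f"{freq:.2f}")
```

    Frequencies computed this way for two groups of genes can then be compared codon by codon, for example among synonymous codons for the same amino acid.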

  14. The complementary deoxyribonucleic acid sequence of guinea pig endometrial prorelaxin.

    PubMed

    Lee, Y A; Bryant-Greenwood, G D; Mandel, M; Greenwood, F C

    1992-03-01

    The nucleotide sequence of the relaxin gene transcript in the endometrium of the late pregnant guinea pig has been determined. The strategy used was a combination of polymerase chain reaction (PCR) with primers designed from the mRNA sequence of porcine preprorelaxin, rapid amplification of cDNA ends-PCR, and blunt end cloning in M13 mp18. With heterologous primers, a 226-basepair (bp) segment of the guinea pig relaxin gene sequence was obtained and was used to design a guinea pig-specific primer for use with the rapid amplification of cDNA ends-PCR method. The latter allowed completion of the sequence of 336 bp, with a 96-bp overlap. The sequence obtained shows greater homology at both the nucleotide and amino acid levels with porcine and human relaxins H1 and H2 than with rat relaxin, supporting the thesis that the guinea pig is not a rodent. The transcription of the guinea pig endometrial relaxin gene during pregnancy was confirmed by Northern analysis of guinea pig endometrial tissues with a species-specific cDNA probe. The endometrial relaxin gene is transcribed during pregnancy, but not in lactation, consistent with the observed immunostaining for relaxin.

  15. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  16. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species of which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was identically observed only in ovine, and one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. Especially the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.
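
    Locating short signals such as ATTAAA or TTTTTAT, while tolerating one or two mismatches, can be sketched as a sliding-window scan. The function name and the toy 3'-UTR fragment below are hypothetical; only the two motif strings come from the abstract, and this is not presented as the authors' method.

```python
def find_motif(sequence, motif, max_mismatches=0):
    """Return (start, window, mismatches) for every window of `sequence`
    that differs from `motif` at no more than `max_mismatches` positions."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


utr = "GGCATTAAAGCTTTTTGTCC"  # hypothetical toy fragment, not real PRNP sequence

exact_hits = find_motif(utr, "ATTAAA")
near_hits = find_motif(utr, "TTTTTAT", max_mismatches=2)
```

    With `max_mismatches=0` the scan reports only identical copies of the signal; raising the limit to 1 or 2 corresponds to the near matches the abstract describes in species other than ovine.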

  17. The sensitivity of exome sequencing in identifying pathogenic mutations for LGMD in the United States

    PubMed Central

    Reddy, Hemakumar M.; Cho, Kyung-Ah; Lek, Monkol; Estrella, Elicia; Valkanas, Elise; Jones, Michael D.; Mitsuhashi, Satomi; Darras, Basil T.; Amato, Anthony A.; Lidov, Hart G.W.; Brownstein, Catherine A.; Margulies, David M.; Yu, Timothy W.; Salih, Mustafa A.; Kunkel, Louis M.; MacArthur, Daniel G.; Kang, Peter B.

    2016-01-01

    The current study characterizes a cohort of limb-girdle muscular dystrophy (LGMD) in the United States using whole exome sequencing. Fifty-five families affected by LGMD were recruited using an institutionally-approved protocol. Exome sequencing was performed on probands and selected parental samples. Pathogenic mutations and co-segregation patterns were confirmed by Sanger sequencing. Twenty-two families (40%) had novel and previously reported pathogenic mutations, primarily in LGMD genes, but also in genes for Duchenne muscular dystrophy, facioscapulohumeral muscular dystrophy, congenital myopathy, myofibrillar myopathy, inclusion body myopathy, and Pompe disease. One family was diagnosed via clinical testing. Dominant mutations were identified in COL6A1, COL6A3, FLNC, LMNA, RYR1, SMCHD1, and VCP, recessive mutations in ANO5, CAPN3, GAA, LAMA2, SGCA, and SGCG, and X-linked mutations in DMD. A previously reported variant in DMD was confirmed to be benign. Exome sequencing is a powerful diagnostic tool for LGMD. Despite careful phenotypic screening, pathogenic mutations were found in other muscle disease genes, largely accounting for the increased sensitivity of exome sequencing. Our experience suggests that broad sequencing panels are useful for these analyses due to the phenotypic overlap of many neuromuscular conditions. The confirmation of a benign DMD variant illustrates the potential of exome sequencing to help determine pathogenicity. PMID:27708273

  18. Amino acid sequences of heterotrophic and photosynthetic ferredoxins from the tomato plant (Lycopersicon esculentum Mill.).

    PubMed

    Kamide, K; Sakai, H; Aoki, K; Sanada, Y; Wada, K; Green, L S; Yee, B C; Buchanan, B B

    1995-11-01

    Several forms (isoproteins) of ferredoxin in roots, leaves, and green and red pericarps in tomato plants (Lycopersicon esculentum Mill.) were earlier identified on the basis of N-terminal amino acid sequence and chromatographic behavior (Green et al. 1991). In the present study, a large scale preparation made possible determination of the full length amino acid sequence of the two ferredoxins from leaves. The ferredoxins characteristic of fruit and root were sequenced from the amino terminus to the 30th residue or beyond. The leaf ferredoxins were confirmed to be expressed in pericarp of both green and red fruit. The ferredoxins characteristic of fruit and root appeared to be restricted to those tissues. The results extend earlier findings in demonstrating that ferredoxin occurs in the major organs of the tomato plant where it appears to function irrespective of photosynthetic competence.

  19. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  20. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  1. Tsukamurella pulmonis Bloodstream Infection Identified by secA1 Gene Sequencing

    PubMed Central

    Cano, María E.; García de la Fuente, Celia; Martínez-Martínez, Luis; López, Mónica; Fernández-Mazarrasa, Carlos

    2014-01-01

    Recurrent bloodstream infections caused by a Gram-positive bacterium affected an immunocompromised child. Tsukamurella pulmonis was the microorganism identified by secA1 gene sequencing. Antibiotic treatment in combination with removal of the subcutaneous port healed the patient. PMID:25520439

  2. A library screening approach identifies naturally occurring RNA sequences for a G-quadruplex binding ligand.

    PubMed

    Mirihana Arachchilage, Gayan; Morris, Mark J; Basu, Soumitra

    2014-02-07

    An RNA G-quadruplex library was synthesised and screened against kanamycin A as the ligand. Naturally occurring G-quadruplex forming sequences that differentially bind to kanamycin A were identified and characterized. This provides a simple and effective strategy for identification of potential intracellular G-quadruplex targets for a ligand.

  3. An integrated system for identifying the hidden assassins in traditional medicines containing aristolochic acids

    PubMed Central

    Wu, Lan; Sun, Wei; Wang, Bo; Zhao, Haiyu; Li, Yaoli; Cai, Shaoqing; Xiang, Li; Zhu, Yingjie; Yao, Hui; Song, Jingyuan; Cheng, Yung-Chi; Chen, Shilin

    2015-01-01

    Traditional herbal medicines adulterated and contaminated with plant materials from the Aristolochiaceae family, which contain aristolochic acids (AAs), cause aristolochic acid nephropathy. Approximately 256 traditional Chinese patent medicines, containing Aristolochiaceous materials, are still being sold in Chinese markets today. In order to protect consumers from health risks due to AAs, the hidden assassins, efficient methods to differentiate Aristolochiaceous herbs from their putative substitutes need to be established. In this study, 158 Aristolochiaceous samples representing 46 species and four genera as well as 131 non-Aristolochiaceous samples representing 33 species, 20 genera and 12 families were analyzed using DNA barcodes based on the ITS2 and psbA-trnH sequences. Aristolochiaceous materials and their non-Aristolochiaceous substitutes were successfully identified using BLAST1, the nearest distance method and the neighbor-joining (NJ) tree. In addition, based on sequence information of ITS2, we developed a Real-Time PCR assay which successfully identified herbal material from the Aristolochiaceae family. Using Ultra High Performance Liquid Chromatography-Mass Spectrometer (UHPLC-HR-MS), we demonstrated that most representatives from the Aristolochiaceae family contain toxic AAs. Therefore, integrated DNA barcodes, Real-Time PCR assays using TaqMan probes and UHPLC-HR-MS system provides an efficient and reliable authentication system to protect consumers from health risks due to the hidden assassins (AAs). PMID:26270958

  4. An integrated system for identifying the hidden assassins in traditional medicines containing aristolochic acids

    NASA Astrophysics Data System (ADS)

    Wu, Lan; Sun, Wei; Wang, Bo; Zhao, Haiyu; Li, Yaoli; Cai, Shaoqing; Xiang, Li; Zhu, Yingjie; Yao, Hui; Song, Jingyuan; Cheng, Yung-Chi; Chen, Shilin

    2015-08-01

    Traditional herbal medicines adulterated and contaminated with plant materials from the Aristolochiaceae family, which contain aristolochic acids (AAs), cause aristolochic acid nephropathy. Approximately 256 traditional Chinese patent medicines, containing Aristolochiaceous materials, are still being sold in Chinese markets today. In order to protect consumers from health risks due to AAs, the hidden assassins, efficient methods to differentiate Aristolochiaceous herbs from their putative substitutes need to be established. In this study, 158 Aristolochiaceous samples representing 46 species and four genera as well as 131 non-Aristolochiaceous samples representing 33 species, 20 genera and 12 families were analyzed using DNA barcodes based on the ITS2 and psbA-trnH sequences. Aristolochiaceous materials and their non-Aristolochiaceous substitutes were successfully identified using BLAST1, the nearest distance method and the neighbor-joining (NJ) tree. In addition, based on sequence information of ITS2, we developed a Real-Time PCR assay which successfully identified herbal material from the Aristolochiaceae family. Using Ultra High Performance Liquid Chromatography-Mass Spectrometer (UHPLC-HR-MS), we demonstrated that most representatives from the Aristolochiaceae family contain toxic AAs. Therefore, integrated DNA barcodes, Real-Time PCR assays using TaqMan probes and UHPLC-HR-MS system provides an efficient and reliable authentication system to protect consumers from health risks due to the hidden assassins (AAs).

  5. An integrated system for identifying the hidden assassins in traditional medicines containing aristolochic acids.

    PubMed

    Wu, Lan; Sun, Wei; Wang, Bo; Zhao, Haiyu; Li, Yaoli; Cai, Shaoqing; Xiang, Li; Zhu, Yingjie; Yao, Hui; Song, Jingyuan; Cheng, Yung-Chi; Chen, Shilin

    2015-08-13

    Traditional herbal medicines adulterated and contaminated with plant materials from the Aristolochiaceae family, which contain aristolochic acids (AAs), cause aristolochic acid nephropathy. Approximately 256 traditional Chinese patent medicines, containing Aristolochiaceous materials, are still being sold in Chinese markets today. In order to protect consumers from health risks due to AAs, the hidden assassins, efficient methods to differentiate Aristolochiaceous herbs from their putative substitutes need to be established. In this study, 158 Aristolochiaceous samples representing 46 species and four genera as well as 131 non-Aristolochiaceous samples representing 33 species, 20 genera and 12 families were analyzed using DNA barcodes based on the ITS2 and psbA-trnH sequences. Aristolochiaceous materials and their non-Aristolochiaceous substitutes were successfully identified using BLAST1, the nearest distance method and the neighbor-joining (NJ) tree. In addition, based on sequence information of ITS2, we developed a Real-Time PCR assay which successfully identified herbal material from the Aristolochiaceae family. Using Ultra High Performance Liquid Chromatography-Mass Spectrometer (UHPLC-HR-MS), we demonstrated that most representatives from the Aristolochiaceae family contain toxic AAs. Therefore, integrated DNA barcodes, Real-Time PCR assays using TaqMan probes and UHPLC-HR-MS system provides an efficient and reliable authentication system to protect consumers from health risks due to the hidden assassins (AAs).

  6. Role of the two-component leader sequence and mature amino acid sequences in extracellular export of endoglucanase EGL from Pseudomonas solanacearum.

    PubMed Central

    Huang, J Z; Schell, M A

    1992-01-01

    The egl gene of Pseudomonas solanacearum encodes a 43-kDa extracellular endoglucanase (mEGL) involved in wilt disease caused by this phytopathogen. Egl is initially translated with a 45-residue, two-part leader sequence. The first 19 residues are apparently removed by signal peptidase II during export of Egl across the inner membrane (IM); the remaining residues of the leader sequence (modified with palmitate) are removed during export across the outer membrane (OM). Localization of Egl-PhoA fusion proteins showed that the first 26 residues of the Egl leader sequence are required and sufficient to direct lipid modification, processing, and export of Egl or PhoA across the IM but not the OM. Fusions of the complete 45-residue leader sequence or of the leader and increasing portions of mEgl sequences to PhoA did not cause its export across the OM. In-frame deletion of portions of mEGL-coding sequences blocked export of the truncated polypeptides across the OM without affecting export across the IM. These results indicate that the first part of the leader sequence functions independently to direct export of Egl across the IM while the second part and sequences and structures in mEGL are involved in export across the OM. Computer analysis of the mEgl amino acid sequence obtained from its nucleotide sequence identified a region of mEGL similar in amino acid sequence to regions in other prokaryotic endoglucanases. PMID:1735723

  7. Identifying the ligated amino acid of archaeal tRNAs based on positions outside the anticodon

    PubMed Central

    Galili, Tal; Gingold, Hila; Shaul, Shaul; Benjamini, Yoav

    2016-01-01

    Proper recognition of tRNAs by their aminoacyl-tRNA synthetase is essential for translation accuracy. Following evidence that the enzymes can recognize the correct tRNA even when anticodon information is masked, we search for additional nucleotide positions within the tRNA molecule that potentially contain information for amino acid identification. Analyzing 3936 sequences of tRNA genes from 86 archaeal species, we show that the tRNAs’ cognate amino acids can be identified by the information embedded in the tRNAs’ nucleotide positions without relying on the anticodon information. We present a small set of six to 10 informative positions along the tRNA, which allow for amino acid identification accuracy of 90.6% to 97.4%, respectively. We inspected tRNAs for each of the 20 amino acid types for such informative positions and found that tRNA genes for some amino acids are distinguishable from others by as few as one or two positions. The informative nucleotide positions are in agreement with nucleotide positions that were experimentally shown to affect the loaded amino acid identity. Interestingly, the knowledge gained from the tRNA genes of one archaeal phylum does not extrapolate well to another phylum. Furthermore, each species has a unique ensemble of nucleotides in the informative tRNA positions, and the similarity between the sets of positions of two distinct species reflects their evolutionary distance. Hence, we term this set of informative positions a “tRNA cipher.” It is tempting to suggest that the diverging code identified here might also serve the aminoacyl tRNA synthetase in the task of tRNA recognition. PMID:27516383
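
    One simple way to look for "informative positions" of the kind described in this abstract is to score each alignment column by its mutual information with the amino acid label. The sketch below is an assumption made for illustration: the abstract does not say which statistical method the authors used, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(column, labels):
    """Mutual information (in bits) between one alignment column and the labels."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    column_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (base, label), count in joint.items():
        p_joint = count / n
        # p(base, label) / (p(base) * p(label)), written with raw counts
        ratio = count * n / (column_counts[base] * label_counts[label])
        mi += p_joint * log2(ratio)
    return mi


def rank_positions(aligned_seqs, labels):
    """Return (score, position) pairs, most informative position first."""
    length = len(aligned_seqs[0])
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        scores.append((mutual_information(column, labels), i))
    return sorted(scores, reverse=True)


# Toy aligned fragments with invented labels; not real tRNA data.
toy_seqs = ["GAC", "GAU", "GCC", "GCU"]
toy_labels = ["Ala", "Ala", "Gly", "Gly"]
ranked = rank_positions(toy_seqs, toy_labels)
```

    In the toy data only the middle position tracks the label, so it ranks first with 1 bit of information, while the other two positions carry none. Real tRNA alignments would also need gap handling and a correction for small sample sizes, both omitted here.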

  8. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  9. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  10. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages; Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper; Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper; Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper; and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  11. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    PubMed

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers.

  12. Targeted sequencing approach to identify genetic mutations in Nasu-Hakola disease

    PubMed Central

    Satoh, Jun-ichi; Yanaizu, Motoaki; Tosaki, Youhei; Sakai, Kenji; Kino, Yoshihiro

    2016-01-01

    Nasu-Hakola disease (NHD) is a rare autosomal recessive disorder characterized by sclerosing leukoencephalopathy and multifocal bone cysts, caused by a loss-of-function mutation of either TYROBP (DAP12) or TREM2. TREM2 and DAP12 constitute a receptor/adaptor signaling complex expressed exclusively on osteoclasts, dendritic cells, macrophages, and microglia. Premortem molecular diagnosis of NHD requires genetic analysis of both TYROBP and TREM2, in which 20 distinct NHD-causing mutations have been reported. Due to genetic heterogeneity, it is often difficult to identify the exact mutation responsible for NHD. Recently, the revolution of the next-generation sequencing (NGS) technology has greatly advanced the field of genome research. A targeted sequencing approach allows us to investigate a selected set of disease-causing genes and mutations in a number of samples within several days. By targeted sequencing using the TruSight One Sequencing Panel, we resequenced genetic mutations of seven NHD cases with known molecular diagnosis and two control subjects. We identified homozygous variants of TYROBP or TREM2 in all NHD cases, composed of a frameshift mutation of c.141delG in exon 3 of TYROBP in four cases, a missense mutation of c.2T>C in exon 1 of TYROBP in two cases, or a splicing mutation of c.482+2T>C in intron 3 of TREM2 in one case. The results of targeted resequencing corresponded to those of Sanger sequencing. In contrast, causative variants were not detected in control subjects. These results indicate that targeted sequencing is a useful approach to precisely identify genetic mutations responsible for NHD in a comprehensive manner. PMID:27904822

  13. The amino acid sequence of rabbit cardiac troponin I.

    PubMed Central

    Grand, R J; Wilkinson, J M

    1976-01-01

    The complete amino acid sequence of troponin I from rabbit cardiac muscle was determined by the isolation of four unique CNBr fragments, together with overlapping tryptic peptides containing radioactive methionine residues. Overlap data for residues 35-36, 93-94 and 140-145 are incomplete, the sequence at these positions being based on homology with the sequence of the fast-skeletal-muscle protein. Cardiac troponin I is a single polypeptide chain of 206 residues with mol.wt. 23550 and an extinction coefficient, E(1%, 1 cm) at 280 nm, of 4.37. The protein has a net positive charge of 14 and is thus somewhat more basic than troponin I from fast-skeletal muscle. Comparison of the sequences of troponin I from cardiac and fast skeletal muscle shows that the cardiac protein has 26 extra residues at the N-terminus which account for the larger size of the protein. In the remainder of the sequence there is a considerable degree of homology, this being greater in the C-terminal two-thirds of the molecule. The region in the cardiac protein corresponding to the peptide with inhibitory activity from the fast-skeletal-muscle protein is very similar and it seems unlikely that this is the cause of the difference in inhibitory activity between the two proteins. The region responsible for binding troponin C, however, possesses a lower degree of homology. Detailed evidence on which the sequence is based has been deposited as Supplementary Publication SUP 50072 (20 pages), at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7QB, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1976) 153, 5. PMID:1008822

  14. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Christina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Dozor, Allen; Drumm, Mitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
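
    The "burden of rare or low-frequency variants" mentioned in this abstract can be illustrated with a minimal collapsing sketch: within one gene, count each individual's minor alleles at variants below a frequency threshold, then compare the trait between carriers and non-carriers. This is a simplified stand-in, not the study's actual statistical test, which the abstract does not specify. The function names, the 0.1 threshold, and all genotype and LDL-C values below are invented for illustration.

```python
def burden_scores(genotypes, max_maf=0.05):
    """Collapse rare variants in one gene into a per-individual burden score.

    genotypes: one list per individual, holding minor-allele counts (0, 1 or 2)
    for each variant in the gene. Returns (scores, indices_of_rare_variants).
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for j in range(n_variants):
        # Minor-allele frequency of variant j in this sample
        maf = sum(person[j] for person in genotypes) / (2 * n_individuals)
        if 0 < maf <= max_maf:
            rare.append(j)
    scores = [sum(person[j] for j in rare) for person in genotypes]
    return scores, rare


def mean(values):
    return sum(values) / len(values)


# Invented toy data: 10 individuals, 3 variants; variant 0 is common.
genotypes = [[1, 1, 0], [1, 0, 1]] + [[1, 0, 0] for _ in range(8)]
ldl = [190.0, 185.0] + [120.0] * 8  # hypothetical LDL-C values in mg/dL

scores, rare_variants = burden_scores(genotypes, max_maf=0.1)
carriers = [y for s, y in zip(scores, ldl) if s > 0]
non_carriers = [y for s, y in zip(scores, ldl) if s == 0]
difference = mean(carriers) - mean(non_carriers)
```

    Collapsing pools signal across variants that are individually too rare to test, which is one reason a gene-level effect can look larger than single-SNP effects. A real analysis would add covariates and a formal significance test.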

  15. Identifying natural substrates for chaperonins using a sequence-based approach

    PubMed Central

    Stan, George; Brooks, Bernard R.; Lorimer, George H.; Thirumalai, D.

    2005-01-01

    The Escherichia coli chaperonin machinery, GroEL, assists the folding of a number of proteins. We describe a sequence-based approach to identify the natural substrate proteins (SPs) for GroEL. Our method is based on the hypothesis that natural SPs are those that contain patterns of residues similar to those found in either GroES mobile loop and/or strongly binding peptide in complex with GroEL. The method is validated by comparing the predicted results with experimentally determined natural SPs for GroEL. We have searched for such patterns in five genomes. In the E. coli genome, we identify 1422 (about one-third) sequences that are putative natural SPs. In Saccharomyces cerevisiae, 2885 (32%) of sequences can be natural substrates for Hsp60, which is the analog of GroEL. The precise number of natural SPs is shown to be a function of the number of contacts an SP makes with the apical domain (NC) and the number of binding sites (NB) in the oligomer with which it interacts. For known SPs for GroEL, we find ~4 < NC < 5 and 2 ≤ NB ≤ 4. A limited analysis of the predicted binding sequences shows that they do not adopt any preferred secondary structure. Our method also predicts the putative binding regions in the identified SPs. The results of our study show that a variety of SPs, associated with diverse functions, can interact with GroEL. PMID:15576562

  16. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-06

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.

  17. Amino acid sequence of a mouse immunoglobulin mu chain.

    PubMed Central

    Kehry, M; Sibley, C; Fuhrman, J; Schilling, J; Hood, L E

    1979-01-01

    The complete amino acid sequence of the mouse mu chain from the BALB/c myeloma tumor MOPC 104E is reported. The C mu region contains four consecutive homology regions of approximately 110 residues and a COOH-terminal region of 19 residues. A comparison of this mu chain from mouse with a complete mu sequence from human (Ou) and a partial mu chain sequence from dog (Moo) reveals a striking gradient of increasing homology from the NH2-terminal to the COOH-terminal portion of these mu chains, with the former being the least and the latter the most highly conserved. Four of the five sites of carbohydrate attachment appear to be at identical residue positions when the constant regions of the mouse and human mu chains are compared. The mu chain of MOPC 104E has a carbohydrate moiety attached in the second hypervariable region. This is particularly interesting in view of the fact that MOPC 104E binds alpha-(1→3)-dextran, a simple carbohydrate. The structural and functional constraints imposed by these comparative sequence analyses are discussed. PMID:111247

  18. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents

    PubMed Central

    Liu, Sophia S.; Hockenberry, Adam J.; Lancichinetti, Andrea; Jewett, Michael C.

    2016-01-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. PMID:27835644
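
    The maximum-entropy idea in the abstract above can be illustrated with a minimal sketch. It is not the NullSeq package or its API; all function names here are hypothetical. It assumes a fixed amino acid sequence and the standard genetic code, and uses the fact that the maximum-entropy distribution over synonymous codons, subject to a constraint on expected GC content, weights each codon by exp(lambda * GC count). The parameter lambda is found by bisection.

```python
import math
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def gc(codon):
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)


def expected_gc(protein, lam):
    """Expected GC fraction when each codon is weighted by exp(lam * gc)."""
    total = 0.0
    for aa in protein:
        options = CODONS[aa]
        weights = [math.exp(lam * gc(c)) for c in options]
        total += sum(w * gc(c) for w, c in zip(weights, options)) / sum(weights)
    return total / (3 * len(protein))


def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, iters=60):
    """Bisection on lam; expected_gc is non-decreasing in lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_coding_sequence(protein, target_gc, rng=random):
    """Sample one coding sequence for `protein` with expected GC near target_gc."""
    lam = solve_lambda(protein, target_gc)
    out = []
    for aa in protein:
        options = CODONS[aa]
        weights = [math.exp(lam * gc(c)) for c in options]
        out.append(rng.choices(options, weights=weights)[0])
    return "".join(out)
```

    For example, `random_coding_sequence("MKVLAAGSR", 0.5)` returns a 27-nucleotide sequence that translates back to the same peptide. If the target GC fraction is outside the range that the synonymous codons allow, the bisection simply settles at the nearest achievable extreme.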

  19. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  20. BLAT2DOLite: An Online System for Identifying Significant Relationships between Genetic Sequences and Diseases

    PubMed Central

    Cheng, Liang; Zhang, Shuo; Hu, Yang

    2016-01-01

    Diseases significantly related to a sequence could play an important role in understanding the function of that sequence. In this paper, we introduced BLAT2DOLite, an online system for annotating human genes and diseases and identifying the significant relationships between sequences and diseases. Currently, BLAT2DOLite integrates the Entrez Gene database and Disease Ontology Lite (DOLite), which contain gene loci and relationships between genes and diseases. It uses a hypergeometric test to calculate P-values between genes and diseases of DOLite. The system can be accessed from: http://123.59.132.21:8080/BLAT2DOLite. The corresponding web service is described in: http://123.59.132.21:8080/BLAT2DOLite/BLAT2DOLiteIDMappingPort?wsdl. PMID:27315278
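
    The hypergeometric test mentioned in the abstract above can be sketched as follows. This is a generic one-sided enrichment test, not code from BLAT2DOLite; the function and parameter names are hypothetical, and how the system defines its gene universe is an assumption left to the caller.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, overlap):
    """Upper-tail probability P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population linked to the disease term
    sample:     genes hit by the query sequences
    overlap:    genes in the sample that are linked to the disease term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    A small P-value means the observed overlap between the query genes and the disease-annotated genes would be unlikely if the query genes were drawn at random from the population.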

  1. Population sequencing of two endocannabinoid metabolic genes identifies rare and common regulatory variants associated with extreme obesity and metabolite level

    PubMed Central

    2010-01-01

    Background: Targeted re-sequencing of candidate genes in individuals at the extremes of a quantitative phenotype distribution is a method of choice to gain information on the contribution of rare variants to disease susceptibility. The endocannabinoid system mediates signaling in the brain and peripheral tissues involved in the regulation of energy balance, is highly active in obese patients, and represents a strong candidate pathway to examine for genetic association with body mass index (BMI). Results: We sequenced two intervals (covering 188 kb) encoding the endocannabinoid metabolic enzymes fatty-acid amide hydrolase (FAAH) and monoglyceride lipase (MGLL) in 147 normal controls and 142 extremely obese cases. After applying quality filters, we called 1,393 high quality single nucleotide variants, 55% of which are rare, and 143 indels. Using single marker tests and collapsed marker tests, we identified four intervals associated with BMI: the FAAH promoter, the MGLL promoter, MGLL intron 2, and MGLL intron 3. Two of these intervals are composed of rare variants and the majority of the associated variants are located in promoter sequences or in predicted transcriptional enhancers, suggesting a regulatory role. The set of rare variants in the FAAH promoter associated with BMI is also associated with increased level of FAAH substrate anandamide, further implicating a functional role in obesity. Conclusions: Our study, which is one of the first reports of a sequence-based association study using next-generation sequencing of candidate genes, provides insights into study design and analysis approaches and demonstrates the importance of examining regulatory elements rather than exclusively focusing on exon sequences. PMID:21118518

  2. Duplication of the fusion of TMPRSS2 to ERG sequences identifies fatal human prostate cancer

    PubMed Central

    Attard, G; Clark, J; Ambroisine, L; Fisher, G; Kovacs, G; Flohr, P; Berney, D; Foster, CS; Fletcher, A; Gerald, WL; Moller, H; Reuter, V; De Bono, JS; Scardino, P; Cuzick, J; Cooper, CS

    2009-01-01

    New predictive markers for managing prostate cancer are urgently required because of the highly variable natural history of this disease. At the time of diagnosis, Gleason score provides the gold standard for assessing the aggressiveness of prostate cancer. However, the recent discovery of TMPRSS2 fusions to the ERG gene in prostate cancer raises the possibility of using alterations at the ERG locus as additional mechanism-based prognostic indicators. Fluorescence in situ hybridization (FISH) assays were used to assess ERG gene status in a cohort of 445 prostate cancers from patients who had been conservatively managed. The FISH assays detected separation of 5′ (labelled green) and 3′ (labelled red) ERG sequences, which is a consequence of the TMPRSS2–ERG fusion, and additionally identified interstitial deletion of genomic sequences between the tandemly located TMPRSS2 and ERG gene sequences on chromosome 21. Cancers lacking ERG alterations exhibited favourable cause-specific survival (90% survival at 8 years). We identify a novel category of prostate cancers, characterized by duplication of the fusion of TMPRSS2 to ERG sequences together with interstitial deletion of sequences 5′ to ERG (called ‘2+Edel’), which by comparison exhibited extremely poor cause-specific survival (hazard ratio = 6.10, 95% confidence interval = 3.33–11.15, P < 0.001, 25% survival at 8 years). In multivariate analysis, ‘2+Edel’ provided significant prognostic information (P = 0.003) in addition to that provided by Gleason score and prostate-specific antigen level at diagnosis. Other individual categories of ERG alteration were associated with intermediate or good prognosis. We conclude that determination of ERG gene status, including duplication of the fusion of TMPRSS2 to ERG sequences in 2+Edel, allows stratification of prostate cancer into distinct survival categories. PMID:17637754

  3. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.

    PubMed

    Smith, Tom; Heger, Andreas; Sudbery, Ian

    2017-03-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
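
    The abstract above does not spell out its network-based methods, so the following is only a minimal sketch of one simple approach in that spirit, not the UMI-tools implementation. It assumes all UMIs come from reads at the same mapping position and have equal length, links UMIs that differ by at most one base, and treats each connected component as one original molecule. All names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umis, max_dist=1):
    """Group UMIs into connected components of the mismatch graph.

    Returns a list of (representative, members) tuples, where the
    representative is the most frequently observed UMI in the component.
    """
    counts = Counter(umis)
    nodes = list(counts)
    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first walk over UMIs within max_dist
            current = stack.pop()
            members.append(current)
            for other in nodes:
                if other not in seen and hamming(current, other) <= max_dist:
                    seen.add(other)
                    stack.append(other)
        representative = max(members, key=lambda u: counts[u])
        clusters.append((representative, sorted(members)))
    return clusters
```

    The number of clusters is then the estimated count of distinct molecules at that position. A low-count UMI one mismatch away from a high-count UMI is absorbed as a likely sequencing error rather than counted as a separate molecule.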

  4. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584

  5. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways

    PubMed Central

    Cirulli, Elizabeth T.; Lasseigne, Brittany N.; Petrovski, Slavé; Sapp, Peter C.; Dion, Patrick A.; Leblond, Claire S.; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J.; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E.; Boone, Braden E.; Wimbish, Jack R.; Waite, Lindsay L.; Jones, Angela L.; Carulli, John P.; Day-Williams, Aaron G.; Staropoli, John F.; Xin, Winnie W.; Chesi, Alessandra; Raphael, Alya R.; McKenna-Yasek, Diane; Cady, Janet; de Jong, J.M.B. Vianney; Kenna, Kevin P.; Smith, Bradley N.; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H.; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E.; Baloh, Robert H.; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M.; Gibson, Summer; Trojanowski, John Q.; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A.; Chung, Wendy K.; Ravits, John M.; Glass, Jonathan D.; Sims, Katherine B.; Van Deerlin, Vivianna M.; Maniatis, Tom; Hayes, Sebastian D.; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S.; Bedlack, Richard S.; Harper, J. Wade; Gitler, Aaron D.; Rouleau, Guy A.; Brown, Robert; Harms, Matthew B.; Cooper, Gregory M.; Harris, Tim; Myers, Richard M.; Goldstein, David B.

    2015-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. Here we report the results of a moderate-scale sequencing study aimed at identifying new genes contributing to predisposition for ALS. We performed whole exome sequencing of 2,874 ALS patients and compared them to 6,405 controls. Several known ALS genes were found to be associated, and the non-canonical IκB kinase family TANK-Binding Kinase 1 (TBK1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. PMID:25700176

  6. Whole-exome sequencing identifies somatic ATRX mutations in pheochromocytomas and paragangliomas.

    PubMed

    Fishbein, Lauren; Khare, Sanika; Wubbenhorst, Bradley; DeSloover, Daniel; D'Andrea, Kurt; Merrill, Shana; Cho, Nam Woo; Greenberg, Roger A; Else, Tobias; Montone, Kathleen; LiVolsi, Virginia; Fraker, Douglas; Daber, Robert; Cohen, Debbie L; Nathanson, Katherine L

    2015-01-21

    Pheochromocytomas and paragangliomas (PCC/PGL) are the solid tumour type most commonly associated with an inherited susceptibility syndrome. However, very little is known about the somatic genetic changes leading to tumorigenesis or malignant transformation. Here we perform whole-exome sequencing on a discovery set of 21 PCC/PGL and identify somatic ATRX mutations in two SDHB-associated tumours. Targeted sequencing of a separate validation set of 103 PCC/PGL identifies somatic ATRX mutations in 12.6% of PCC/PGL. PCC/PGL with somatic ATRX mutations are associated with alternative lengthening of telomeres and clinically aggressive behaviour. This finding suggests that loss of ATRX, an SWI/SNF chromatin remodelling protein, is important in the development of clinically aggressive PCC/PGL.

  7. Whole exome sequencing identifies a troponin T mutation hot spot in familial dilated cardiomyopathy.

    PubMed

    Campbell, Nzali; Sinagra, Gianfranco; Jones, Kenneth L; Slavov, Dobromir; Gowan, Katherine; Merlo, Marco; Carniel, Elisa; Fain, Pamela R; Aragona, Pierluigi; Di Lenarda, Andrea; Mestroni, Luisa; Taylor, Matthew R G

    2013-01-01

    Dilated cardiomyopathy (DCM) commonly causes heart failure and shows extensive genetic heterogeneity that may be amenable to newly developed next-generation DNA sequencing of the exome. In this study we report the successful use of exome sequencing to identify a pathogenic variant in the TNNT2 gene using segregation analysis in a large DCM family. Exome sequencing was performed on three distant relatives from a large family with a clear DCM phenotype. Missense, nonsense, and splice variants were analyzed for segregation among the three affected family members and confirmed in other relatives by direct sequencing. A c.517C>T (Arg173Trp) TNNT2 variant segregated with all affected family members and was also detected in one additional DCM family in our registry. The inclusion of segregation analysis using distant family members markedly improved the bioinformatics filtering process by removing from consideration variants that were not shared by all affected subjects. Haplotype analysis confirmed that the variant found in both DCM families was located on two distinct haplotypes, supporting the notion of independent mutational events in each family. In conclusion, an exome sequencing strategy that includes segregation analysis using distant affected relatives within a family represents a viable diagnostic strategy in a genetically heterogeneous disease like DCM.
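
    The segregation filter described in the abstract above, which keeps only variants shared by all affected relatives, reduces to a set intersection. The sketch below is illustrative only; the function name, the data layout, and the variant identifiers are hypothetical and are not taken from the study.

```python
def segregating_variants(calls_by_person, affected):
    """Return the variants carried by every affected individual.

    calls_by_person: dict mapping a person ID to an iterable of variant IDs
    affected:        iterable of person IDs with the phenotype
    """
    shared = None
    for person in affected:
        variants = set(calls_by_person[person])
        shared = variants if shared is None else shared & variants
    return shared if shared is not None else set()


# Hypothetical calls for three distant affected relatives.
calls = {
    "relative_1": {"geneA:var1", "geneB:var2", "geneC:var3"},
    "relative_2": {"geneA:var1", "geneC:var3", "geneD:var4"},
    "relative_3": {"geneA:var1", "geneE:var5"},
}
candidates = segregating_variants(calls, ["relative_1", "relative_2", "relative_3"])
```

    Because distant relatives share only a small fraction of their genomes, each additional affected relative removes many candidates, which is why the abstract reports that this step markedly improved filtering.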

  8. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects

    PubMed Central

    Johnson, Ben; Lowe, Gillian C.; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A.; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J.; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula HB; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E.; Watson, Steve P.; Morgan, Neil V.

    2016-01-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However, its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10⁹/L to 186×10⁹/L. Of the 51 patients phenotypically tested, 37 (73%) had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified “pathogenic” or “likely pathogenic” variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. PMID:27479822

  9. Whole Exome Sequencing Identifies a Troponin T Mutation Hot Spot in Familial Dilated Cardiomyopathy

    PubMed Central

    Campbell, Nzali; Sinagra, Gianfranco; Jones, Kenneth L.; Slavov, Dobromir; Gowan, Katherine; Merlo, Marco; Carniel, Elisa; Fain, Pamela R.; Aragona, Pierluigi; Di Lenarda, Andrea; Mestroni, Luisa; Taylor, Matthew R. G.

    2013-01-01

    Dilated cardiomyopathy (DCM) commonly causes heart failure and shows extensive genetic heterogeneity that may be amenable to newly developed next-generation DNA sequencing of the exome. In this study we report the successful use of exome sequencing to identify a pathogenic variant in the TNNT2 gene using segregation analysis in a large DCM family. Exome sequencing was performed on three distant relatives from a large family with a clear DCM phenotype. Missense, nonsense, and splice variants were analyzed for segregation among the three affected family members and confirmed in other relatives by direct sequencing. A c.517C>T (Arg173Trp) TNNT2 variant segregated with all affected family members and was also detected in one additional DCM family in our registry. The inclusion of segregation analysis using distant family members markedly improved the bioinformatics filtering process by removing from consideration variants that were not shared by all affected subjects. Haplotype analysis confirmed that the variant found in both DCM families was located on two distinct haplotypes, supporting the notion of independent mutational events in each family. In conclusion, an exome sequencing strategy that includes segregation analysis using distant affected relatives within a family represents a viable diagnostic strategy in a genetically heterogeneous disease like DCM. PMID:24205113

  10. Identify sequence of events likely to result in severe crash outcomes.

    PubMed

    Wu, Kun-Feng; Thor, Craig P; Ardiansyah, Muhammad Nashir

    2016-11-01

    The current practice of crash characterization in highway engineering reduces multiple dimensions of crash contributing factors and their relative sequential connections, crash sequences, into broad definitions, resulting in crash categories such as head-on, sideswipe, rear-end, angle, and fixed-object. As a result, crashes that are classified in the same category may contain many different crash sequences. This makes it difficult to develop effective countermeasures because these crash categorizations are based on the outcomes rather than the preceding events. Consequently, a countermeasure designed for a specific type of crash may not be effective because of differing pre-crash sequences. This research seeks to explore the use of event sequence to characterize crashes. Additionally, this research seeks to identify crash sequences that are likely to result in severe crash outcomes so that researchers can develop effective countermeasures to reduce severe crashes. This study utilizes the sequence of events from roadway departure crashes in the Fatality Analysis Reporting System (FARS), and converts the information to form a new categorization called "crash sequences." The similarity distance between each pair of crash sequences was calculated using the Optimal Matching approach. Cluster analysis was applied to group crash sequences that are etiologically similar in terms of the similarity distance. A hybrid model was constructed to mitigate the potential sample selection bias of FARS data, which is biased toward more severe crashes.
The major findings include: (1) in terms of a roadway departure crash, the crash sequences that are most likely to result in high crash severity include a vehicle that first crosses the median or centerline, runs-off-road on the left, and then collides with a roadside fixed-object; (2) seat-belt and airbag usage reduces the probability of dying in a roadway departure crash by 90%; and (3) occupants who are seated on the
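
    The Optimal Matching distance used in the abstract above is an edit distance between event sequences. The sketch below is a generic version under assumed costs (insertion or deletion = 1, substitution = 2); the study's actual cost settings are not given in the abstract, and the event labels are hypothetical rather than FARS codes.

```python
def optimal_matching_distance(seq_a, seq_b, indel=1.0, substitution=2.0):
    """Minimum total cost of turning seq_a into seq_b.

    Allowed operations: insert or delete one event (cost `indel`) and
    replace one event with another (cost `substitution`).
    """
    # prev[j] holds the cost of aligning the processed prefix of seq_a
    # with the first j events of seq_b.
    prev = [j * indel for j in range(len(seq_b) + 1)]
    for i, event_a in enumerate(seq_a, start=1):
        curr = [i * indel]
        for j, event_b in enumerate(seq_b, start=1):
            match_or_sub = prev[j - 1] + (0.0 if event_a == event_b else substitution)
            delete = prev[j] + indel
            insert = curr[j - 1] + indel
            curr.append(min(match_or_sub, delete, insert))
        prev = curr
    return prev[-1]


crash_1 = ["cross_centerline", "run_off_road_left", "hit_fixed_object"]
crash_2 = ["run_off_road_right", "hit_fixed_object"]
distance = optimal_matching_distance(crash_1, crash_2)
```

    Computing this distance for every pair of crash sequences gives a distance matrix that a standard clustering routine can then group, which corresponds to the cluster-analysis step the abstract mentions.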

  11. Deep sequencing identifies viral and wasp genes with potential roles in replication of Microplitis demolitor Bracovirus.

    PubMed

    Burke, Gaelen R; Strand, Michael R

    2012-03-01

    Viruses in the genus Bracovirus (BV) (Polydnaviridae) are symbionts of parasitoid wasps that specifically replicate in the ovaries of females. Recent analysis of expressed sequence tags from two wasp species, Cotesia congregata and Chelonus inanitus, identified transcripts related to 24 different nudivirus genes. These results together with other data strongly indicate that BVs evolved from a nudivirus ancestor. However, it remains unclear whether BV-carrying wasps contain other nudivirus-like genes and what types of wasp genes may also be required for BV replication. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV). Here we characterized MdBV replication and performed massively parallel sequencing of M. demolitor ovary transcripts. Our results indicated that MdBV replication begins in stage 2 pupae and continues in adults. Analysis of prereplication- and active-replication-stage ovary RNAs yielded 22 Gb of sequence that assembled into 66,425 transcripts. This breadth of sampling indicated that a large percentage of genes in the M. demolitor genome were sequenced. A total of 41 nudivirus-like transcripts were identified, of which a majority were highly expressed during MdBV replication. Our results also identified a suite of wasp genes that were highly expressed during MdBV replication. Among these products were several transcripts with conserved roles in regulating locus-specific DNA amplification by eukaryotes. Overall, our data set together with prior results likely identify the majority of nudivirus-related genes that are transcriptionally functional during BV replication. Our results also suggest that amplification of proviral DNAs for packaging into BV virions may depend upon the replication machinery of wasps.

  12. Identify the key amino acid of BAFF binding with TACI.

    PubMed

    Wang, Renxi; Wang, Ru; Ma, Ning; Guo, Yueling; Xiao, He; Chen, Guojiang; Han, Gencheng; Hou, Chunmei; Shen, Beifen; Feng, Jiannan; Li, Yan

    2013-01-01

    B-cell activating factor (BAFF) has been used as a therapeutic target. To develop BAFF-specific small molecular inhibitors, it is necessary to know the key amino acids in the binding of BAFF to its receptor. The key binding amino acids of BAFF interacting with its receptor TACI (trans-membrane activator and calcium modulator and cyclophilin ligand interactor) were analyzed based on the computer-guided molecular modeling method. According to theoretical prediction, a series of key amino acid mutants of BAFF, including M204 (Lys204 to Ala), M208 (Met208 to Ala), M209 (Gly209 to Ala), M210 (His210 to Ala), M234 (Gln234 to Ala), M236 (Met236 to Ala), and M237 (Pro237 to Ala) were designed and evaluated with biological experiments. The results show that M208, M209, M236, and M237 of BAFF were the key amino acids, in accord with the theoretical results. The results highlight clues for the further development of BAFF-specific small molecular inhibitors.

  13. Nucleic acid (cDNA) and amino acid sequences of alpha-type gliadins from wheat (Triticum aestivum).

    PubMed Central

    Kasarda, D D; Okita, T W; Bernardin, J E; Baecker, P A; Nimmo, C C; Lew, E J; Dietler, M D; Greene, F C

    1984-01-01

    The complete amino acid sequence for an alpha-type gliadin protein of wheat (Triticum aestivum Linnaeus) endosperm has been derived from a cloned cDNA sequence. An additional cDNA clone that corresponds to about 75% of a similar alpha-type gliadin has been sequenced and shows some important differences. About 97% of the composite sequence of A-gliadin (an alpha-type gliadin fraction) has also been obtained by direct amino acid sequencing. This sequence shows a high degree of similarity with amino acid sequences derived from both cDNA clones and is virtually identical to one of them. On the basis of sequence information, after loss of the signal sequence, the mature alpha-type gliadins may be divided into five different domains, two of which may have evolved from an ancestral gliadin gene, whereas the remaining three contain repeating sequences that may have developed independently. PMID:6589619

  14. Structural gene and complete amino acid sequence of Vibrio alginolyticus collagenase.

    PubMed Central

    Takeuchi, H; Shibano, Y; Morihara, K; Fukushima, J; Inami, S; Keil, B; Gilles, A M; Kawamoto, S; Okuda, K

    1992-01-01

    The DNA encoding the collagenase of Vibrio alginolyticus was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited both collagenase antigen and collagenase activity. The open reading frame from the ATG initiation codon was 2442 bp in length for the collagenase structural gene. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature collagenase consists of 739 amino acids with an Mr of 81875. The amino acid sequences of 20 polypeptide fragments were completely identical with the deduced amino acid sequences of the collagenase gene. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified collagenase reported previously. The analyses of both the DNA and amino acid sequences of the collagenase gene were rigorously performed, but we could not detect any significant sequence similarity to other collagenases. PMID:1311172
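
    The lengths quoted in this abstract can be related with simple arithmetic. The following is a minimal sketch, assuming the 2442 bp reading frame is counted in whole codons; the abstract does not say whether the stop codon is included, and the function name is illustrative only.

```python
def codons_in_orf(orf_length_bp: int) -> int:
    """Return the number of codons in an open reading frame of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("an ORF length must be a multiple of 3")
    return orf_length_bp // 3


orf_codons = codons_in_orf(2442)   # ORF length reported in the abstract
mature_residues = 739              # mature collagenase length reported in the abstract

# Codons that do not appear in the mature protein. The abstract does not
# break these down (stop codon, residues removed during maturation, etc.).
extra_codons = orf_codons - mature_residues
print(orf_codons, extra_codons)
```

    Under these assumptions the reading frame holds more codons than the mature enzyme has residues, which fits the abstract's distinction between the structural gene and the mature collagenase.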

  15. Amino acid sequence of band-3 protein from rainbow trout erythrocytes derived from cDNA.

    PubMed Central

    Hübner, S; Michel, F; Rudloff, V; Appelhans, H

    1992-01-01

    In this report we present the first complete band-3 cDNA sequence of a poikilothermic lower vertebrate. The primary structure of the anion-exchange protein band 3 (AE1) from rainbow trout erythrocytes was determined by nucleotide sequencing of cDNA clones. The overlapping clones have a total length of 3827 bp with a 5'-terminal untranslated region of 150 bp, a 2754 bp open reading frame and a 3'-untranslated region of 924 bp. Band-3 protein from trout erythrocytes consists of 918 amino acid residues with a calculated molecular mass of 101 827 Da. Comparison of its amino acid sequence revealed a 60-65% identity within the transmembrane spanning sequence of band-3 proteins published so far. An additional insertion of 24 amino acid residues within the membrane-associated domain of trout band-3 protein was identified, which until now was thought to be a general feature only of mammalian band-3-related proteins. PMID:1637296

  16. Whole-exome sequencing identifies novel homozygous mutation in NPAS2 in family with nonobstructive azoospermia

    PubMed Central

    Ramasamy, Ranjith; Bakircioğlu, M. Emre; Cengiz, Cenk; Karaca, Ender; Scovell, Jason; Jhangiani, Shalini N.; Akdemir, Zeynep C.; Bainbridge, Matthew; Yu, Yao; Huff, Chad; Gibbs, Richard A.; Lupski, James R.; Lamb, Dolores J.

    2015-01-01

    Objective: To investigate the genetic cause of nonobstructive azoospermia (NOA) in a consanguineous Turkish family through homozygosity mapping followed by targeted exon/whole-exome sequencing to identify genetic variations. Design: Whole-exome sequencing. Setting: Research laboratory. Patient(s): We sequenced the exomes of two siblings in a consanguineous family with NOA. Intervention(s): All variants passing filter criteria were validated with Sanger sequencing to confirm familial segregation and absence in the control population. Main Outcome Measure: Discovery of a mutation that could potentially cause NOA. Results: A novel non-synonymous mutation in neuronal PAS 2 domain (NPAS2) was identified in a consanguineous family from Turkey. This mutation in exon 14 (chr2: 101592000 C>G) of NPAS2 is likely a disease-causing mutation as it is predicted to be damaging, is a novel variant, and segregates with the disease. Family segregation of the variants showed the presence of the homozygous mutation in the three brothers with NOA and the heterozygous mutation in the mother and in one brother and one sister who were both fertile. The mutation is not found in the single nucleotide polymorphism (SNP) database, the 1000 Genomes Project, the Baylor College of Medicine cohort of 500 Turkish patients (so it is not a population-specific polymorphism), or 50 matched fertile controls. Conclusions: Using WES, we identified a novel homozygous mutation in NPAS2 as a likely disease-causing variant in a Turkish family diagnosed with NOA. Our data reinforce the clinical role of WES in the molecular diagnosis of highly heterogeneous genetic diseases for which conventional genetic approaches have previously failed to reach a molecular diagnosis. PMID:25956372
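
    The segregation pattern reported in this abstract (homozygous in affected brothers, heterozygous in fertile relatives) is what a recessive-model filter looks for. The sketch below is only an illustration of that idea, not the authors' pipeline; the genotype labels, member names, and function name are hypothetical.

```python
from typing import Dict, List


def segregates_recessively(genotypes: Dict[str, str], affected: List[str]) -> bool:
    """Check a simple recessive model for one variant in one family.

    genotypes maps a family member to 'hom_alt', 'het', or 'hom_ref'.
    Every affected member must be homozygous for the variant, and no
    unaffected member may be homozygous for it.
    """
    for member, genotype in genotypes.items():
        if member in affected:
            if genotype != "hom_alt":
                return False
        elif genotype == "hom_alt":
            return False
    return True


# Genotypes laid out as described in the abstract; member names are made up.
family = {
    "brother_1": "hom_alt",
    "brother_2": "hom_alt",
    "brother_3": "hom_alt",
    "mother": "het",
    "brother_4": "het",
    "sister": "het",
}
affected = ["brother_1", "brother_2", "brother_3"]
print(segregates_recessively(family, affected))
```

    A real analysis would also apply the frequency and damage-prediction filters mentioned in the abstract; this check covers segregation only.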

  17. A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

    PubMed Central

    Lea, Amanda J.

    2015-01-01

    Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html. PMID:26599596
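
    The abstract names a beta-binomial model as the next best approach for over-dispersed methylation counts. As a hedged illustration of that baseline only (this is not MACAU, and the parameter names are assumptions), the log-probability of observing k methylated reads out of n under a beta-binomial distribution can be sketched as follows.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Log-probability of k methylated reads out of n total reads.

    alpha and beta are the shape parameters of the Beta distribution that
    the per-site methylation level is drawn from.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


# With alpha = beta = 1 the methylation level is uniform on [0, 1],
# so every count from 0 to n is equally likely.
p = math.exp(beta_binomial_logpmf(3, 10, 1.0, 1.0))
print(p)
```

    The extra variance relative to a plain binomial is what lets this model absorb the site-to-site and sample-to-sample coverage variation the abstract describes; it does not account for genetic relatedness, which is the gap the mixed model is said to address.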

  18. Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing

    PubMed Central

    Nadeau, Nicola J.; Whibley, Annabel; Jones, Robert T.; Davey, John W.; Dasmahapatra, Kanchon K.; Baxter, Simon W.; Quail, Michael A.; Joron, Mathieu; ffrench-Constant, Richard H.; Blaxter, Mark L.; Mallet, James; Jiggins, Chris D.

    2012-01-01

    Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These ‘islands’ of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the ‘speciation continuum’. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races. PMID:22201164

  19. Newborn Screening Quality Assurance Program for CFTR Mutation Detection and Gene Sequencing to Identify Cystic Fibrosis

    PubMed Central

    Hendrix, Miyono M.; Foster, Stephanie L.; Cordovado, Suzanne K.

    2016-01-01

    All newborn screening laboratories in the United States and many worldwide screen for cystic fibrosis. Most laboratories use a second-tier genotyping assay to identify a panel of mutations in the CF transmembrane regulator (CFTR) gene. Centers for Disease Control and Prevention’s Newborn Screening Quality Assurance Program houses a dried blood spot repository of samples containing CFTR mutations to assist newborn screening laboratories and ensure high-quality mutation detection in a high-throughput environment. Recently, CFTR mutation detection has increased in complexity with expanded genotyping panels and gene sequencing. To accommodate the growing quality assurance needs, the repository samples were characterized with several multiplex genotyping methods, Sanger sequencing, and 3 next-generation sequencing assays using a high-throughput, low-concentration DNA extraction method. The samples performed well in all of the assays, providing newborn screening laboratories with a resource for complex CFTR mutation detection and next-generation sequencing as they transition to new methods. PMID:28261631

  20. Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

    PubMed Central

    Rodriguez-Flores, Juan L.; Fakhro, Khalid; Hackett, Neil R.; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A.; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Marri, Ajayeb Al-Nabet; Chouchane, Lotfi; Stadler, Dora J.; Hunter-Zinck, Haley; Mezey, Jason G.; Crystal, Ronald G.

    2013-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared to 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Pre-marital genetic screening in Qatar tests for only 4 out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. PMID:24123366
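
    The allele-frequency comparison described in this abstract rests on a simple count over diploid genotypes. A minimal sketch, with hypothetical carrier counts that are not taken from the study:

```python
def allele_frequency(n_het: int, n_hom_alt: int, n_individuals: int) -> float:
    """Alternate-allele frequency in a sample of diploid individuals.

    Each heterozygote carries one copy of the variant allele and each
    homozygote carries two; the denominator is the total number of
    chromosomes sampled (two per individual).
    """
    if n_individuals <= 0 or n_het + n_hom_alt > n_individuals:
        raise ValueError("inconsistent genotype counts")
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


# Hypothetical example: 10 heterozygotes and 2 homozygotes among 100 people.
freq = allele_frequency(n_het=10, n_hom_alt=2, n_individuals=100)
print(freq)
```

    Computing this separately for each subpopulation, and comparing against a reference panel, is one way the subpopulation specificity mentioned above could be examined.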

  1. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background: The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results: A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion: This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  2. Genetic mapping and exome sequencing identify variants associated with five novel diseases.

    PubMed

    Puffenberger, Erik G; Jinks, Robert N; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A; Achilly, Nathan P; Cassidy, Ryan P; Fiorentini, Christopher J; Heiken, Kory F; Lawrence, Johnny J; Mahoney, Molly H; Miller, Christopher J; Nair, Devika T; Politi, Kristin A; Worcester, Kimberly N; Setton, Roni A; Dipiazza, Rosa; Sherman, Eric A; Eastman, James T; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L; Gabriel, Stacey; Morton, D Holmes; Strauss, Kevin A

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data.

  3. Complete Genome Sequence of Clostridium estertheticum DSM 8809, a Microbe Identified in Spoiled Vacuum Packed Beef

    PubMed Central

    Yu, Zhongyi; Gunn, Lynda; Brennan, Evan; Reid, Rachael; Wall, Patrick G.; Gaora, Peadar Ó.; Hurley, Daniel; Bolton, Declan; Fanning, Séamus

    2016-01-01

    Blown pack spoilage (BPS) is a major issue for the beef industry. Etiological agents of BPS involve members of a group of Clostridium species, including Clostridium estertheticum, which has the ability to produce gas, mostly carbon dioxide, under anaerobic psychrotrophic growth conditions. This spore-forming bacterium grows slowly under laboratory conditions, and it can take up to 3 months to produce a workable culture. These characteristics have limited the study of this commercially challenging bacterium. Consequently, information on this bacterium is limited and no effective controls are currently available to confidently detect and manage this production risk. In this study the complete genome of C. estertheticum DSM 8809 was determined by SMRT® sequencing. The genome consists of a circular chromosome of 4.7 Mbp along with a single plasmid carrying a potential tellurite resistance gene tehB and a Tn3-like resolvase-encoding gene tnpR. The genome sequence was searched for central metabolic pathways that would support its biochemical profile and several enzymes contributing to this phenotype were identified. Several putative antibiotic/biocide/metal resistance-encoding genes and virulence factors were also identified in the genome, a feature that requires further research. The availability of the genome sequence will provide a basic blueprint from which to develop valuable biomarkers that could support and improve the detection and control of this bacterium along the beef production chain. PMID:27891116

  4. Genetic profile for suspected dysferlinopathy identified by targeted next-generation sequencing

    PubMed Central

    Izumi, Rumiko; Niihori, Tetsuya; Takahashi, Toshiaki; Suzuki, Naoki; Tateyama, Maki; Watanabe, Chigusa; Sugie, Kazuma; Nakanishi, Hirotaka; Sobue, Gen; Kato, Masaaki; Warita, Hitoshi; Aoki, Yoko

    2015-01-01

    Objective: To investigate the genetic causes of suspected dysferlinopathy and to reveal the genetic profile for myopathies with dysferlin deficiency. Methods: Using next-generation sequencing, we analyzed 42 myopathy-associated genes, including DYSF, in 64 patients who were clinically or pathologically suspected of having dysferlinopathy. Putative pathogenic mutations were confirmed by Sanger sequencing. In addition, copy-number variations in DYSF were investigated using multiplex ligation-dependent probe amplification. We also analyzed the genetic profile for 90 patients with myopathy with dysferlin deficiency, as indicated by muscle specimen immunohistochemistry, including patients from a previous cohort. Results: We identified putative pathogenic mutations in 38 patients (59% of all investigated patients). Twenty-three patients had DYSF mutations, including 6 novel mutations. The remaining 16 patients, including a single patient who also carried the DYSF mutation, harbored putative pathogenic mutations in other genes. The genetic profile for 90 patients with dysferlin deficiency revealed that 70% had DYSF mutations (n = 63), 10% had CAPN3 mutations (n = 9), 2% had CAV3 mutations (n = 2), 3% had mutations in other genes (in single patients), and 16% did not have any identified mutations (n = 14). Conclusions: This study clarified the heterogeneous genetic profile for myopathies with dysferlin deficiency. Our results demonstrate the importance of a comprehensive analysis of related genes in improving the genetic diagnosis of dysferlinopathy as one of the most common subtypes of limb-girdle muscular dystrophy. Unresolved diagnoses should be investigated using whole-genome or whole-exome sequencing. PMID:27066573

  5. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  6. Whole Exome Sequencing Identifies de Novo Mutations in GATA6 Associated with Congenital Diaphragmatic Hernia

    PubMed Central

    Yu, Lan; Bennett, James T.; Wynn, Julia; Carvill, Gemma L.; Cheung, Yee Him; Shen, Yufeng; Mychaliska, George B.; Azarow, Kenneth S.; Crombleholme, Timothy M.; Chung, Dai H.; Potoka, Douglas; Warner, Brad W.; Bucher, Brian; Lim, Foong-Yen; Pietsch, John; Stolar, Charles; Aspelund, Gudrun; Arkovitz, Marc S.; Mefford, Heather; Chung, Wendy K.

    2014-01-01

    Background: Congenital diaphragmatic hernia (CDH) is a common birth defect affecting 1 in 3,000 births. It is characterized by herniation of abdominal viscera through an incompletely formed diaphragm. Although chromosomal anomalies and mutations in several genes have been implicated, the cause for most patients is unknown. Methods: We used whole exome sequencing in two families with CDH and congenital heart disease, and identified mutations in GATA6 in both. Results: In the first family, we identified a de novo missense mutation (c.1366C>T, p.R456C) in a sporadic CDH patient with tetralogy of Fallot. In the second, a nonsense mutation (c.712G>T, p.G238*) was identified in two siblings with CDH and a large ventricular septal defect. The G238* mutation was inherited from their mother, who was clinically affected with congenital absence of the pericardium, patent ductus arteriosus, and intestinal malrotation. Deep sequencing of blood and saliva derived DNA from the mother suggested somatic mosaicism as an explanation for her milder phenotype, with only approximately 15% mutant alleles. To determine the frequency of GATA6 mutations in CDH, we sequenced the gene in 378 patients with CDH. We identified one additional de novo mutation (c.1071delG, p.V358Cfs34*). Conclusions: Mutations in GATA6 have been previously associated with pancreatic agenesis and congenital heart disease. We conclude that, in addition to the heart and the pancreas, GATA6 is involved in development of two additional organs, the diaphragm and the pericardium. In addition we have shown that de novo mutations can contribute to the development of CDH, a common birth defect. PMID:24385578

  7. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Science Inventory

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...

  8. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention.

  9. Comparative analysis identifies exonic splicing regulatory sequences--The complex definition of enhancers and silencers.

    PubMed

    Goren, Amir; Ram, Oren; Amit, Maayan; Keren, Hadas; Lev-Maor, Galit; Vig, Ida; Pupko, Tal; Ast, Gil

    2006-06-23

    Exonic splicing regulatory sequences (ESRs) are cis-acting factor binding sites that regulate constitutive and alternative splicing. A computational method based on the conservation level of wobble positions and the overabundance of sequence motifs between 46,103 human and mouse orthologous exons was developed, identifying 285 putative ESRs. Alternatively spliced exons that are either short in length or contain weak splice sites show the highest conservation level of those ESRs, especially toward the edges of exons. ESRs that are abundant in those subgroups show a different distribution between constitutively and alternatively spliced exons. Representatives of these ESRs and two SR protein binding sites were shown, experimentally, to display variable regulatory effects on alternative splicing, depending on their relative locations in the exon. This finding signifies the delicate positional effect of ESRs on alternative splicing regulation.
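
    The method in this abstract relies in part on the conservation level of wobble (third-codon) positions between orthologous exons. The sketch below shows one simple way such a measure could be computed for a pair of aligned, in-frame, gap-free sequences; it is an illustration under those assumptions rather than the authors' procedure, and the example sequences are made up.

```python
def wobble_conservation(seq_a: str, seq_b: str) -> float:
    """Fraction of codons whose third (wobble) base is identical.

    Both sequences must be aligned without gaps, start in frame, and have
    the same length, which must be a positive multiple of 3.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) == 0 or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length, non-empty, whole codons")
    a, b = seq_a.upper(), seq_b.upper()
    n_codons = len(a) // 3
    # Index 2, 5, 8, ... is the third position of each codon.
    conserved = sum(1 for i in range(2, len(a), 3) if a[i] == b[i])
    return conserved / n_codons


# Made-up three-codon example: only the middle codon shares its wobble base.
human_exon = "GCTGAAGGA"
mouse_exon = "GCCGAAGGT"
score = wobble_conservation(human_exon, mouse_exon)
print(score)
```

    Because wobble positions are often free to change without altering the encoded amino acid, unusually high conservation there is one signal that the sequence carries a regulatory function.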

  10. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment.

  11. Exome Sequencing Identifies a Rare HSPG2 Variant Associated with Familial Idiopathic Scoliosis

    PubMed Central

    Baschal, Erin E.; Wethey, Cambria I.; Swindle, Kandice; Baschal, Robin M.; Gowan, Katherine; Tang, Nelson L.S.; Alvarado, David M.; Haller, Gabe E.; Dobbs, Matthew B.; Taylor, Matthew R.G.; Gurnett, Christina A.; Jones, Kenneth L.; Miller, Nancy H.

    2014-01-01

    Idiopathic scoliosis occurs in 3% of individuals and has an unknown etiology. The objective of this study was to identify rare variants that contribute to the etiology of idiopathic scoliosis by using exome sequencing in a multigenerational family with idiopathic scoliosis. Exome sequencing was completed for three members of this multigenerational family with idiopathic scoliosis, resulting in the identification of a variant in the HSPG2 gene as a potential contributor to the phenotype. The HSPG2 gene was sequenced in a separate cohort of 100 unrelated individuals affected with idiopathic scoliosis and also was examined in an independent idiopathic scoliosis population. The exome sequencing and subsequent bioinformatics filtering resulted in 16 potentially damaging and rare coding variants. One of these variants, p.Asn786Ser, is located in the HSPG2 gene. The variant p.Asn786Ser also is overrepresented in a larger cohort of idiopathic scoliosis cases compared with a control population (P = 0.024). Furthermore, we identified additional rare HSPG2 variants that are predicted to be damaging in two independent cohorts of individuals with idiopathic scoliosis. The HSPG2 gene encodes a ubiquitous multifunctional protein within the extracellular matrix in which loss-of-function mutations are known to result in a musculoskeletal phenotype in both mice and humans. Based on these results, we conclude that rare variants in the HSPG2 gene potentially contribute to the idiopathic scoliosis phenotype in a subset of patients with idiopathic scoliosis. Further studies must be completed to confirm the effect of the HSPG2 gene on the idiopathic scoliosis phenotype. PMID:25504735

  12. Design, synthesis, and characterization of a protein sequencing reagent yielding amino acid derivatives with enhanced detectability by mass spectrometry.

    PubMed Central

    Aebersold, R.; Bures, E. J.; Namchuk, M.; Goghari, M. H.; Shushan, B.; Covey, T. C.

    1992-01-01

    We report the design, chemical synthesis, and structural and functional characterization of a novel reagent for protein sequence analysis by the Edman degradation, yielding amino acid derivatives rapidly detectable at high sensitivity by ion-evaporation mass spectrometry. We demonstrate that the reagent 3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate is chemically stable and shows coupling and cyclization/cleavage yields comparable to phenylisothiocyanate, the standard reagent in chemical sequence analysis, under conditions typically encountered in manual or automated sequence analysis. Amino acid derivatives generated with this reagent were detectable by ion-evaporation mass spectrometry at the subfemtomole sensitivity level at a pace of one sample per minute. Furthermore, derivatives were identified by their mass, thus permitting the rapid and highly sensitive determination of the molecular nature of modified amino acids. Derivatives of amino acids with acidic, basic, polar, or hydrophobic side chains were reproducibly detectable at comparable sensitivities. The polar nature of the reagent required covalent immobilization of polypeptides prior to automated sequence analysis. This reagent, used in automated sequence analysis, has the potential for overcoming the limitations in sensitivity, speed, and the ability to characterize modified amino acid residues inherent in the chemical sequencing methods that are currently used. PMID:1304351

  13. Genetic Variants Identified from Epilepsy of Unknown Etiology in Chinese Children by Targeted Exome Sequencing.

    PubMed

    Wang, Yimin; Du, Xiaonan; Bin, Rao; Yu, Shanshan; Xia, Zhezhi; Zheng, Guo; Zhong, Jianmin; Zhang, Yunjian; Jiang, Yong-Hui; Wang, Yi

    2017-01-11

    Genetic factors play a major role in the etiology of epilepsy disorders. Recent genomics studies using next-generation sequencing (NGS) techniques have identified a large number of genetic variants, including copy number variants (CNVs) and single nucleotide variants (SNVs), in a small set of genes from individuals with epilepsy. These discoveries have contributed significantly to evaluating the etiology of epilepsy in the clinic and have laid the foundation for developing molecularly specific treatments. However, the molecular basis for a majority of epilepsy patients remains elusive, and furthermore, most of these studies have been conducted in Caucasian children. Here we conducted targeted exome sequencing of 63 trios of Chinese epilepsy families using a custom-designed NGS panel that covers 412 known and candidate genes for epilepsy. We identified pathogenic and likely pathogenic variants in 15 of 63 (23.8%) families in known epilepsy genes including SCN1A, CDKL5, STXBP1, CHD2, SCN3A, SCN9A, TSC2, MBD5, POLG and EFHC1. More importantly, we identified likely pathologic variants in several novel candidate genes such as GABRE, MYH1, and CLCN6. Our results provide evidence supporting the application of custom-designed NGS panels in the clinic and indicate a conserved genetic susceptibility for epilepsy between Chinese and Caucasian children.

  14. Genetic Variants Identified from Epilepsy of Unknown Etiology in Chinese Children by Targeted Exome Sequencing

    PubMed Central

    Wang, Yimin; Du, Xiaonan; Bin, Rao; Yu, Shanshan; Xia, Zhezhi; Zheng, Guo; Zhong, Jianmin; Zhang, Yunjian; Jiang, Yong-hui; Wang, Yi

    2017-01-01

    Genetic factors play a major role in the etiology of epilepsy disorders. Recent genomics studies using next-generation sequencing (NGS) techniques have identified a large number of genetic variants, including copy number variants (CNVs) and single nucleotide variants (SNVs), in a small set of genes from individuals with epilepsy. These discoveries have contributed significantly to evaluating the etiology of epilepsy in the clinic and have laid the foundation for developing molecularly specific treatments. However, the molecular basis for a majority of epilepsy patients remains elusive, and furthermore, most of these studies have been conducted in Caucasian children. Here we conducted targeted exome sequencing of 63 trios of Chinese epilepsy families using a custom-designed NGS panel that covers 412 known and candidate genes for epilepsy. We identified pathogenic and likely pathogenic variants in 15 of 63 (23.8%) families in known epilepsy genes including SCN1A, CDKL5, STXBP1, CHD2, SCN3A, SCN9A, TSC2, MBD5, POLG and EFHC1. More importantly, we identified likely pathologic variants in several novel candidate genes such as GABRE, MYH1, and CLCN6. Our results provide evidence supporting the application of custom-designed NGS panels in the clinic and indicate a conserved genetic susceptibility for epilepsy between Chinese and Caucasian children. PMID:28074849

  15. An SF1 affinity model to identify branch point sequences in human introns

    PubMed Central

    Pastuszak, Alexander W.; Joachimiak, Marcin P.; Blanchette, Marco; Rio, Donald C.; Brenner, Steven E.; Frankel, Alan D.

    2011-01-01

    Splicing factor 1 (SF1) binds to the branch point sequence (BPS) of mammalian introns and is believed to be important for the splicing of some, but not all, introns. To help identify BPSs, particularly those that depend on SF1, we generated a BPS profile model in which SF1 binding affinity data, validated by branch point mapping, were iteratively incorporated into computational models. We searched a data set of 117 499 human introns for best matches to the SF1 Affinity Model above a threshold, and counted the number of matches at each intronic position. After subtracting a background value, we found that 87.9% of remaining high-scoring matches identified were located in a region upstream of 3′-splice sites where BPSs are typically found. Since U2AF65 recognizes the polypyrimidine tract (PPT) and forms a cooperative RNA complex with SF1, we combined the SF1 model with a PPT model computed from high affinity binding sequences for U2AF65. The combined model, together with binding site location constraints, accurately identified introns bound by SF1 that are candidates for SF1-dependent splicing. PMID:21071404
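
    The scan-and-count step described in this record (score every intronic window against a model, keep matches above a threshold, and tally them by position relative to the 3′ splice site) can be sketched as follows. This is a minimal illustration, not the published SF1 Affinity Model: the match/mismatch scores, the use of the UACUAAC consensus as the only motif, and the function names are assumptions made for the example.

```python
from collections import Counter

# Illustrative scoring matrix built from a single consensus heptamer.
# ASSUMPTION: +1 per match, -1 per mismatch; the real model uses measured
# SF1 binding affinities, which are not reproduced here.
CONSENSUS = "UACUAAC"
MISMATCH = -1.0


def build_pwm(consensus, match=1.0, mismatch=MISMATCH):
    """Return one {base: score} column per motif position."""
    return [{b: (match if b == c else mismatch) for b in "ACGU"} for c in consensus]


PWM = build_pwm(CONSENSUS)


def scan(seq, pwm, threshold):
    """Score every window of an intron (5'->3' RNA string).

    Returns (distance_to_3prime_end, score) for windows scoring >= threshold,
    where the distance counts nucleotides between the window and the intron end.
    """
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Unknown characters (e.g. N) are scored as mismatches.
        score = sum(col.get(base, MISMATCH) for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((len(seq) - (i + width), score))
    return hits


def count_by_position(introns, pwm, threshold):
    """Tally high-scoring matches by their distance from the 3' splice site."""
    counts = Counter()
    for seq in introns:
        for distance, _score in scan(seq, pwm, threshold):
            counts[distance] += 1
    return counts
```

    A background correction, as mentioned in the abstract, would be applied to the resulting per-position counts before looking for enrichment upstream of the 3′ splice site; that step is omitted here.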

  16. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  17. Biodegradation of 5-chloro-2-picolinic acid by novel identified co-metabolizing degrader Achromobacter sp. f1.

    PubMed

    Wu, Zhi-Guo; Wang, Fang; Ning, Li-Qun; Stedtfeld, Robert D; Yang, Zong-Zheng; Cao, Jing-Guo; Sheng, Hong-Jie; Jiang, Xin

    2017-02-02

    Several bacteria have been isolated to degrade 4-chloronitrobenzene. Degradation of 4-chloronitrobenzene by Cupriavidus sp. D4 produces 5-chloro-2-picolinic acid as a dead-end by-product, a potential pollutant. To date, no bacterium that degrades 5-chloro-2-picolinic acid has been reported. Strain f1, isolated from a soil polluted by 4-chloronitrobenzene, was able to co-metabolize 5-chloro-2-picolinic acid in the presence of ethanol or other appropriate carbon sources. The strain was identified as Achromobacter sp. based on its physiological, biochemical characteristics, and 16S rRNA gene sequence analysis. The organism completely degraded 50, 100 and 200 mg L(-1) of 5-chloro-2-picolinic acid within 48, 60, and 72 h, respectively. During the degradation of 5-chloro-2-picolinic acid, Cl(-) was released. The initial metabolic product of 5-chloro-2-picolinic acid was identified as 6-hydroxy-5-chloro-2-picolinic acid by LC-MS and NMR. Using a mixed culture of Achromobacter sp. f1 and Cupriavidus sp. D4 for degradation of 4-chloronitrobenzene, 5-chloro-2-picolinic acid did not accumulate. The results imply that Achromobacter sp. f1 can be used for complete biodegradation of 4-chloronitrobenzene in remedial applications.

  18. Utility of next-generation RNA-sequencing in identifying chimeric transcription involving human endogenous retroviruses.

    PubMed

    Sokol, Martin; Jessen, Karen Margrethe; Pedersen, Finn Skou

    2016-01-01

    Several studies have shown that human endogenous retroviruses and endogenous retrovirus-like repeats (here collectively HERVs) impose direct regulation on human genes through enhancer and promoter motifs present in their long terminal repeats (LTRs). Although chimeric transcription in which novel gene isoforms containing retroviral and human sequence are transcribed from viral promoters are commonly associated with disease, regulation by HERVs is beneficial in other settings; for example, in human testis chimeric isoforms of TP63 induced by an ERV9 LTR protect the male germ line upon DNA damage by inducing apoptosis, whereas in the human globin locus the γ- and β-globin switch during normal hematopoiesis is mediated by complex interactions of an ERV9 LTR and surrounding human sequence. The advent of deep sequencing or next-generation sequencing (NGS) has revolutionized the way researchers solve important scientific questions and develop novel hypotheses in relation to human genome regulation. We recently applied next-generation paired-end RNA-sequencing (RNA-seq) together with chromatin immunoprecipitation with sequencing (ChIP-seq) to examine ERV9 chimeric transcription in human reference cell lines from Encyclopedia of DNA Elements (ENCODE). This led to the discovery of advanced regulation mechanisms by ERV9s and other HERVs across numerous human loci including transcription of large gene-unannotated genomic regions, as well as cooperative regulation by multiple HERVs and non-LTR repeats such as Alu elements. In this article, well-established examples of human gene regulation by HERVs are reviewed followed by a description of paired-end RNA-seq, and its application in identifying chimeric transcription genome-wide. Based on integrative analyses of RNA-seq and ChIP-seq data, we then present novel examples of regulation by ERV9s of tumor suppressor genes CADM2 and SEMA3A, as well as transcription of an unannotated region. Taken together, this article highlights

  19. Molecular characterization of a new soybean-infecting member of the genus Nepovirus identified by high-throughput sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete nucleotide sequence of a new soybean-infecting member of the Nepovirus genus (provisionally named Soybean latent spherical virus [SLSV]) was identified by high-throughput sequencing of RNAs from soybean leaf samples from North Dakota, USA. The sequences of RNAs 1 (8,190 nt) and 2 (5,788...

  20. Nucleic acid (cDNA) and amino acid sequences of the maize endosperm protein glutelin-2.

    PubMed Central

    Prat, S; Cortadas, J; Puigdomènech, P; Palau, J

    1985-01-01

    The cDNA coding for a glutelin-2 protein from maize endosperm has been cloned and the complete amino acid sequence of the protein derived for the first time. An immature maize endosperm cDNA bank was screened for the expression of a beta-lactamase:glutelin-2 (G2) fusion polypeptide by using antibodies against the purified 28 kd G2 protein. A clone corresponding to the 28 kd G2 protein was sequenced and the primary structure of this protein was derived. Five regions can be defined in the protein sequence: an 11 residue N-terminal part, a repeated region formed by eight units of the sequence Pro-Pro-Pro-Val-His-Leu, an alternating Pro-X stretch 21 residues long, a Cys rich domain and a C-terminal part rich in Gln. The protein sequence is preceded by 19 residues which have the characteristics of the signal peptide found in secreted proteins. Unlike zeins, the main maize storage proteins, 28 kd glutelin-2 has several homologous sequences in common with other cereal storage proteins. PMID:3839076
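
    A repeated region such as the eight tandem Pro-Pro-Pro-Val-His-Leu units described in this record can be located in a one-letter protein string with a short pattern search. The sketch below is illustrative only: the function name is hypothetical, and any sequence passed to it in an example is a toy string, not the actual glutelin-2 sequence.

```python
import re

# Pro-Pro-Pro-Val-His-Leu in one-letter amino acid code.
REPEAT_UNIT = "PPPVHL"


def longest_tandem_run(protein, unit=REPEAT_UNIT):
    """Find the longest uninterrupted run of `unit` in a protein sequence.

    Returns (start_index, number_of_units); (-1, 0) if the unit never occurs.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = (-1, 0)
    for match in pattern.finditer(protein):
        n_units = len(match.group(0)) // len(unit)
        if n_units > best[1]:
            best = (match.start(), n_units)
    return best
```

    For a toy sequence built as 11 arbitrary residues followed by eight copies of the unit, the function reports a run of eight units starting at index 11, which mirrors the domain layout given in the abstract.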

  1. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets.

    PubMed

    Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric; Alexandrov, Ludmil B; Calderaro, Julien; Rebouissou, Sandra; Couchy, Gabrielle; Meiller, Clément; Shinde, Jayendra; Soysouvanh, Frederic; Calatayud, Anna-Line; Pinyol, Roser; Pelletier, Laura; Balabaud, Charles; Laurent, Alexis; Blanc, Jean-Frederic; Mazzaferro, Vincenzo; Calvo, Fabien; Villanueva, Augusto; Nault, Jean-Charles; Bioulac-Sage, Paulette; Stratton, Michael R; Llovet, Josep M; Zucman-Rossi, Jessica

    2015-05-01

    Genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. Analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)-approved drugs. In conclusion, we identified risk factor-specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.

  2. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

    SciTech Connect

    Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric; Alexandrov, Ludmil B.; Calderaro, Julien; Rebouissou, Sandra; Couchy, Gabrielle; Meiller, Clément; Shinde, Jayendra; Soysouvanh, Frederic; Calatayud, Anna-Line; Pinyol, Roser; Pelletier, Laura; Balabaud, Charles; Laurent, Alexis; Blanc, Jean-Frederic; Mazzaferro, Vincenzo; Calvo, Fabien; Villanueva, Augusto; Nault, Jean-Charles; Bioulac-Sage, Paulette; Stratton, Michael R.; Llovet, Josep M.; Zucman-Rossi, Jessica

    2015-03-30

    Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.

  3. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

    DOE PAGES

    Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric; ...

    2015-03-30

    Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.

  4. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  5. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  6. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  7. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  8. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  9. An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor.

    PubMed

    Hedley, M L; Amrein, H; Maniatis, T

    1995-12-05

    We have identified an amino acid sequence in the Drosophila Transformer (Tra) protein that is capable of directing a heterologous protein to nuclear speckles, regions of the nucleus previously shown to contain high concentrations of spliceosomal small nuclear RNAs and splicing factors. This sequence contains a nucleoplasmin-like bipartite nuclear localization signal (NLS) and a repeating arginine/serine (RS) dipeptide sequence adjacent to a short stretch of basic amino acids. Sequence comparisons from a number of other splicing factors that colocalize to nuclear speckles reveal the presence of one or more copies of this motif. We propose a two-step subnuclear localization mechanism for splicing factors. The first step is transport across the nuclear envelope via the nucleoplasmin-like NLS, while the second step is association with components in the speckled domain via the RS dipeptide sequence.

  10. Fast and Sequence-Specific Palladium-Mediated Cross-Coupling Reaction Identified from Phage Display

    PubMed Central

    2015-01-01

    Fast and specific bioorthogonal reactions are highly desirable because they provide efficient tracking of biomolecules that are present in low abundance and/or involved in fast dynamic process in living systems. Toward this end, classic strategy involves the optimization of substrate structures and reaction conditions in test tubes, testing their compatibility with biological systems, devising synthetic biology schemes to introduce the modified substrates into living cells or organisms, and finally validating the superior kinetics for enhanced capacity in tracking biomolecules in vivo—a lengthy process often mired by unexpected results. Here, we report a streamlined approach in which the “microenvironment” of a bioorthogonal chemical reporter is exploited directly in biological systems via phage-assisted interrogation of reactivity (PAIR) to optimize not only reaction kinetics but also specificity. Using the PAIR strategy, we identified a short alkyne-containing peptide sequence showing fast kinetics (k2 = 13 000 ± 2000 M–1 s–1) in a palladium-mediated cross-coupling reaction. Site-directed mutagenesis studies suggested that the residues surrounding the alkyne moiety facilitate the assembly of a key palladium–alkyne intermediate along the reaction pathway. When this peptide sequence was inserted into the extracellular domain of epidermal growth factor receptor (EGFR), this reactive sequence directed the specific labeling of EGFR in live mammalian cells. PMID:25025771
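
    To give a feel for what the reported second-order rate constant (k2 = 13 000 M–1 s–1) means in practice, the sketch below converts it into a labeling half-life under pseudo-first-order conditions, that is, with one reagent in large excess at a fixed concentration. The excess-reagent concentration is an assumption chosen by the caller for illustration; the abstract does not state the concentrations used.

```python
import math

# Second-order rate constant reported in the abstract, in M^-1 s^-1.
K2 = 13_000.0


def pseudo_first_order_half_life(k2, reagent_conc_molar):
    """Half-life (s) of the limiting reactant when the other is in large excess.

    k_obs = k2 * [reagent];  t_1/2 = ln(2) / k_obs
    ASSUMPTION: the excess reagent concentration stays effectively constant.
    """
    if k2 <= 0 or reagent_conc_molar <= 0:
        raise ValueError("rate constant and concentration must be positive")
    k_obs = k2 * reagent_conc_molar
    return math.log(2) / k_obs
```

    With a hypothetical 10 µM excess reagent, k_obs is 0.13 s–1 and the half-life is about 5.3 s; at 1 µM it would be ten times longer.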

  11. Transcriptome sequencing of HER2-positive breast cancer stem cells identifies potential prognostic marker.

    PubMed

    Lei, Bo; Zhang, Xian-Yu; Zhou, Jia-Peng; Mu, Guan-Nan; Li, Yi-Wen; Zhang, You-Xue; Pang, Da

    2016-11-01

    In cancer stem cell theory, breast cancer stem cells (BCSCs) are postulated to be the root cause of recurrence and metastasis in breast cancer. Discovery of new biomarkers and development of BCSC-targeted therapy are practical issues that urgently need to be addressed in the clinic. However, few breast cancer stem cell targets are known. Given that there are few BCSCs, performing transcriptome sequencing on them thus far has not been possible. With the emergence of single-cell sequencing technology, we have now undertaken such a study. We prepared single-cell suspensions, which were sorted using flow cytometry from breast tumor tissue and adjacent normal breast tissue from two HER2-positive patients. We obtained BCSCs, breast cancer cells, mammary cells, and CD44(+) mammary cells. Transcriptome sequencing was then performed on these four cell types. Using bioinformatics, we identified 404 differentially expressed BCSC genes from the HER2-positive tumors and preliminarily explored transcriptome characteristics of BCSCs. Finally, by querying a public database, we found that CA12 was a novel prognostic biomarker in HER2-positive breast cancer, which also had prognostic value in all breast cancer types. In conclusion, our results suggest that CA12 may be associated with BCSCs, especially HER2-positive BCSCs, and is a potential novel therapeutic target and biomarker.

  12. Multilocus sequence typing identifies evidence for recombination and two distinct lineages of Corynebacterium diphtheriae.

    PubMed

    Bolt, Frances; Cassiday, Pamela; Tondella, Maria Lucia; Dezoysa, Aruni; Efstratiou, Androulla; Sing, Andreas; Zasada, Aleksandra; Bernard, Kathryn; Guiso, Nicole; Badell, Edgar; Rosso, Marie-Laure; Baldwin, Adam; Dowson, Christopher

    2010-11-01

    We describe the development of a multilocus sequence typing (MLST) scheme for Corynebacterium diphtheriae, the causative agent of the potentially fatal upper respiratory disease diphtheria. Global changes in diphtheria epidemiology are highlighted by the recent epidemic in the former Soviet Union (FSU) and also by the emergence of nontoxigenic strains causing atypical disease. Although numerous techniques have been developed to characterize C. diphtheriae, their use is hindered by limited portability and, in some instances, poor reproducibility. One hundred fifty isolates from 18 countries and encompassing a period of 50 years were analyzed by multilocus sequence typing (MLST). Strain discrimination was in accordance with previous ribotyping data, and clonal complexes associated with disease outbreaks were clearly identified by MLST. The data produced are portable, reproducible, and unambiguous. The MLST scheme described provides a valuable tool for monitoring and characterizing endemic and epidemic C. diphtheriae strains. Furthermore, multilocus sequence analysis of the nucleotide data reveals two distinct lineages within the population of C. diphtheriae examined, one of which is composed exclusively of biotype belfanti isolates and the other of multiple biotypes.
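
    The core bookkeeping behind an MLST scheme is that each isolate is reduced to a profile of allele numbers, one per locus, and every distinct profile receives a sequence type (ST). The sketch below shows that idea in its simplest form. The locus names and the rule of numbering STs by first appearance are assumptions for illustration; a real scheme assigns allele and ST numbers through a curated database, and the loci of the C. diphtheriae scheme are not listed in this abstract.

```python
# Hypothetical locus names, used only for this illustration.
LOCI = ("locusA", "locusB", "locusC")


def assign_sequence_types(profiles, loci=LOCI):
    """Map each isolate to a sequence type (ST).

    `profiles` maps isolate name -> {locus: allele_number}.
    Isolates with identical allele numbers at every locus share an ST;
    STs are numbered 1, 2, ... in order of first appearance (an assumption).
    """
    st_of_profile = {}
    st_of_isolate = {}
    for isolate, alleles in profiles.items():
        key = tuple(alleles[locus] for locus in loci)
        if key not in st_of_profile:
            st_of_profile[key] = len(st_of_profile) + 1
        st_of_isolate[isolate] = st_of_profile[key]
    return st_of_isolate
```

    Because the output depends only on allele numbers, results from different laboratories can be compared directly, which is the portability property the abstract emphasizes.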

  13. Single-cell RNA sequencing identifies distinct mouse medial ganglionic eminence cell types

    PubMed Central

    Chen, Ying-Jiun J.; Friedman, Brad A.; Ha, Connie; Durinck, Steffen; Liu, Jinfeng; Rubenstein, John L.; Seshagiri, Somasekar; Modrusan, Zora

    2017-01-01

    Many subtypes of cortical interneurons (CINs) are found in adult mouse cortices, but the mechanism generating their diversity remains elusive. We performed single-cell RNA sequencing on the mouse embryonic medial ganglionic eminence (MGE), the major birthplace for CINs, and on MGE-like cells differentiated from embryonic stem cells. Two distinct cell types were identified as proliferating neural progenitors and immature neurons, both of which comprised sub-populations. Although lineage development of MGE progenitors was reconstructed and immature neurons were characterized as GABAergic, cells that might correspond to precursors of different CINs were not identified. A few non-neuronal cell types were detected, including microglia. In vitro MGE-like cells resembled bona fide MGE cells but expressed lower levels of Foxg1 and Epha4. Together, our data provide detailed understanding of the embryonic MGE developmental program and suggest how CINs are specified. PMID:28361918

  14. Whole-exome sequencing identified mutational profiles of high-grade colon adenomas

    PubMed Central

    Kim, Tae-Min; Rhee, Je-Keun; Park, Hyeon-Chun; Sung, Min Kim; Kim, Sung Soo; Hyeok, Chang An; Lee Hyung, Sug; Chung, Yeun-Jun

    2017-01-01

    Although gene-to-gene analyses identified genetic alterations such as APC, KRAS and TP53 mutations in colon adenomas, it is largely unknown whether there are any others in them. Mutational profiling of high-grade colon adenoma (HGCA) that just precedes colon carcinoma might identify not only novel adenoma-specific genes but also critical genes for its progression to carcinoma. For this, we performed whole-exome sequencing (WES) of 12 HGCAs and identified 11 non-hypermutated and one hypermutated (POLE-mutated) cases. We identified 22 genes including APC, KRAS, TP53, GNAS, NRAS, SMAD4, ARID2, and PIK3CA with non-silent mutations in the cancer Census Genes. Bi-allelic and mono-allelic APC alterations were found in nine and one HGCAs, respectively, while the other two harbored wild-type APC. Five HGCAs harbored either mono-allelic (four HGCAs) or bi-allelic (one HGCA) SMAD4 mutation or 18q loss that had been known as early carcinoma-specific changes. We identified MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4 mutations for the first time in colon adenomas. Our WES data are largely consistent with the earlier ‘adenoma-carcinoma model’ (APC, KRAS, NRAS and GNAS mutations), but there are newly identified SMAD4, MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4 mutations in this study. Our findings provide a resource for understanding colon premalignant lesions and for identifying genomic clues for differential diagnosis and therapy options for colon adenomas and carcinomas. PMID:28179590

  15. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  16. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate its influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes which may facilitate the development of diagnostic and immunotherapeutic strategies based on the recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum was performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations that did not seem to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  17. High-throughput single-cell sequencing identifies photoheterotrophs and chemoautotrophs in freshwater bacterioplankton

    PubMed Central

    Martinez-Garcia, Manuel; Swan, Brandon K; Poulton, Nicole J; Gomez, Monica Lluesma; Masland, Dashiell; Sieracki, Michael E; Stepanauskas, Ramunas

    2012-01-01

    Recent discoveries suggest that photoheterotrophs (rhodopsin-containing bacteria (RBs) and aerobic anoxygenic phototrophs (AAPs)) and chemoautotrophs may be significant for marine and freshwater ecosystem productivity. However, their abundance and taxonomic identities remain largely unknown. We used a combination of single-cell and metagenomic DNA sequencing to study the predominant photoheterotrophs and chemoautotrophs inhabiting the euphotic zone of temperate, physicochemically diverse freshwater lakes. Multi-locus sequencing of 712 single amplified genomes, generated by fluorescence-activated cell sorting and whole genome multiple displacement amplification, showed that most of the cosmopolitan freshwater clusters contain photoheterotrophs. These comprised at least 10–23% of bacterioplankton, and RBs were the dominant fraction. Our data demonstrate that Actinobacteria, including clusters acI, Luna and acSTL, are the predominant freshwater RBs. We significantly broaden the known taxonomic range of freshwater RBs, to include Alpha-, Beta-, Gamma- and Deltaproteobacteria, Verrucomicrobia and Sphingobacteria. By sequencing single cells, we found evidence for inter-phyla horizontal gene transfer and recombination of rhodopsin genes and identified specific taxonomic groups involved in these evolutionary processes. Our data suggest that members of the ubiquitous betaproteobacteria Polynucleobacter spp. are the dominant AAPs in temperate freshwater lakes. Furthermore, the RuBisCO (ribulose 1,5-bisphosphate carboxylase/oxygenase) gene was found in several single cells of Betaproteobacteria, Bacteroidetes and Gammaproteobacteria, suggesting that chemoautotrophs may be more prevalent among aerobic bacterioplankton than previously thought. This study demonstrates the power of single-cell DNA sequencing addressing previously unresolved questions about the metabolic potential and evolutionary histories of uncultured microorganisms, which dominate most natural environments.

  18. High-throughput single-cell sequencing identifies photoheterotrophs and chemoautotrophs in freshwater bacterioplankton.

    PubMed

    Martinez-Garcia, Manuel; Swan, Brandon K; Poulton, Nicole J; Gomez, Monica Lluesma; Masland, Dashiell; Sieracki, Michael E; Stepanauskas, Ramunas

    2012-01-01

    Recent discoveries suggest that photoheterotrophs (rhodopsin-containing bacteria (RBs) and aerobic anoxygenic phototrophs (AAPs)) and chemoautotrophs may be significant for marine and freshwater ecosystem productivity. However, their abundance and taxonomic identities remain largely unknown. We used a combination of single-cell and metagenomic DNA sequencing to study the predominant photoheterotrophs and chemoautotrophs inhabiting the euphotic zone of temperate, physicochemically diverse freshwater lakes. Multi-locus sequencing of 712 single amplified genomes, generated by fluorescence-activated cell sorting and whole genome multiple displacement amplification, showed that most of the cosmopolitan freshwater clusters contain photoheterotrophs. These comprised at least 10-23% of bacterioplankton, and RBs were the dominant fraction. Our data demonstrate that Actinobacteria, including clusters acI, Luna and acSTL, are the predominant freshwater RBs. We significantly broaden the known taxonomic range of freshwater RBs, to include Alpha-, Beta-, Gamma- and Deltaproteobacteria, Verrucomicrobia and Sphingobacteria. By sequencing single cells, we found evidence for inter-phyla horizontal gene transfer and recombination of rhodopsin genes and identified specific taxonomic groups involved in these evolutionary processes. Our data suggest that members of the ubiquitous betaproteobacteria Polynucleobacter spp. are the dominant AAPs in temperate freshwater lakes. Furthermore, the RuBisCO (ribulose 1,5-bisphosphate carboxylase/oxygenase) gene was found in several single cells of Betaproteobacteria, Bacteroidetes and Gammaproteobacteria, suggesting that chemoautotrophs may be more prevalent among aerobic bacterioplankton than previously thought. This study demonstrates the power of single-cell DNA sequencing addressing previously unresolved questions about the metabolic potential and evolutionary histories of uncultured microorganisms, which dominate most natural environments.

  19. Challenges in identifying cancer genes by analysis of exome sequencing data

    PubMed Central

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F.; Bandyopadhyay, Sourav; Mischel, Paul S.; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13–60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679

  20. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies

    PubMed Central

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-01-01

    The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets for selection, such as PaPRR3 and PaGI. PMID:27172202
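
    The abstract compares SNPs by FST and by the allele-frequency difference between the two domains. As a rough illustration only, the sketch below computes both quantities for a single biallelic SNP from two pool allele frequencies. It assumes the simple heterozygosity-based definition FST = (HT - HS) / HT with equal weighting of the two pools; the record does not say which estimator the authors used, and real pooled-sequencing analyses normally correct for pool size and read depth. The function name and parameters are hypothetical.

```python
from typing import Tuple


def fst_two_pools(p1: float, p2: float) -> float:
    """Heterozygosity-based FST for one biallelic SNP in two equally weighted pools.

    p1, p2: frequency of the same allele in pool 1 and pool 2 (each in [0, 1]).
    Assumption: FST = (HT - HS) / HT, with no correction for pool size or depth.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean within-pool expected heterozygosity.
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the combined population.
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0.0:
        # The SNP is monomorphic across both pools, so there is no differentiation.
        return 0.0
    return (ht - hs) / ht


def snp_summary(p1: float, p2: float) -> Tuple[float, float]:
    """Return (FST, absolute allele-frequency difference) for one SNP."""
    return fst_two_pools(p1, p2), abs(p1 - p2)
```

    Under these assumptions, a SNP fixed for different alleles in the two pools gives FST = 1, and a SNP with identical frequencies gives FST = 0.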

  1. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    PubMed Central

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  2. Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma.

    PubMed

    Gartner, Jared J; Parker, Stephen C J; Prickett, Todd D; Dutton-Regester, Ken; Stitzel, Michael L; Lin, Jimmy C; Davis, Sean; Simhadri, Vijaya L; Jha, Sujata; Katagiri, Nobuko; Gotea, Valer; Teer, Jamie K; Wei, Xiaomu; Morken, Mario A; Bhanot, Umesh K; Chen, Guo; Elnitski, Laura L; Davies, Michael A; Gershenwald, Jeffrey E; Carter, Hannah; Karchin, Rachel; Robinson, William; Robinson, Steven; Rosenberg, Steven A; Collins, Francis S; Parmigiani, Giovanni; Komar, Anton A; Kimchi-Sarfaty, Chava; Hayward, Nicholas K; Margulies, Elliott H; Samuels, Yardena

    2013-08-13

    Synonymous mutations, which do not alter the protein sequence, have been shown to affect protein function [Sauna ZE, Kimchi-Sarfaty C (2011) Nat Rev Genet 12(10):683-691]. However, synonymous mutations are rarely investigated in the cancer genomics field. We used whole-genome and -exome sequencing to identify somatic mutations in 29 melanoma samples. Validation of one synonymous somatic mutation in BCL2L12 in 285 samples identified 12 cases that harbored the recurrent F17F mutation. This mutation led to increased BCL2L12 mRNA and protein levels because of differential targeting of WT and mutant BCL2L12 by hsa-miR-671-5p. Protein made from mutant BCL2L12 transcript bound p53, inhibited UV-induced apoptosis more efficiently than WT BCL2L12, and reduced endogenous p53 target gene transcription. This report shows selection of a recurrent somatic synonymous mutation in cancer. Our data indicate that silent alterations have a role to play in human cancer, emphasizing the importance of their investigation in future cancer genome studies.

  3. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia.

    PubMed

    Puente, Xose S; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M C; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M; Puente, Diana A; Freije, José M P; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C; de Sanjosé, Silvia; Piris, Miguel A; de Alava, Enrique; San Miguel, Jesús; Royo, Romina; Gelpí, Josep L; Torrents, David; Orozco, Modesto; Pisano, David G; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A; Futreal, P Andrew; Stratton, Michael R; Campbell, Peter J; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2011-06-05

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.

  4. Next-generation sequencing identifies major DNA methylation changes during progression of Ph+ chronic myeloid leukemia

    PubMed Central

    Heller, G; Topakian, T; Altenberger, C; Cerny-Reiterer, S; Herndlhofer, S; Ziegler, B; Datlinger, P; Byrgazov, K; Bock, C; Mannhalter, C; Hörmann, G; Sperr, W R; Lion, T; Zielinski, C C; Valent, P; Zöchbauer-Müller, S

    2016-01-01

    Little is known about the impact of DNA methylation on the evolution/progression of Ph+ chronic myeloid leukemia (CML). We investigated the methylome of CML patients in chronic phase (CP-CML), accelerated phase (AP-CML) and blast crisis (BC-CML) as well as in controls by reduced representation bisulfite sequencing. Although only ~600 differentially methylated CpG sites were identified in samples obtained from CP-CML patients compared with controls, ~6500 differentially methylated CpG sites were found in samples from BC-CML patients. In the majority of affected CpG sites, methylation was increased. In CP-CML patients who progressed to AP-CML/BC-CML, we identified up to 897 genes that were methylated at the time of progression but not at the time of diagnosis. Using RNA-sequencing, we observed downregulated expression of many of these genes in BC-CML compared with CP-CML samples. Several of them are well-known tumor-suppressor genes or regulators of cell proliferation, and gene re-expression was observed by the use of epigenetic active drugs. Together, our results demonstrate that CpG site methylation clearly increases during CML progression and that it may provide a useful basis for revealing new targets of therapy in advanced CML. PMID:27211271

  5. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  6. Computer selection of oligonucleotide probes from amino acid sequences for use in gene library screening.

    PubMed

    Yang, J H; Ye, J H; Wallace, D C

    1984-01-11

    We present a computer program, FINPROBE, which utilizes known amino acid sequence data to deduce minimum redundancy oligonucleotide probes for use in screening cDNA or genomic libraries or in primer extension. The user enters the amino acid sequence of interest, the desired probe length, the number of probes sought, and the constraints on oligonucleotide synthesis. The computer generates a table of possible probes listed in increasing order of redundancy and provides the location of each probe in the protein and mRNA coding sequence. Activation of a next function provides the amino acid and mRNA sequences of each probe of interest as well as the complementary sequence and the minimum dissociation temperature of the probe. A final routine prints out the amino acid sequence of the protein in parallel with the mRNA sequence listing all possible codons for each amino acid.
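
    The idea of ranking candidate probes by redundancy can be illustrated with a short sketch. This is not FINPROBE itself, whose internals the abstract does not describe. It assumes that the redundancy of a probe is the product of the number of synonymous codons (standard genetic code) for each residue it covers, and that probe lengths are whole codons. The function name and parameters are hypothetical.

```python
from typing import List, Tuple

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def rank_probe_windows(protein: str, probe_nt: int, top: int) -> List[Tuple[int, int, str]]:
    """Rank peptide windows by the redundancy of the probes that would encode them.

    protein:  amino acid sequence in one-letter code.
    probe_nt: desired probe length in nucleotides (must be a multiple of 3 here).
    top:      number of candidate windows to return.
    Returns (redundancy, 1-based start residue, peptide) tuples, lowest redundancy first.
    """
    if probe_nt <= 0 or probe_nt % 3 != 0:
        raise ValueError("this sketch only handles probe lengths that are whole codons")
    k = probe_nt // 3
    windows = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        redundancy = 1
        for aa in peptide:
            # Each residue multiplies the number of possible coding sequences.
            redundancy *= CODON_COUNTS[aa]
        windows.append((redundancy, start + 1, peptide))
    windows.sort()
    return windows[:top]
```

    For example, rank_probe_windows("MWKLS", 6, 2) ranks the window "MW" first (one possible coding sequence) and "WK" second (two), because Met and Trp each have a single codon. Windows rich in Leu, Ser or Arg fall to the bottom of the table.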

  7. Single-cell RNA sequencing identifies diverse roles of epithelial cells in idiopathic pulmonary fibrosis

    PubMed Central

    Mizuno, Takako; Sridharan, Anusha; Du, Yina; Guo, Minzhe; Wikenheiser-Brokamp, Kathryn A.; Perl, Anne-Karina T.; Funari, Vincent A.; Gokey, Jason J.; Stripp, Barry R.; Whitsett, Jeffrey A.

    2016-01-01

    Idiopathic pulmonary fibrosis (IPF) is a lethal interstitial lung disease characterized by airway remodeling, inflammation, alveolar destruction, and fibrosis. We utilized single-cell RNA sequencing (scRNA-seq) to identify epithelial cell types and associated biological processes involved in the pathogenesis of IPF. Transcriptomic analysis of normal human lung epithelial cells defined gene expression patterns associated with highly differentiated alveolar type 2 (AT2) cells, indicated by enrichment of RNAs critical for surfactant homeostasis. In contrast, scRNA-seq of IPF cells identified 3 distinct subsets of epithelial cell types with characteristics of conducting airway basal and goblet cells and an additional atypical transitional cell that contributes to pathological processes in IPF. Individual IPF cells frequently coexpressed alveolar type 1 (AT1), AT2, and conducting airway selective markers, demonstrating “indeterminate” states of differentiation not seen in normal lung development. Pathway analysis predicted aberrant activation of canonical signaling via TGF-β, HIPPO/YAP, P53, WNT, and AKT/PI3K. Immunofluorescence confocal microscopy identified the disruption of alveolar structure and loss of the normal proximal-peripheral differentiation of pulmonary epithelial cells. scRNA-seq analyses identified loss of normal epithelial cell identities and unique contributions of epithelial cells to the pathogenesis of IPF. The present study provides a rich data source to further explore lung health and disease. PMID:27942595

  8. Triangulation of the human, chimpanzee, and Neanderthal genome sequences identifies potentially compensated mutations.

    PubMed

    Zhang, Guojie; Pei, Zhang; Krawczak, Michael; Ball, Edward V; Mort, Matthew; Kehrer-Sawatzki, Hildegard; Cooper, David N

    2010-12-01

    Triangulation of the human, chimpanzee, and Neanderthal genome sequences with respect to 44,348 disease-causing or disease-associated missense mutations and 1,712 putative regulatory mutations listed in the Human Gene Mutation Database was employed to identify genetic variants that are apparently pathogenic in humans but which may represent a "compensated" wild-type state in at least one of the other two species. Of 122 such "potentially compensated mutations" (PCMs) identified, 88 were deemed "ancestral" on the basis that the reported wild-type Neanderthal nucleotide was identical to that of the chimpanzee. Another 33 PCMs were deemed to be "derived" in that the Neanderthal wild-type nucleotide matched the human but not the chimpanzee wild-type. For the remaining PCM, all three wild-type states were found to differ. Whereas a derived PCM would require compensation only in the chimpanzee, ancestral PCMs are useful as a means to identify sites of possible adaptive differences between modern humans on the one hand, and Neanderthals and chimpanzees on the other. Ancestral PCMs considered to be disease-causing in humans were identified in two Neanderthal genes (DUOX2, MAMLD1). Because the underlying mutations are known to give rise to recessive conditions in humans, it is possible that they may also have been of pathological significance in Neanderthals. Hum Mutat 31:1-8, 2010. © 2010 Wiley-Liss, Inc.
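
    The three-way classification described in the abstract can be written as a small decision rule. The sketch below is one reading of that description, not the authors' pipeline. It assumes a site counts as a PCM when the human disease allele equals the wild-type nucleotide of the chimpanzee or the Neanderthal, and it then applies the "ancestral", "derived" and "all three differ" criteria as stated. The function name, argument names and return labels are hypothetical.

```python
from typing import Optional


def classify_pcm(human_wt: str, human_mutant: str,
                 chimp_wt: str, neanderthal_wt: str) -> Optional[str]:
    """Classify one site as a potentially compensated mutation (PCM).

    Returns None when the site is not a PCM under this sketch's assumption,
    otherwise one of "ancestral", "derived" or "all_differ".
    """
    # Assumption: a PCM is a human disease allele that appears as the
    # wild-type state in at least one of the other two species.
    if human_mutant == human_wt:
        return None
    if human_mutant not in (chimp_wt, neanderthal_wt):
        return None
    if neanderthal_wt == chimp_wt:
        # Neanderthal agrees with chimpanzee: the human wild type is the odd one out.
        return "ancestral"
    if neanderthal_wt == human_wt:
        # Neanderthal agrees with human but not chimpanzee.
        return "derived"
    # All three wild-type states differ.
    return "all_differ"
```

    For instance, a site where the human wild type is A, the disease allele is G, and both chimpanzee and Neanderthal carry G would be labelled "ancestral" by this rule.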

  9. Integrative analyses of transcriptome sequencing identify novel functional lncRNAs in esophageal squamous cell carcinoma

    PubMed Central

    Li, C-Q; Huang, G-W; Wu, Z-Y; Xu, Y-J; Li, X-C; Xue, Y-J; Zhu, Y; Zhao, J-M; Li, M; Zhang, J; Wu, J-Y; Lei, F; Wang, Q-Y; Li, S; Zheng, C-P; Ai, B; Tang, Z-D; Feng, C-C; Liao, L-D; Wang, S-H; Shen, J-H; Liu, Y-J; Bai, X-F; He, J-Z; Cao, H-H; Wu, B-L; Wang, M-R; Lin, D-C; Koeffler, H P; Wang, L-D; Li, X; Li, E-M; Xu, L-Y

    2017-01-01

    Long non-coding RNAs (lncRNAs) have a critical role in cancer initiation and progression, and thus may mediate oncogenic or tumor suppressing effects, as well as be a new class of cancer therapeutic targets. We performed high-throughput sequencing of RNA (RNA-seq) to investigate the expression level of lncRNAs and protein-coding genes in 30 esophageal samples, comprised of 15 esophageal squamous cell carcinoma (ESCC) samples and their 15 paired non-tumor tissues. We further developed an integrative bioinformatics method, denoted URW-LPE, to identify key functional lncRNAs that regulate expression of downstream protein-coding genes in ESCC. A number of known onco-lncRNA and many putative novel ones were effectively identified by URW-LPE. Importantly, we identified lncRNA625 as a novel regulator of ESCC cell proliferation, invasion and migration. ESCC patients with high lncRNA625 expression had significantly shorter survival time than those with low expression. LncRNA625 also showed specific prognostic value for patients with metastatic ESCC. Finally, we identified E1A-binding protein p300 (EP300) as a downstream executor of lncRNA625-induced transcriptional responses. These findings establish a catalog of novel cancer-associated functional lncRNAs, which will promote our understanding of lncRNA-mediated regulation in this malignancy. PMID:28194033

  10. Novel pathogenic variants and genes for myopathies identified by whole exome sequencing

    PubMed Central

    Hunter, Jesse M; Ahearn, Mary Ellen; Balak, Christopher D; Liang, Winnie S; Kurdoglu, Ahmet; Corneveaux, Jason J; Russell, Megan; Huentelman, Matthew J; Craig, David W; Carpten, John; Coons, Stephen W; DeMello, Daphne E; Hall, Judith G; Bernes, Saunder M; Baumbach-Reardon, Lisa

    2015-01-01

    Neuromuscular diseases (NMD) account for a significant proportion of infant and childhood mortality and devastating chronic disease. Determining the specific diagnosis of NMD is challenging due to thousands of unique or rare genetic variants that result in overlapping phenotypes. We present four unique childhood myopathy cases characterized by relatively mild muscle weakness, slowly progressing course, mildly elevated creatine phosphokinase (CPK), and contractures. We also present two additional cases characterized by severe prenatal/neonatal myopathy. Prior extensive genetic testing and histology of these cases did not reveal the genetic etiology of disease. Here, we applied whole exome sequencing (WES) and bioinformatics to identify likely causal pathogenic variants in each pedigree. In two cases, we identified novel pathogenic variants in COL6A3. In a third case, we identified novel likely pathogenic variants in COL6A6 and COL6A3. We identified a novel splice variant in EMD in a fourth case. Finally, we classify two cases as calcium channelopathies with identification of novel pathogenic variants in RYR1 and CACNA1S. These are the first cases of myopathies reported to be caused by variants in COL6A6 and CACNA1S. Our results demonstrate the utility and genetic diagnostic value of WES in the broad class of NMD phenotypes. PMID:26247046

  11. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  12. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  13. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  14. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  15. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  16. Linkage study and exome sequencing identify a BDP1 mutation associated with hereditary hearing loss.

    PubMed

    Girotto, Giorgia; Abdulhadi, Khalid; Buniello, Annalisa; Vozzi, Diego; Licastro, Danilo; d'Eustacchio, Angela; Vuckovic, Dragana; Alkowari, Moza Khalifa; Steel, Karen P; Badii, Ramin; Gasparini, Paolo

    2013-01-01

    Nonsyndromic Hereditary Hearing Loss is a common disorder accounting for at least 60% of prelingual deafness. GJB2 gene mutations, GJB6 deletion, and the A1555G mitochondrial mutation play a major role worldwide in causing deafness, but there is a high degree of genetic heterogeneity and many genes involved in deafness have not yet been identified. Therefore, there remains a need to search for new causative mutations. In this study, a combined strategy using both linkage analysis and sequencing identified a new mutation causing hearing loss. Linkage analysis identified a region of 40 Mb on chromosome 5q13 (LOD score 3.8) for which exome sequencing data revealed a mutation (c.7873 T>G leading to p.*2625Gluext*11) in the BDP1 gene (B double prime 1, subunit of RNA polymerase III transcription initiation factor IIIB) in patients from a consanguineous Qatari family of second degree, showing bilateral, post-lingual, sensorineural moderate to severe hearing impairment. The mutation disrupts the termination codon of the transcript resulting in an elongation of 11 residues of the BDP1 protein. This elongation does not contain any known motif and is not conserved across species. Immunohistochemistry studies carried out in the mouse inner ear showed Bdp1 expression within the endothelial cells in the stria vascularis, as well as in mesenchyme-derived cells surrounding the cochlear duct. The identification of the BDP1 mutation increases our knowledge of the molecular bases of Nonsyndromic Hereditary Hearing Loss and provides new opportunities for the diagnosis and treatment of this disease in the Qatari population.

  17. Next-generation SELEX identifies sequence and structural determinants of splicing factor binding in human pre-mRNA sequence

    PubMed Central

    Reid, Daniel C.; Chang, Brian L.; Gunderson, Samuel I.; Alpert, Lauren; Thompson, William A.; Fairbrother, William G.

    2009-01-01

    Many splicing factors interact with both mRNA and pre-mRNA. The identification of these interactions has been greatly improved by the development of in vivo cross-linking immunoprecipitation. However, the output carries a strong sampling bias in favor of RNPs that form on more abundant RNA species like mRNA. We have developed a novel in vitro approach for surveying binding on pre-mRNA, without cross-linking or sampling bias. Briefly, this approach entails specifically designed oligonucleotide pools that tile through a pre-mRNA sequence. The pool is then partitioned into bound and unbound fractions, which are quantified by a two-color microarray. We applied this approach to locating splicing factor binding sites in and around ∼4000 exons. We also quantified the effect of secondary structure on binding. The method is validated by the finding that U1snRNP binds at the 5′ splice site (5′ss) with a specificity that is nearly identical to the splice donor motif. In agreement with prior reports, we also show that U1snRNP appears to have some affinity for intronic G triplets that are proximal to the 5′ss. Both U1snRNP and the polypyrimidine tract binding protein (PTB) avoid exonic binding, and the PTB binding map shows increased enrichment at the polypyrimidine tract. For PTB, we confirm polypyrimidine specificity and are also able to identify structural determinants of PTB binding. We detect multiple binding motifs enriched in the PTB bound fraction of oligonucleotides. These motif combinations augment binding in vitro and are also enriched in the vicinity of exons that have been determined to be in vivo targets of PTB. PMID:19861426
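
    The notion of an oligonucleotide pool that "tiles through" a pre-mRNA sequence can be sketched as a sliding window. The abstract does not give the oligonucleotide length or the spacing between tiles, so both are left as parameters here; the function name and the values in the usage note are purely illustrative assumptions.

```python
from typing import List, Tuple


def tile_sequence(seq: str, oligo_len: int, step: int) -> List[Tuple[int, str]]:
    """Return (0-based start, oligo) pairs covering seq with fixed-length windows.

    seq:       the pre-mRNA sequence to tile.
    oligo_len: length of each oligonucleotide (hypothetical parameter).
    step:      offset between consecutive tiles; step < oligo_len gives overlap.
    Only full-length windows are returned, so a short tail may stay uncovered.
    """
    if oligo_len <= 0 or step <= 0:
        raise ValueError("oligo_len and step must be positive")
    return [
        (start, seq[start:start + oligo_len])
        for start in range(0, len(seq) - oligo_len + 1, step)
    ]
```

    Calling tile_sequence("ACGUACGUAC", 4, 3) yields windows starting at positions 0, 3 and 6, each overlapping its neighbour by one nucleotide.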

  18. Sequencing of SCN5A identifies rare and common variants associated with cardiac conduction

    PubMed Central

    Magnani, Jared W.; Brody, Jennifer A.; Prins, Bram P.; Arking, Dan E.; Lin, Honghuang; Yin, Xiaoyan; Liu, Ching-Ti; Morrison, Alanna C.; Zhang, Feng; Spector, Tim D.; Alonso, Alvaro; Bis, Joshua C.; Heckbert, Susan R.; Lumley, Thomas; Sitlani, Colleen M.; Cupples, L. Adrienne; Lubitz, Steven A.; Soliman, Elsayed Z.; Pulit, Sara L.; Newton-Cheh, Christopher; O'Donnell, Christopher J.; Ellinor, Patrick T.; Benjamin, Emelia J.; Muzny, Donna M.; Gibbs, Richard A.; Santibanez, Jireh; Taylor, Herman A.; Rotter, Jerome I.; Lange, Leslie A.; Psaty, Bruce M.; Jackson, Rebecca; Rich, Stephen S.; Boerwinkle, Eric; Jamshidi, Yalda; Sotoodehnia, Nona

    2014-01-01

    Background: The cardiac sodium channel SCN5A regulates atrioventricular and ventricular conduction. Genetic variants in this gene are associated with PR and QRS intervals. We sought to further characterize the contribution of rare and common coding variation in SCN5A to cardiac conduction. Methods and Results: In the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study (CHARGE), we performed targeted exonic sequencing of SCN5A (n=3699, European-ancestry individuals) and identified 4 common (minor allele frequency >1%) and 157 rare variants. Common and rare SCN5A coding variants were examined for association with PR and QRS intervals through meta-analysis of European ancestry participants from CHARGE, NHLBI’s Exome Sequencing Project (ESP, n=607) and the UK10K (n=1275) and by examining ESP African-ancestry participants (N=972). Rare coding SCN5A variants in aggregate were associated with PR interval in European and African-ancestry participants (P=1.3×10⁻³). Three common variants were associated with PR and/or QRS interval duration among European-ancestry participants and one among African-ancestry participants. These included two well-known missense variants: rs1805124 (H558R) was associated with PR and QRS shortening in European-ancestry participants (P=6.25×10⁻⁴ and P=5.2×10⁻³ respectively) and rs7626962 (S1102Y) was associated with PR shortening in those of African ancestry (P=2.82×10⁻³). Among European-ancestry participants, two novel synonymous variants, rs1805126 and rs6599230, were associated with cardiac conduction. Our top signal, rs1805126, was associated with PR and QRS lengthening (P=3.35×10⁻⁷ and P=2.69×10⁻⁴ respectively), and rs6599230 was associated with PR shortening (P=2.67×10⁻⁵). Conclusions: By sequencing SCN5A, we identified novel common and rare coding variants associated with cardiac conduction. PMID:24951663

  19. Method for the detection of specific nucleic acid sequences by polymerase nucleotide incorporation

    DOEpatents

    Castro, Alonso

    2004-06-01

    A method for rapid and efficient detection of a target DNA or RNA sequence is provided. A primer having a 3'-hydroxyl group at one end and having a sequence of nucleotides sufficiently homologous with an identifying sequence of nucleotides in the target DNA is selected. The primer is hybridized to the identifying sequence of nucleotides on the DNA or RNA sequence and a reporter molecule is synthesized on the target sequence by progressively binding complementary nucleotides to the primer, where the complementary nucleotides include nucleotides labeled with a fluorophore. Fluorescence emitted by fluorophores on single reporter molecules is detected to identify the target DNA or RNA sequence.

  20. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    PubMed Central

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; 
Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

    2016-01-01

    SUMMARY The extent to which low-frequency (minor allele frequency [MAF] between 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures and has been previously associated with common genetic variants [1–8], and rare, population-specific, coding variants [9]. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000Genomes reference panel (n = 26,534), and de-novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD [8] (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], P_meta = 2×10⁻¹⁴), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2×10⁻¹¹; n_cases = 98,742 and n_controls = 409,511). Using an En1Cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turn-over. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, P_meta = 1×10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of

  1. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  2. WHITE-DWARF-MAIN-SEQUENCE BINARIES IDENTIFIED FROM THE LAMOST PILOT SURVEY

    SciTech Connect

    Ren Juanjuan; Luo Ali; Li Yinbi; Wei Peng; Zhao Jingkun; Zhao Yongheng; Song Yihan; Zhao Gang

    2013-10-01

    We present a set of white-dwarf-main-sequence (WDMS) binaries identified spectroscopically from the Large sky Area Multi-Object fiber Spectroscopic Telescope (LAMOST, also called the Guo Shou Jing Telescope) pilot survey. We develop color selection criteria based on what is so far the largest and most complete Sloan Digital Sky Survey (SDSS) DR7 WDMS binary catalog and identify 28 WDMS binaries within the LAMOST pilot survey. The primaries in our binary sample are mostly DA white dwarfs except for one DB white dwarf. We derive the stellar atmospheric parameters, masses, and radii for the two components of 10 of our binaries. We also provide cooling ages for the white dwarf primaries as well as the spectral types for the companion stars of these 10 WDMS binaries. These binaries tend to contain hot white dwarfs and early-type companions. Through cross-identification, we note that nine binaries in our sample have been published in the SDSS DR7 WDMS binary catalog. Nineteen spectroscopic WDMS binaries identified by the LAMOST pilot survey are new. Using the 3σ radial velocity variation as a criterion, we find two post-common-envelope binary candidates from our WDMS binary sample.

  3. Targeted next-generation sequencing identified novel mutations in triple-negative myeloproliferative neoplasms.

    PubMed

    Chang, Yu-Cheng; Lin, Huan-Chau; Chiang, Yi-Hao; Chen, Caleb Gon-Shen; Huang, Ling; Wang, Wei-Ting; Cheng, Chun-Chia; Lin, Johnson; Chang, Yi-Fang; Chang, Ming-Chih; Hsieh, Ruey-Kuen; Chen, Shu-Jen; Lim, Ken-Hong; Kuo, Yuan-Yeh

    2017-05-01

    Mutations in JAK2, MPL and CALR genes have been identified in the majority of myeloproliferative neoplasm (MPN) patients, and patients negative for these three mutations are the so-called triple-negative (TN) MPN. In this study, we examined the mutational profiles of 16 triple-negative MPN patients including 7 essential thrombocythemia (ET), 1 primary myelofibrosis and 8 polycythemia vera (PV). Targeted next-generation sequencing was performed using the ACTOnco Comprehensive Cancer Panel (Ion AmpliSeq Comprehensive Cancer Panel, Life Technologies) to target all coding exons of 409 cancer-related genes. Overall, 30 nonsynonymous somatic mutations were detected in 12 (75%) patients with a range of 1-5 mutations per sample. Notably, one ET patient was found to have JAK2 V617F and KIT P551L mutations at very low allele frequency. One MPL P70L mutation and one MPL M602T mutation were identified in 1 ET patient and 1 PV patient, respectively. Other recurrent mutations were also identified including KMT2C, KMT2D, IRS2, SYNE1, PDE4DIP, SETD2, ATM, TNFAIP3 and CCND2. In addition, germline mutations were also found in some cancer-related genes. Copy number changes were rare in this cohort of TN MPNs. In conclusion, both somatic and germline mutations can be detected in TN MPN patients.

  4. Transcriptome Sequencing Identifies PCAT-1, a Novel lincRNA Implicated in Prostate Cancer Progression

    PubMed Central

    Prensner, John R.; Iyer, Matthew K.; Balbin, O. Alejandro; Dhanasekaran, Saravana M.; Cao, Qi; Brenner, J. Chad; Laxman, Bharathi; Asangani, Irfan; Grasso, Catherine; Kominsky, Hal D.; Cao, Xuhong; Jing, Xiaojun; Wang, Xiaoju; Siddiqui, Javed; Wei, John T.; Robinson, Daniel; Iyer, Hari K.; Palanisamy, Nallasivam; Maher, Christopher A.; Chinnaiyan, Arul M.

    2011-01-01

    High-throughput sequencing of polyA+ RNA (RNA-Seq) in human cancer shows remarkable potential to identify both novel markers of disease and uncharacterized aspects of tumor biology, particularly non-coding RNA (ncRNA) species. We employed RNA-Seq on a cohort of 102 prostate tissues and cell lines and performed ab initio transcriptome assembly to discover unannotated ncRNAs. We nominated 121 such Prostate Cancer Associated Transcripts (PCATs) with cancer-specific expression patterns. Among these, we characterized PCAT-1 as a novel prostate-specific regulator of cell proliferation and target of the Polycomb Repressive Complex 2 (PRC2). We further found that high PCAT-1 and PRC2 expression stratified patient tissues into molecular subtypes distinguished by expression signatures of PCAT-1-repressed target genes. Taken together, the findings presented herein identify PCAT-1 as a novel transcriptional repressor implicated in a subset of prostate cancer patients. These findings establish the utility of RNA-Seq to identify disease-associated ncRNAs that may improve the stratification of cancer subtypes. PMID:21804560

  5. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

    PubMed Central

    Alexandrov, Ludmil B; Calderaro, Julien; Rebouissou, Sandra; Couchy, Gabrielle; Meiller, Clément; Shinde, Jayendra; Soysouvanh, Frederic; Calatayud, Anna-Line; Pinyol, Roser; Pelletier, Laura; Balabaud, Charles; Laurent, Alexis; Blanc, Jean-Frederic; Mazzaferro, Vincenzo; Calvo, Fabien; Villanueva, Augusto; Nault, Jean-Charles; Bioulac-Sage, Paulette; Stratton, Michael R; Llovet, Josep M; Zucman-Rossi, Jessica

    2015-01-01

    Genomic analyses promise to improve tumor characterization in order to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors revealed mutational signatures associated with specific risk factors, mainly combined alcohol/tobacco consumption, and aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrent pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (HBV), and AXIN1. Analyses according to tumor stage progression revealed TERT promoter mutation as an early event whereas FGF3, FGF4, FGF19/CCND1 amplification, TP53 and CDKN2A alterations, appeared at more advanced stages in aggressive tumors. In 28% of the tumors we identified genetic alterations potentially targetable by FDA-approved drugs. In conclusion, we identified risk factor-specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC which will be useful to design clinical trials for targeted therapy. PMID:25822088

  6. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood.

    PubMed

    Mastrokolias, Anastasios; Ariyurek, Yavuz; Goeman, Jelle J; van Duijn, Erik; Roos, Raymund A C; van der Mast, Roos C; van Ommen, GertJan B; den Dunnen, Johan T; 't Hoen, Peter A C; van Roon-Mom, Willeke M C

    2015-10-01

    With several therapeutic approaches in development for Huntington's disease, there is a need for easily accessible biomarkers to monitor disease progression and therapy response. We performed next-generation sequencing-based transcriptome analysis of total RNA from peripheral blood of 91 mutation carriers (27 presymptomatic and 64 symptomatic) and 33 controls. Transcriptome analysis by DeepSAGE identified 167 genes significantly associated with clinical total motor score in Huntington's disease patients. Relative to previous studies, this yielded novel genes and confirmed previously identified genes, such as H2AFY, an overlap in results that has proven difficult in the past. Pathway analysis showed enrichment of genes of the immune system and target genes of miRNAs, which are downregulated in Huntington's disease models. Using a highly parallelized microfluidics array chip (Fluidigm), we validated 12 of the top 20 significant genes in our discovery cohort and 7 in a second independent cohort. The five genes (PROK2, ZNF238, AQP9, CYSTM1 and ANXA3) that were validated independently in both cohorts present a candidate biomarker panel for stage determination and therapeutic readout in Huntington's disease. Finally, we suggest a first empiric formula predicting total motor score from the expression levels of our biomarker panel. Our data support the view that peripheral blood is a useful source to identify biomarkers for Huntington's disease and monitor disease progression in future clinical trials.

  7. Using RNA sequencing for identifying gene imprinting and random monoallelic expression in human placenta

    PubMed Central

    Metsalu, Tauno; Viltrop, Triin; Tiirats, Airi; Rajashekar, Balaji; Reimann, Ene; Kõks, Sulev; Rull, Kristiina; Milani, Lili; Acharya, Ganesh; Basnet, Purusotam; Vilo, Jaak; Mägi, Reedik; Metspalu, Andres; Peters, Maire; Haller-Kikkatalo, Kadri; Salumets, Andres

    2014-01-01

    Given the possible critical importance of placental gene imprinting and random monoallelic expression on fetal and infant health, most of those genes must be identified, in order to understand the risks that the baby might meet during pregnancy and after birth. Therefore, the aim of the current study was to introduce a workflow and tools for analyzing imprinted and random monoallelic gene expression in human placenta, by applying whole-transcriptome (WT) RNA sequencing of placental tissue and genotyping of coding DNA variants in family trios. Ten family trios, each with a healthy spontaneous single-term pregnancy, were recruited. Total RNA was extracted for WT analysis, providing the full sequence information for the placental transcriptome. Parental and child blood DNA genotypes were analyzed by exome SNP genotyping microarrays, mapping the inheritance and estimating the abundance of parental expressed alleles. Imprinted genes showed consistent expression from either parental allele, as demonstrated by the SNP content of sequenced transcripts, while monoallelically expressed genes had random activity of parental alleles. We revealed 4 novel possible imprinted genes (LGALS8, LGALS14, PAPPA2 and SPTLC3) and confirmed the imprinting of 4 genes (AIM1, PEG10, RHOBTB3 and ZFAT-AS1) in human placenta. The major finding was the identification of 4 genes (ABP1, BCLAF1, IFI30 and ZFAT) with random allelic bias, expressing one of the parental alleles preferentially. The main functions of the imprinted and monoallelically expressed genes included: i) mediating cellular apoptosis and tissue development; ii) regulating inflammation and immune system; iii) facilitating metabolic processes; and iv) regulating cell cycle. PMID:25437054

  8. Targeted Next Generation Sequencing Identifies Markers of Response to PD-1 Blockade.

    PubMed

    Johnson, Douglas B; Frampton, Garrett M; Rioth, Matthew J; Yusko, Erik; Xu, Yaomin; Guo, Xingyi; Ennis, Riley C; Fabrizio, David; Chalmers, Zachary R; Greenbowe, Joel; Ali, Siraj M; Balasubramanian, Sohail; Sun, James X; He, Yuting; Frederick, Dennie T; Puzanov, Igor; Balko, Justin M; Cates, Justin M; Ross, Jeffrey S; Sanders, Catherine; Robins, Harlan; Shyr, Yu; Miller, Vincent A; Stephens, Philip J; Sullivan, Ryan J; Sosman, Jeffrey A; Lovly, Christine M

    2016-11-01

    Therapeutic antibodies blocking programmed death-1 and its ligand (PD-1/PD-L1) induce durable responses in a substantial fraction of melanoma patients. We sought to determine whether the number and/or type of mutations identified using a next-generation sequencing (NGS) panel available in the clinic was correlated with response to anti-PD-1 in melanoma. Using archival melanoma samples from anti-PD-1/PD-L1-treated patients, we performed hybrid capture-based NGS on 236-315 genes and T-cell receptor (TCR) sequencing on initial and validation cohorts from two centers. Patients who responded to anti-PD-1/PD-L1 had higher mutational loads in an initial cohort (median, 45.6 vs. 3.9 mutations/MB; P = 0.003) and a validation cohort (37.1 vs. 12.8 mutations/MB; P = 0.002) compared with nonresponders. Response rate, progression-free survival, and overall survival were superior in the high, compared with intermediate and low, mutation load groups. Melanomas with NF1 mutations harbored high mutational loads (median, 62.7 mutations/MB) and high response rates (74%), whereas BRAF/NRAS/NF1 wild-type melanomas had a lower mutational load. In these archival samples, TCR clonality did not predict response. Mutation numbers in the 315 genes in the NGS platform strongly correlated with those detected by whole-exome sequencing in The Cancer Genome Atlas samples, but were not associated with survival. In conclusion, mutational load, as determined by an NGS platform available in the clinic, effectively stratified patients by likelihood of response. This approach may provide a clinically feasible predictor of response to anti-PD-1/PD-L1. Cancer Immunol Res; 4(11); 959-67. ©2016 AACR.

  9. Identification of amino acid sequences in the polyomavirus capsid proteins that serve as nuclear localization signals

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. Jr; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    The molecular mechanism participating in the transport of newly synthesized proteins from the cytoplasm to the nucleus in mammalian cells is poorly understood. Recently, the nuclear localization signal sequences (NLS) of many nuclear proteins have been identified, and most have been found to be composed of a highly basic amino acid stretch. A genetic "subtractive" and a biochemical "additive" approach were used in our studies to identify the NLS's of the polyomavirus structural capsid proteins. An NLS was identified at the N-terminus (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) of the major capsid protein VP1 and at the C-terminus (Glu307 -Glu-Asp-Gly-Pro-Glu-Lys-Lys-Lys-Arg-Arg-Leu318) of the VP2/VP3 minor capsid proteins.

  10. A novel homozygous mutation in SUCLA2 gene identified by exome sequencing

    PubMed Central

    Lamperti, Costanza; Fang, Mingyan; Invernizzi, Federica; Liu, Xuanzhu; Wang, Hairong; Zhang, Qing; Carrara, Franco; Moroni, Isabella; Zeviani, Massimo; Zhang, Jianguo; Ghezzi, Daniele

    2012-01-01

    Mitochondrial disorders with multiple mitochondrial respiratory chain (MRC) enzyme deficiency and depletion of mitochondrial DNA (mtDNA) are autosomal recessive conditions due to mutations in several nuclear genes necessary for proper mtDNA maintenance. In this report, we describe two Italian siblings presenting with encephalomyopathy and mtDNA depletion in muscle. By whole-exome sequencing and prioritization of candidate genes, we identified a novel homozygous missense mutation in the SUCLA2 gene in a highly conserved amino acid residue. Although a recurrent mutation in the SUCLA2 gene is relatively frequent in the Faroe Islands, mutations in other populations are extremely rare. In contrast with what has been reported in other patients, methyl-malonic aciduria, a biomarker for this genetic defect, was absent in our proband and very mildly elevated in her affected sister. This report demonstrates that next-generation technologies, particularly exome sequencing, are user-friendly, powerful means for the identification of disease genes in genetically and clinically heterogeneous inherited conditions, such as mitochondrial disorders. PMID:23010432

  11. Whole-exome sequencing identifies rare, functional CFH variants in families with macular degeneration.

    PubMed

    Yu, Yi; Triebwasser, Michael P; Wong, Edwin K S; Schramm, Elizabeth C; Thomas, Brett; Reynolds, Robyn; Mardis, Elaine R; Atkinson, John P; Daly, Mark; Raychaudhuri, Soumya; Kavanagh, David; Seddon, Johanna M

    2014-10-01

    We sequenced the whole exome of 35 cases and 7 controls from 9 age-related macular degeneration (AMD) families in whom known common genetic risk alleles could not explain their high disease burden and/or their early-onset advanced disease. Two families harbored novel rare mutations in CFH (R53C and D90G). R53C segregates perfectly with AMD in 11 cases (heterozygous) and 1 elderly control (reference allele) (LOD = 5.07, P = 6.7 × 10⁻⁷). In an independent cohort, 4 out of 1676 cases but none of the 745 examined controls or 4300 NHLBI Exome Sequencing Project (ESP) samples carried the R53C mutation (P = 0.0039). In another family of six siblings, D90G similarly segregated with AMD in five cases and one control (LOD = 1.22, P = 0.009). No other sample in our large cohort or the ESP had this mutation. Functional studies demonstrated that R53C decreased the ability of FH to perform decay accelerating activity. D90G exhibited a decrease in cofactor-mediated inactivation. Both of these changes would lead to a loss of regulatory activity, resulting in excessive alternative pathway activation. This study represents an initial application of the whole-exome strategy to families with early-onset AMD. It successfully identified high impact alleles leading to clearer functional insight into AMD etiopathogenesis.
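    The carrier counts quoted in the abstract above (4 of 1676 cases versus none of 745 controls or 4300 ESP samples) can be turned into a P value with a one-sided Fisher exact test. The abstract does not say which test was used, so this sketch rests on two assumptions: that the controls and ESP samples are pooled into one comparison group, and that a Fisher exact test is appropriate. The function name `fisher_upper_tail` is hypothetical.

```python
from math import comb

# One-sided Fisher exact test from the hypergeometric distribution.
# Pooling controls with ESP samples is an assumption, not stated in the abstract.

def fisher_upper_tail(carriers_in_cases, n_cases, carriers_in_others, n_others):
    """P(at least carriers_in_cases carriers fall among cases), margins fixed."""
    total = n_cases + n_others
    carriers = carriers_in_cases + carriers_in_others
    denom = comb(total, carriers)
    p = 0.0
    for k in range(carriers_in_cases, min(carriers, n_cases) + 1):
        p += comb(n_cases, k) * comb(n_others, carriers - k) / denom
    return p


p_value = fisher_upper_tail(4, 1676, 0, 745 + 4300)
print(round(p_value, 4))
```

    Under these assumptions the result rounds to the P = 0.0039 given in the abstract, which suggests, but does not prove, that the authors made a comparison of this kind.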

  12. Whole genome sequencing of Gir cattle for identifying polymorphisms and loci under selection.

    PubMed

    Liao, Xiaoping; Peng, Fred; Forni, Selma; McLaren, David; Plastow, Graham; Stothard, Paul

    2013-10-01

    Genetic variation in Gir cattle (Bos indicus) has so far not been well characterized. In this study, we used whole genome sequencing of three Gir bulls and a pooled sample from another 11 bulls to identify polymorphisms and loci under selection. A total of 9 990 733 single nucleotide polymorphisms (SNPs) and 604 308 insertion/deletions (indels) were discovered in Gir samples, of which 62.34% and 83.62%, respectively, are previously unknown. Moreover, we detected 79 putative selective sweeps using the sequence data of the pooled sample. One of the most striking sweeps harbours several genes belonging to the cathelicidin gene family, such as CAMP, CATHL1, CATHL2, and CATHL3, which are related to pathogen- and parasite-resistance. Another interesting region harbours genes encoding mitogen-activated protein kinases, which are involved in directing cellular responses to a variety of stimuli, such as osmotic stress and heat shock. These findings are particularly interesting because Gir is resistant to hot temperatures and tropical diseases. This initial selective sweep analysis of Gir cattle has revealed a number of loci that could be important for their adaptation to tropical climates.

  13. Exome and genome sequencing of nasopharynx cancer identifies NF-κB pathway activating mutations

    PubMed Central

    Li, Yvonne Y; Chung, Grace T. Y.; Lui, Vivian W. Y.; To, Ka-Fai; Ma, Brigette B. Y.; Chow, Chit; Woo, John K. S.; Yip, Kevin Y.; Seo, Jeongsun; Hui, Edwin P.; Mak, Michael K. F.; Rusan, Maria; Chau, Nicole G.; Or, Yvonne Y. Y.; Law, Marcus H. N.; Law, Peggy P. Y.; Liu, Zoey W. Y.; Ngan, Hoi-Lam; Hau, Pok-Man; Verhoeft, Krista R.; Poon, Peony H. Y.; Yoo, Seong-Keun; Shin, Jong-Yeon; Lee, Sau-Dan; Lun, Samantha W. M.; Jia, Lin; Chan, Anthony W. H.; Chan, Jason Y. K.; Lai, Paul B. S.; Fung, Choi-Yi; Hung, Suet-Ting; Wang, Lin; Chang, Ann Margaret V.; Chiosea, Simion I.; Hedberg, Matthew L.; Tsao, Sai-Wah; van Hasselt, Andrew C.; Chan, Anthony T. C.; Grandis, Jennifer R.; Hammerman, Peter S.; Lo, Kwok-Wai

    2017-01-01

    Nasopharyngeal carcinoma (NPC) is an aggressive head and neck cancer characterized by Epstein-Barr virus (EBV) infection and dense lymphocyte infiltration. The scarcity of NPC genomic data hinders the understanding of NPC biology, disease progression and rational therapy design. Here we performed whole-exome sequencing (WES) on 111 micro-dissected EBV-positive NPCs, with 15 cases subjected to further whole-genome sequencing (WGS), to determine its mutational landscape. We identified enrichment for genomic aberrations of multiple negative regulators of the NF-κB pathway, including CYLD, TRAF3, NFKBIA and NLRC5, in a total of 41% of cases. Functional analysis confirmed inactivating CYLD mutations as drivers for NPC cell growth. The EBV oncoprotein latent membrane protein 1 (LMP1) functions to constitutively activate NF-κB signalling, and we observed mutual exclusivity among tumours with somatic NF-κB pathway aberrations and LMP1-overexpression, suggesting that NF-κB activation is selected for by both somatic and viral events during NPC pathogenesis. PMID:28098136

  14. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder.

    PubMed

    C Yuen, Ryan K; Merico, Daniele; Bookman, Matt; L Howe, Jennifer; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-04-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible on a cloud platform and through a controlled-access internet portal. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions and deletions or copy number variations per ASD subject. We identified 18 new candidate ASD-risk genes and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (P = 6 × 10⁻⁴). In 294 of 2,620 (11.2%) ASD cases, a molecular basis could be determined and 7.2% of these carried copy number variations and/or chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.

  15. A distinctive oral phenotype points to FAM20A mutations not identified by Sanger sequencing.

    PubMed

    Poulter, James A; Smith, Claire E L; Murrillo, Gina; Silva, Sandra; Feather, Sally; Howell, Marianella; Crinnion, Laura; Bonthron, David T; Carr, Ian M; Watson, Christopher M; Inglehearn, Chris F; Mighell, Alan J

    2015-11-01

    Biallelic FAM20A mutations cause two conditions where Amelogenesis Imperfecta (AI) is the presenting feature: Amelogenesis Imperfecta and Gingival Fibromatosis Syndrome; and Enamel Renal Syndrome. A distinctive oral phenotype is shared in both conditions. On Sanger sequencing of FAM20A in cases with that phenotype, we identified two probands with single, likely pathogenic heterozygous mutations. Given the recessive inheritance pattern seen in all previous FAM20A mutation-positive families and the potential for renal disease, further screening was carried out to look for a second pathogenic allele. Reverse transcriptase-PCR on cDNA was used to determine transcript levels. CNVseq was used to screen for genomic insertions and deletions. In one family, FAM20A cDNA screening revealed only a single mutated FAM20A allele with the wild-type allele not transcribed. In the second family, CNV detection by whole genome sequencing (CNVseq) revealed a heterozygous 54.7 kb duplication encompassing exons 1 to 4 of FAM20A. This study confirms the link between biallelic FAM20A mutations and the characteristic oral phenotype. It highlights for the first time examples of FAM20A mutations missed by the most commonly used mutation screening techniques. This information informed renal assessment and ongoing clinical care.

  16. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing.

    PubMed

    Ai, Huashui; Fang, Xiaodong; Yang, Bin; Huang, Zhiyong; Chen, Hao; Mao, Likai; Zhang, Feng; Zhang, Lu; Cui, Leilei; He, Weiming; Yang, Jie; Yao, Xiaoming; Zhou, Lisheng; Han, Lijuan; Li, Jing; Sun, Silong; Xie, Xianhua; Lai, Boxian; Su, Ying; Lu, Yao; Yang, Hui; Huang, Tao; Deng, Wenjiang; Nielsen, Rasmus; Ren, Jun; Huang, Lusheng

    2015-03-01

    Domestic pigs have evolved genetic adaptations to their local environmental conditions, such as cold and hot climates. We sequenced the genomes of 69 pigs from 15 geographically divergent locations in China and detected 41 million variants, of which 21 million were absent from the dbSNP database. In a genome-wide scan, we identified a set of loci that likely have a role in regional adaptations to high- and low-latitude environments within China. Intriguingly, we found an exceptionally large (14-Mb) region with a low recombination rate on the X chromosome that appears to have two distinct haplotypes in the high- and low-latitude populations, possibly underlying their adaptation to cold and hot environments, respectively. Surprisingly, the adaptive sweep in the high-latitude regions has acted on DNA that might have been introgressed from an extinct Sus species. Our findings provide new insights into the evolutionary history of pigs and the role of introgression in adaptation.

  17. Transcriptome Sequencing of Lima Bean (Phaseolus lunatus) to Identify Putative Positive Selection in Phaseolus and Legumes

    PubMed Central

    Li, Fengqi; Cao, Depan; Liu, Yang; Yang, Ting; Wang, Guirong

    2015-01-01

    The identification of genes under positive selection is a central goal of evolutionary biology. Many legume species, including Phaseolus vulgaris (common bean) and Phaseolus lunatus (lima bean), have important ecological and economic value. In this study, we sequenced and assembled the transcriptome of one Phaseolus species, lima bean. A comparison with the genomes of six other legume species, including the common bean, Medicago, lotus, soybean, chickpea, and pigeonpea, revealed 15 and 4 orthologous groups with signatures of positive selection among the two Phaseolus species and among the seven legume species, respectively. Characterization of these positively selected genes using non-redundant (nr) annotation, gene ontology (GO) classification, GO term enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed that these genes are mostly involved in thylakoids, photosynthesis and metabolism. This study identified genes that may be related to the divergence of the Phaseolus and legume species. These detected genes are particularly good candidates for subsequent functional studies. PMID:26151849

  18. A new system identification approach to identify genetic variants in sequencing studies for a binary phenotype.

    PubMed

    Kang, Guolian; Bi, Wenjian; Zhao, Yanlong; Zhang, Ji-Feng; Yang, Jun J; Xu, Heng; Loh, Mignon L; Hunger, Stephen P; Relling, Mary V; Pounds, Stanley; Cheng, Cheng

    2014-01-01

    We propose in this paper a set-valued (SV) system model, which is a generalized form of logistic (LG) and Probit (Probit) regression, to be considered as a method for discovering genetic variants, especially rare genetic variants in next-generation sequencing studies, for a binary phenotype. We propose a new SV system identification method to estimate all underlying key system parameters for the Probit model and compare it with the LG model in the setting of genetic association studies. Across an extensive series of simulation studies, the Probit method maintained type I error control and had similar or greater power than the LG method, which is robust to different distributions of noise: logistic, normal, or t distributions. Additionally, the Probit association parameter estimate was 2.7-46.8-fold less variable than the LG log-odds ratio association parameter estimate. Less variability in the association parameter estimate translates to greater power and robustness across the spectrum of minor allele frequencies (MAFs), and these advantages are the most pronounced for rare variants. For instance, in a simulation that generated data from an additive logistic model with an odds ratio of 7.4 for a rare single nucleotide polymorphism with a MAF of 0.005 and a sample size of 2,300, the Probit method had 60% power whereas the LG method had 25% power at the α = 10(-6) level. Consistent with these simulation results, the set of variants identified by the LG method was a subset of those identified by the Probit method in two example analyses. Thus, we suggest the Probit method may be a competitive alternative to the LG method in genetic association studies such as candidate gene, genome-wide, or next-generation sequencing studies for a binary phenotype.
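
    The two models compared in the abstract above differ in the link function that maps a linear predictor to a case probability. The following is a minimal sketch of that difference only, not the authors' set-valued system identification method; the intercept value and the additive 0/1/2 genotype coding are assumptions made for illustration. It also shows that the odds ratio of 7.4 quoted above corresponds to a log-odds coefficient of about 2.0.

```python
import math


def logistic_prob(eta):
    """Logistic (LG) link: P(case) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_prob(eta):
    """Probit link: P(case) = standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical additive model: eta = intercept + beta * minor_allele_count.
intercept = -2.0            # assumed baseline, not taken from the paper
beta = math.log(7.4)        # log-odds for an odds ratio of 7.4 (about 2.0)

for allele_count in (0, 1, 2):
    eta = intercept + beta * allele_count
    print(allele_count, round(logistic_prob(eta), 3), round(probit_prob(eta), 3))
```

    The same linear predictor gives different probabilities under the two links, which is one reason coefficient estimates from the two models are not directly comparable without rescaling.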

  19. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes.

    PubMed

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-12-10

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. De novo assembly of reads from leaves, stems and roots, sequenced on an Illumina HiSeq 2000 platform, yielded 137,229 contigs ranging from 200 to 19,214 bp with an N50 of 2,331 bp. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant-pathogen interaction and plant hormone regulation pathways involved in disease resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistance genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed their highest expression level in stem, whereas one Xa26 gene was expressed predominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future.
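
    The N50 statistic reported for the assembly above can be computed from a list of contig lengths. The sketch below uses the common definition (the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembled bases); the toy lengths are invented for illustration and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least half
    of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (not the O. officinalis contigs).
toy_lengths = [200, 300, 500, 1000, 2000]
print(n50(toy_lengths))  # 2000: the longest contig alone covers half of 4000 bp
```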

  20. Human retroviruses and aids, 1992. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Korber, B.; Berzofsky, J.A.; Pavlakis, G.N.; Smith, R.F.

    1992-10-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions below of the parts of the compendium, the user should read the individual introductions for each part.

  1. Nucleotide and protein sequences for dog masticatory tropomyosin identify a novel Tpm4 gene product

    PubMed Central

    Reiser, Peter J.

    2016-01-01

    Jaw-closing muscles of several vertebrate species, including members of Carnivora, express a unique, “masticatory”, isoform of myosin heavy chain, along with isoforms of other myofibrillar proteins that are not expressed in most other muscles. It is generally believed that the complement of myofibrillar isoforms in these muscles serves high force generation for capturing live prey, breaking down tough plant material and defensive biting. A unique isoform of tropomyosin (Tpm) was reported to be expressed in cat jaw-closing muscle, based upon two-dimensional gel mobility, peptide mapping, and immunohistochemistry. The objective of this study was to obtain protein and gene sequence information for this unique Tpm isoform. Samples of masseter (also a jaw-closing muscle), tibialis (with predominantly fast-twitch fibers), and the deep lateral gastrocnemius (predominantly slow-twitch fibers) were obtained from adult dogs. Expressed Tpm isoforms were cloned and sequencing yielded cDNAs that were identical to genomic predicted striated muscle Tpm1.1St(a,b,b,a) (historically referred to as αTpm), Tpm2.2St(a,b,b,a) (βTpm) and Tpm3.12St(a,b,b,a) (cTpm) isoforms (nomenclature reflects predominant tissue expression (“St”—striated muscle) and exon splicing pattern), as well as a novel 284 amino acid isoform observed in jaw-closing muscle that is identical to a genomic predicted product of the Tpm4 gene (δTpm) family. The novel isoform is designated as Tpm4.3St(a,b,b,a). The myofibrillar Tpm isoform expressed in dog masseter exhibits a unique electrophoretic mobility on gels containing 6 M urea, compared to other skeletal Tpm isoforms. To validate that the cloned Tpm4.3 isoform is the Tpm expressed in dog masseter, E. coli-expressed Tpm4.3 was electrophoresed in the presence of urea. Results demonstrate that Tpm4.3 has identical electrophoretic mobility to the unique dog masseter Tpm isoform and is of different mobility from that of muscle Tpm1.1, Tpm2.2 and Tpm3

  2. Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction.

    PubMed

    Do, Ron; Stitziel, Nathan O; Won, Hong-Hee; Jørgensen, Anders Berg; Duga, Stefano; Merlini, Pier Angelica; Kiezun, Adam; Farrall, Martin; Goel, Anuj; Zuk, Or; Guella, Illaria; Asselta, Rosanna; Lange, Leslie A; Peloso, Gina M; Auer, Paul L; Girelli, Domenico; Martinelli, Nicola; Farlow, Deborah N; DePristo, Mark A; Roberts, Robert; Stewart, Alexander F R; Saleheen, Danish; Danesh, John; Epstein, Stephen E; Sivapalaratnam, Suthesh; Hovingh, G Kees; Kastelein, John J; Samani, Nilesh J; Schunkert, Heribert; Erdmann, Jeanette; Shah, Svati H; Kraus, William E; Davies, Robert; Nikpay, Majid; Johansen, Christopher T; Wang, Jian; Hegele, Robert A; Hechter, Eliana; Marz, Winfried; Kleber, Marcus E; Huang, Jie; Johnson, Andrew D; Li, Mingyao; Burke, Greg L; Gross, Myron; Liu, Yongmei; Assimes, Themistocles L; Heiss, Gerardo; Lange, Ethan M; Folsom, Aaron R; Taylor, Herman A; Olivieri, Oliviero; Hamsten, Anders; Clarke, Robert; Reilly, Dermot F; Yin, Wu; Rivas, Manuel A; Donnelly, Peter; Rossouw, Jacques E; Psaty, Bruce M; Herrington, David M; Wilson, James G; Rich, Stephen S; Bamshad, Michael J; Tracy, Russell P; Cupples, L Adrienne; Rader, Daniel J; Reilly, Muredach P; Spertus, John A; Cresci, Sharon; Hartiala, Jaana; Tang, W H Wilson; Hazen, Stanley L; Allayee, Hooman; Reiner, Alex P; Carlson, Christopher S; Kooperberg, Charles; Jackson, Rebecca D; Boerwinkle, Eric; Lander, Eric S; Schwartz, Stephen M; Siscovick, David S; McPherson, Ruth; Tybjaerg-Hansen, Anne; Abecasis, Goncalo R; Watkins, Hugh; Nickerson, Deborah A; Ardissino, Diego; Sunyaev, Shamil R; O'Donnell, Christopher J; Altshuler, David; Gabriel, Stacey; Kathiresan, Sekar

    2015-02-05

    Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl(-1). At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.

  3. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
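
    The third step described above, finding reads that contain microsatellites matching user-specified parameters, can be illustrated with a regular-expression search for perfect tandem repeats. This is a minimal sketch and not SSR_pipeline's own search algorithm: the function name, parameter names and default values are hypothetical, and compound microsatellites are not handled.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Yield (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first,
    so ACACACAC is reported as (AC)x4 rather than (ACAC)x2.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAAAAAA reported as (AA)x4.
        if len(set(motif)) == 1:
            continue
        yield match.start(), motif, len(match.group(0)) // len(motif)


# Hypothetical read containing five copies of the AC motif.
read = "TTGACACACACACAGG"
print(list(find_microsatellites(read)))  # [(3, 'AC', 5)]
```

    Raising max_motif to 25 would cover the motif range mentioned in the abstract, at the cost of more backtracking per read.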

  4. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  5. Deep sequencing reveals the complete genome and evidence for transcriptional activity of the first virus-like sequences identified in Aristotelia chilensis (Maqui Berry).

    PubMed

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F; Alzate, Juan F; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-04-03

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%-73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant.

  6. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  7. FN-Identify: Novel Restriction Enzymes-Based Method for Bacterial Identification in Absence of Genome Sequencing.

    PubMed

    Awad, Mohamed; Ouda, Osama; El-Refy, Ali; El-Feky, Fawzy A; Mosa, Kareem A; Helmy, Mohamed

    2015-01-01

    Sequencing and restriction analysis of genes such as 16S rRNA and HSP60 are widely used for molecular identification in microbial communities. With rapid progress in bioinformatics, genome sequencing has become the method of choice for bacterial identification. However, genome sequencing technology is still out of reach in developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify uses gene sequence data available in GenBank and other databases, together with two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested on three diverse bacterial populations (members of the Lactobacillus, Pseudomonas, and Mycobacterium groups) in an in silico analysis using restriction enzymes and 16S rRNA gene sequences. Analysis of the restriction maps of the members of the three groups, using fragment number information alone or together with fragment sizes, successfully identified all members of the three groups using a minimum of four and a maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of the FN-Identify method and its two algorithms as an alternative that relies on standard microbiology laboratory techniques when genome sequencing is not available.
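
    The idea of identifying a sequence from the number of fragments each restriction enzyme produces can be sketched as an in silico digest. This is an illustration only and does not reproduce the CreateScheme or GeneIdentify algorithms: the choice of enzymes, the function names and the toy reference sequences are all assumptions, and fragment sizes are ignored.

```python
import re

# Recognition sites of three widely used enzymes; which enzymes a real
# identification scheme would select is an assumption here.
ENZYMES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def fragment_counts(seq, enzymes=ENZYMES):
    """Return fragment numbers per enzyme, ordered by enzyme name.

    A linear sequence cut at k recognition sites yields k + 1 fragments.
    """
    seq = seq.upper()
    return tuple(
        len(re.findall("(?=%s)" % re.escape(site), seq)) + 1
        for _name, site in sorted(enzymes.items())
    )


def identify(counts, references, enzymes=ENZYMES):
    """Return names of reference sequences whose fingerprint equals counts."""
    return [
        name
        for name, seq in references.items()
        if fragment_counts(seq, enzymes) == counts
    ]


# Hypothetical toy references, far shorter than real 16S rRNA genes.
REFERENCES = {
    "taxon_A": "AAGAATTCTTGGATCCAA",
    "taxon_B": "AAAAGCTTCCGAATTCGGGAATTCAA",
    "taxon_C": "ACGTACGTACGT",
}

unknown = "AAAAGCTTCCGAATTCGGGAATTCAA"
print(identify(fragment_counts(unknown), REFERENCES))  # ['taxon_B']
```

    When two references share a fingerprint, the returned list has more than one name; adding enzymes or fragment sizes is the natural way to break such ties.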

  8. Whole-Exome Sequencing Identifies Novel Somatic Mutations in Chinese Breast Cancer Patients

    PubMed Central

    Zhang, Yanfeng; Cai, Qiuyin; Shu, Xiao-Ou; Gao, Yu-Tang; Li, Chun; Zheng, Wei; Long, Jirong

    2016-01-01

    Most breast cancer genomes harbor complex mutational landscapes. Somatic alterations have been predominantly discovered in breast cancer patients of European ancestry; however, little is known about somatic aberrations in patients of other ethnic groups, including Asians. In the present study, whole-exome sequencing (WES) was conducted on DNA extracted from tumor and matched adjacent normal tissue samples from eleven early-onset breast cancer patients who were included in the Shanghai Breast Cancer Study. We discovered 159 somatic missense and ten nonsense mutations distributed among 167 genes. The 50 most frequent somatic mutations identified by WES were selected for validation using the Sequenom MassARRAY system in the eleven breast cancer patients and an additional 433 tumor and 921 normal tissue/blood samples from the Shanghai Breast Cancer Study. Among these 50 mutations selected for validation, 32 were technically validated. Within the validated mutations, somatic mutations in the TRPM6, HYDIN, ENTHD1, and NDUFB10 genes were found in two or more tumor samples in the replication stage. Mutations in the ADRA1B, CBFB, KIAA2022, and RBM25 genes were observed once in the replication stage. To summarize, this study identified some novel somatic mutations for breast cancer. Future studies will need to be conducted to determine the function of these mutations/genes in breast carcinogenesis. PMID:26870154

  9. Sparse whole genome sequencing identifies two loci for major depressive disorder

    PubMed Central

    2015-01-01

    Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide, poses a major challenge to genetic analysis. To date no robustly replicated genetic loci have been identified, despite analysis of more than 9,000 cases. Using low coverage genome sequence of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified and replicated two genome-wide significant loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P-value = 2.53 × 10(-10)), the other in an intron of the LHPP gene (P = 6.45 × 10(-12)). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness. PMID:26176920

  10. Whole-exome sequencing identifies recurrent AKT1 mutations in sclerosing hemangioma of lung

    PubMed Central

    Jung, Seung-Hyun; Kim, Min Sung; Lee, Sung-Hak; Park, Hyun-Chun; Choi, Hyun Joo; Maeng, Leeso; Min, Ki Ouk; Kim, Jeana; Park, Tae In; Shin, Ok Ran; Kim, Tae-Jung; Xu, Haidong; Lee, Kyo Young; Kim, Tae-Min; Song, Sang Yong; Lee, Charles; Chung, Yeun-Jun; Lee, Sug Hyung

    2016-01-01

    Pulmonary sclerosing hemangioma (PSH) is a benign tumor with two cell populations (epithelial and stromal cells), for which genomic profiles remain unknown. We conducted exome sequencing of 44 PSHs and identified recurrent somatic mutations of AKT1 (43.2%) and β-catenin (4.5%). We used a second subset of 24 PSHs to confirm the high frequency of AKT1 mutations (overall 31/68, 45.6%; p.E17K, 33.8%) and recurrent β-catenin mutations (overall 3 of 68, 4.4%). Of the PSHs without AKT1 mutations, two exhibited AKT1 copy gain. AKT1 mutations existed in both epithelial and stromal cells. In two separate PSHs from one patient, we observed two different AKT1 mutations, indicating that they were not disseminated but independently arising tumors. Because the AKT1 mutations were not found to co-occur with β-catenin mutations (or any other known driver alterations) in any of the PSHs studied, we speculate that AKT1 mutation may be the single most common driver alteration in the development of PSHs. Our study revealed genomic differences between PSHs and lung adenocarcinomas, including a high rate of AKT1 mutation in PSHs. These genomic features of PSH identified in the present study provide clues to understanding the biology of PSH and for differential genomic diagnosis of lung tumors. PMID:27601661

  11. Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data

    PubMed Central

    Barron, Martin; Li, Jun

    2016-01-01

    Single-cell RNA-Sequencing (scRNA-Seq) is a revolutionary technique for discovering and describing cell types in heterogeneous tissues, yet its measurement of expression often suffers from large systematic bias. A major source of this bias is the cell cycle, which introduces large within-cell-type heterogeneity that can obscure the differences in expression between cell types. The current method for removing the cell-cycle effect is unable to effectively identify this effect and has a high risk of removing other biological components of interest, compromising downstream analysis. We present ccRemover, a new method that reliably identifies the cell-cycle effect and removes it. ccRemover preserves other biological signals of interest in the data and thus can serve as an important pre-processing step for many scRNA-Seq data analyses. The effectiveness of ccRemover is demonstrated using simulation data and three real scRNA-Seq datasets, where it boosts the performance of existing clustering algorithms in distinguishing between cell types. PMID:27670849

  12. Identifying Plasmodium falciparum EBA-175 homologue sequences that specifically bind to human erythrocytes.

    PubMed

    Valbuena, John Jairo; Bravo, Ricardo Vera; Ocampo, Marisol; Lopez, Ramses; Rodriguez, Luis E; Curtidor, Hernando; Puentes, Alvaro; Garcia, Javier E; Tovar, Diana; Gomez, Johana; Leiton, Jesus; Patarroyo, Manuel Elkin

    2004-09-03

    Erythrocyte binding antigen-160 (EBA-160) protein is a Plasmodium falciparum antigen homologue from the erythrocyte binding protein family (EBP). It has been shown that the EBP family plays a role in parasite binding to the erythrocyte surface. The EBA-160 sequence has been chemically synthesised in seventy 20-mer sequential peptides covering the entire 3D7 protein strain, each of which was tested in erythrocyte binding assays to identify possible EBA-160 functional regions. Five EBA-160 high activity binding peptides (HABPs) specifically binding to erythrocytes with high affinity were identified. Dissociation constants lay between 200 and 460 nM and Hill coefficients between 1.5 and 2.3. Erythrocyte membrane protein binding peptide cross-linking assays using SDS-PAGE showed that these peptides bound specifically to 12, 28, and 44 kDa erythrocyte membrane proteins. The nature of these receptor sites was studied in peptide binding assays using enzyme-treated erythrocytes. HABPs were able to block merozoite in vitro invasion of erythrocytes. HABPs' potential as anti-malarial vaccine candidates is also discussed.

  13. Sparse whole-genome sequencing identifies two loci for major depressive disorder.

    PubMed

    2015-07-30

    Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide, poses a major challenge to genetic analysis. To date, no robustly replicated genetic loci have been identified, despite analysis of more than 9,000 cases. Here, using low-coverage whole-genome sequencing of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified, and subsequently replicated in an independent sample, two loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P = 2.53 × 10(-10)), the other in an intron of the LHPP gene (P = 6.45 × 10(-12)). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness.

  14. Exome Sequencing Identifies INPPL1 Mutations as a Cause of Opsismodysplasia

    PubMed Central

    Huber, Céline; Faqeih, Eissa Ali; Bartholdi, Deborah; Bole-Feysot, Christine; Borochowitz, Zvi; Cavalcanti, Denise P.; Frigo, Amandine; Nitschke, Patrick; Roume, Joelle; Santos, Heloísa G.; Shalev, Stavit A.; Superti-Furga, Andrea; Delezoide, Anne-Lise; Le Merrer, Martine; Munnich, Arnold; Cormier-Daire, Valérie

    2013-01-01

    Opsismodysplasia (OPS) is a severe autosomal-recessive chondrodysplasia characterized by pre- and postnatal micromelia with extremely short hands and feet. The main radiological features are severe platyspondyly, squared metacarpals, delayed skeletal ossification, and metaphyseal cupping. In order to identify mutations causing OPS, a total of 16 cases (7 terminated pregnancies and 9 postnatal cases) from 10 unrelated families were included in this study. We performed exome sequencing in three cases from three unrelated families and only one gene was found to harbor mutations in all three cases: inositol polyphosphate phosphatase-like 1 (INPPL1). Screening INPPL1 in the remaining cases identified a total of 12 distinct INPPL1 mutations in the 10 families, present in the homozygous state in 7 consanguineous families and in the compound heterozygous state in the 3 remaining families. Most mutations (6/12) resulted in premature stop codons, 2/12 were splice site, and 4/12 were missense mutations located in the catalytic domain, 5-phosphatase. INPPL1 belongs to the inositol-1,4,5-trisphosphate 5-phosphatase family, a family of signal-modulating enzymes that govern a plethora of cellular functions by regulating the levels of specific phosphoinositides. Our finding of INPPL1 mutations in OPS, a severe spondylodysplastic dysplasia with major growth plate disorganization, supports a key and specific role of this enzyme in endochondral ossification. PMID:23273569

  15. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma.

    PubMed

    Comino-Méndez, Iñaki; Gracia-Aznárez, Francisco J; Schiavi, Francesca; Landa, Iñigo; Leandro-García, Luis J; Letón, Rocío; Honrado, Emiliano; Ramos-Medina, Rocío; Caronia, Daniela; Pita, Guillermo; Gómez-Graña, Alvaro; de Cubas, Aguirre A; Inglada-Pérez, Lucía; Maliszewska, Agnieszka; Taschin, Elisa; Bobisse, Sara; Pica, Giuseppe; Loli, Paola; Hernández-Lavado, Rafael; Díaz, José A; Gómez-Morales, Mercedes; González-Neira, Anna; Roncador, Giovanna; Rodríguez-Antona, Cristina; Benítez, Javier; Mannelli, Massimo; Opocher, Giuseppe; Robledo, Mercedes; Cascón, Alberto

    2011-06-19

    Hereditary pheochromocytoma (PCC) is often caused by germline mutations in one of nine susceptibility genes described to date, but there are familial cases without mutations in these known genes. We sequenced the exomes of three unrelated individuals with hereditary PCC (cases) and identified mutations in MAX, the MYC associated factor X gene. Absence of MAX protein in the tumors and loss of heterozygosity caused by uniparental disomy supported the involvement of MAX alterations in the disease. A follow-up study of a selected series of 59 cases with PCC identified five additional MAX mutations and suggested an association with malignant outcome and preferential paternal transmission of MAX mutations. The involvement of the MYC-MAX-MXD1 network in the development and progression of neural crest cell tumors is further supported by the lack of functional MAX in rat PCC (PC12) cells and by the amplification of MYCN in neuroblastoma and suggests that loss of MAX function is correlated with metastatic potential.

  16. Comparison of inherently essential genes of Porphyromonas gingivalis identified in two transposon-sequencing libraries.

    PubMed

    Hutcherson, J A; Gogeneni, H; Yoder-Himes, D; Hendrickson, E L; Hackett, M; Whiteley, M; Lamont, R J; Scott, D A

    2016-08-01

    Porphyromonas gingivalis is a Gram-negative anaerobe and keystone periodontal pathogen. A mariner transposon insertion mutant library has recently been used to define 463 genes as putatively essential for the in vitro growth of P. gingivalis ATCC 33277 in planktonic culture (Library 1). We have independently generated a transposon insertion mutant library (Library 2) for the same P. gingivalis strain and herein compare genes that are putatively essential for in vitro growth in complex media, as defined by both libraries. In all, 281 genes (61%) identified by Library 1 were common to Library 2. Many of these common genes are involved in fundamentally important metabolic pathways, notably pyrimidine cycling as well as lipopolysaccharide, peptidoglycan, pantothenate and coenzyme A biosynthesis, and nicotinate and nicotinamide metabolism. Also in common are genes encoding heat-shock protein homologues, sigma factors, enzymes with proteolytic activity, and the majority of sec-related protein export genes. In addition to facilitating a better understanding of critical physiological processes, transposon-sequencing technology has the potential to identify novel strategies for the control of P. gingivalis infections. Those genes defined as essential by two independently generated TnSeq mutant libraries are likely to represent particularly attractive therapeutic targets.
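
    The comparison described in this record amounts to intersecting two sets of putatively essential genes and reporting the shared fraction relative to Library 1. The following is a minimal sketch of that arithmetic only. The gene identifiers are placeholders invented for illustration, and the function name is hypothetical; only the counts 281 and 463 come from the abstract.

```python
def overlap_summary(library_1: set[str], library_2: set[str]) -> tuple[set[str], float]:
    """Return the genes common to both libraries and their share of library_1."""
    common = library_1 & library_2
    fraction = len(common) / len(library_1) if library_1 else 0.0
    return common, fraction


# Toy example with made-up gene identifiers (not real P. gingivalis loci).
lib1 = {"geneA", "geneB", "geneC", "geneD"}
lib2 = {"geneB", "geneC", "geneE"}
common, fraction = overlap_summary(lib1, lib2)

# Counts reported in the abstract: 281 of the 463 Library 1 genes were shared.
reported_percent = round(100 * 281 / 463)
```

    With the reported counts, the shared share of Library 1 rounds to the 61% quoted in the abstract.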

  17. Diverse Sources of C. difficile Infection Identified on Whole-Genome Sequencing

    PubMed Central

    Eyre, David W.; Cule, Madeleine L.; Wilson, Daniel J.; Griffiths, David; Vaughan, Alison; O’Connor, Lily; Ip, Camilla L.C.; Golubchik, Tanya; Batty, Elizabeth M.; Finney, John M.; Wyllie, David H.; Didelot, Xavier; Piazza, Paolo; Bowden, Rory; Dingle, Kate E.; Harding, Rosalind M.

    2013-01-01

    BACKGROUND It has been thought that Clostridium difficile infection is transmitted predominantly within health care settings. However, endemic spread has hampered identification of precise sources of infection and the assessment of the efficacy of interventions. METHODS From September 2007 through March 2011, we performed whole-genome sequencing on isolates obtained from all symptomatic patients with C. difficile infection identified in health care settings or in the community in Oxfordshire, United Kingdom. We compared single-nucleotide variants (SNVs) between the isolates, using C. difficile evolution rates estimated on the basis of the first and last samples obtained from each of 145 patients, with 0 to 2 SNVs expected between transmitted isolates obtained less than 124 days apart, on the basis of a 95% prediction interval. We then identified plausible epidemiologic links among genetically related cases from data on hospital admissions and community location. RESULTS Of 1250 C. difficile cases that were evaluated, 1223 (98%) were successfully sequenced. In a comparison of 957 samples obtained from April 2008 through March 2011 with those obtained from September 2007 onward, a total of 333 isolates (35%) had no more than 2 SNVs from at least 1 earlier case, and 428 isolates (45%) had more than 10 SNVs from all previous cases. Reductions in incidence over time were similar in the two groups, a finding that suggests an effect of interventions targeting the transition from exposure to disease. Of the 333 patients with no more than 2 SNVs (consistent with transmission), 126 patients (38%) had close hospital contact with another patient, and 120 patients (36%) had no hospital or community contact with another patient. Distinct subtypes of infection continued to be identified throughout the study, which suggests a considerable reservoir of C. difficile. CONCLUSIONS Over a 3-year period, 45% of C. difficile cases in Oxfordshire were genetically distinct from all
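
    The SNV thresholds in this record can be read as a simple classification rule on the smallest SNV distance between an isolate and any earlier case. The sketch below assumes that reading. The function names and the "intermediate" label for 3 to 10 SNVs are illustrative assumptions, not terms from the study; the counts passed to the percentage helper are those quoted in the abstract.

```python
def classify_snv_distance(min_snvs_to_earlier_case: int) -> str:
    """Classify an isolate by its smallest SNV distance to any earlier case."""
    if min_snvs_to_earlier_case <= 2:
        return "consistent with transmission"   # no more than 2 SNVs
    if min_snvs_to_earlier_case > 10:
        return "genetically distinct"           # more than 10 SNVs from all earlier cases
    return "intermediate"                       # assumed label for 3-10 SNVs


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


labels = [classify_snv_distance(d) for d in (0, 2, 3, 10, 11)]

sequenced = percent(1223, 1250)        # isolates successfully sequenced
linked = percent(333, 957)             # no more than 2 SNVs from an earlier case
distinct = percent(428, 957)           # more than 10 SNVs from all earlier cases
hospital_contact = percent(126, 333)   # linked cases with close hospital contact
```

    Under these assumptions the helper reproduces the rounded percentages given in the abstract (98%, 35%, 45% and 38%).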

  18. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components.

    PubMed

    Qiu, Wang-Ren; Xiao, Xuan; Chou, Kuo-Chen

    2014-01-24

    Meiosis and recombination are the two opposite aspects that coexist in a DNA system. As a driving force for evolution by generating natural genetic variations, meiotic recombination plays a very important role in the formation of eggs and sperm. Interestingly, the recombination does not occur randomly across a genome, but with higher probability in some genomic regions called "hotspots", while with lower probability in so-called "coldspots". With the ever-increasing amount of genome sequence data in the postgenomic era, computational methods for effectively identifying the hotspots and coldspots have become urgent as they can provide timely and useful insights into the mechanism of meiotic recombination and the process of genome evolution as well. To meet the need, we developed a new predictor called "iRSpot-TNCPseAAC", in which a DNA sample was formulated by combining its trinucleotide composition (TNC) and the pseudo amino acid components (PseAAC) of the protein translated from the DNA sample according to its genetic codes. The former was used to incorporate its local or short-range sequence order information; while the latter, its global and long-range one. Compared with the best existing predictor in this area, iRSpot-TNCPseAAC achieved higher rates in accuracy, Matthews correlation coefficient, and sensitivity, indicating that the new predictor may become a useful tool for identifying the recombination hotspots and coldspots, or, at least, become a complementary tool to the existing methods. It has not escaped our notice that the aforementioned novel approach to incorporate the DNA sequence order information into a discrete model may also be used for many other genome analysis problems. The web-server for iRSpot-TNCPseAAC is available at http://www.jci-bioinfo.cn/iRSpot-TNCPseAAC. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the current web server to obtain their desired
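
    As a rough illustration of the trinucleotide composition (TNC) feature named in this record, the sketch below turns a DNA string into a 64-dimensional frequency vector. It assumes an overlapping sliding window of width 3 and normalisation by the number of valid windows. The abstract does not specify these details, so this is not the iRSpot-TNCPseAAC implementation, and the function name is hypothetical. The PseAAC component is not covered.

```python
from itertools import product

# All 64 possible trinucleotides over the DNA alphabet, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(dna: str) -> dict[str, float]:
    """Normalised frequency of each trinucleotide (overlapping windows assumed)."""
    dna = dna.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(dna) - 2):
        tri = dna[i:i + 3]
        if tri in counts:          # skip windows containing N or other symbols
            counts[tri] += 1
            total += 1
    if total == 0:
        return {tri: 0.0 for tri in TRINUCLEOTIDES}
    return {tri: n / total for tri, n in counts.items()}


# Windows of "ATGATGCC": ATG, TGA, GAT, ATG, TGC, GCC (six in total).
tnc = trinucleotide_composition("ATGATGCC")
```

    A fixed ordering of the 64 keys keeps the feature vector comparable across sequences of different lengths, which is the property a discrete model of this kind relies on.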

  19. Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia.

    PubMed

    Pak, Theodore R; Altman, Deena R; Attie, Oliver; Sebra, Robert; Hamula, Camille L; Lewis, Martha; Deikus, Gintaras; Newman, Leah C; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E; Huprikar, Shirish; van Bakel, Harm; Kasarskis, Andrew; Bashir, Ali

    2015-11-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

  20. Completion of the amino acid sequence of the alpha 1 chain from type I calf skin collagen. Amino acid sequence of alpha 1(I)B8.

    PubMed Central

    Glanville, R W; Breitkreutz, D; Meitinger, M; Fietzek, P P

    1983-01-01

    The complete amino acid sequence of the 279-residue CNBr peptide CB8 from the alpha 1 chain of type I calf skin collagen is presented. It was determined by sequencing overlapping fragments of CB8 produced by Staphylococcus aureus V8 proteinase, trypsin, Endoproteinase Arg-C and hydroxylamine. Tryptic cleavages were also made specific for lysine by blocking arginine residues with cyclohexane-1,2-dione. This completes the amino acid sequence analysis of the 1054-residues-long alpha (I) chain of calf skin collagen. PMID:6354180

  1. Pitfalls of mapping high throughput sequencing data to repetitive sequences: Piwi’s genomic targets still not identified

    PubMed Central

    Marinov, Georgi K.; Wang, Jie; Handler, Dominik; Wold, Barbara J.; Weng, Zhiping; Hannon, Gregory J.; Aravin, Alexei A.; Zamore, Phillip D.; Brennecke, Julius; Toth, Katalin Fejes

    2015-01-01

    Huang et al. (2013) recently reported that chromatin immuno-precipitation followed by sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi, a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their underlying deep sequencing data and report that the data do not support the authors' central conclusions. PMID:25805138

  2. Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma

    PubMed Central

    Dai, Wei; Zheng, Hong; Cheung, Arthur Kwok Leung; Tang, Clara Sze-man; Ko, Josephine Mun Yee; Wong, Bonnie Wing Yan; Leong, Merrin Man Long; Sham, Pak Chung; Cheung, Florence; Kwong, Dora Lai-Wan; Ngan, Roger Kai Cheong; Ng, Wai Tong; Yau, Chun Chung; Pan, Jianji; Peng, Xun; Tung, Stewart; Zhang, Zengfeng; Ji, Mingfang; Chiang, Alan Kwok-Shing; Lee, Anne Wing-Mui; Lee, Victor Ho-fun; Lam, Ka-On; Au, Kwok Hung; Cheng, Hoi Ching; Yiu, Harry Ho-Yin; Lung, Maria Li

    2016-01-01

    Multiple factors, including host genetics, environmental factors, and Epstein–Barr virus (EBV) infection, contribute to nasopharyngeal carcinoma (NPC) development. To identify genetic susceptibility genes for NPC, a whole-exome sequencing (WES) study was performed in 161 NPC cases and 895 controls of Southern Chinese descent. The gene-based burden test discovered an association between macrophage-stimulating 1 receptor (MST1R) and NPC. We identified 13 independent cases carrying the MST1R pathogenic heterozygous germ-line variants, and 53.8% of these cases were diagnosed with NPC at or before 20 y of age, indicating that MST1R germ-line variants are relevant to early-age onset (EAO) of the disease (age of ≤20 y). In total, five MST1R missense variants were found in EAO cases but were rare in controls (EAO vs. control, 17.9% vs. 1.2%, P = 7.94 × 10^-12). The validation study, including 2,160 cases and 2,433 controls, showed that the MST1R variant c.G917A:p.R306H is highly associated with NPC (odds ratio of 9.0). MST1R is predominantly expressed in the tissue-resident macrophages and is critical for innate immunity that protects organs from tissue damage and inflammation. Importantly, MST1R expression is detected in the ciliated epithelial cells in normal nasopharyngeal mucosa and plays a role in the cilia motility important for host defense. Although no somatic mutation of MST1R was identified in the sporadic NPC tumors, copy number alterations and promoter hypermethylation at MST1R were often observed. Our findings provide new insights into the pathogenesis of NPC by highlighting the involvement of the MST1R-mediated signaling pathways. PMID:26951679

  3. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  4. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    PubMed Central

    Adzhubei, I A; Adzhubei, A A; Neidle, S

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship. PMID:9399866

  5. Whole-exome sequencing to identify somatic mutations in peritoneal metastatic gastric adenocarcinoma: A preliminary study

    PubMed Central

    Zhu, Yu; Li, Tingting; Huang, Haipeng; Lin, Tian; Hu, Yanfeng; Qi, Xiaolong; Yu, Jiang; Li, Guoxin

    2016-01-01

    Peritoneal metastasis occurs in more than half of patients with unresectable or recurrent gastric cancer and is associated with the worst prognosis. The associated genomic events and pathogenesis remain ambiguous. The aim of the present study was to characterize the mutation spectrum of gastric cancer with peritoneal metastasis and provide a basis for the identification of new biomarkers and treatment targets. Matched pairs of normal gastric mucosa and peritoneal tissue and matched pairs of primary tumor and peritoneal metastasis were collected from one patient for whole-exome sequencing (WES); Sanger sequencing was employed to confirm the somatic mutations. G>A and C>T mutations were the two most frequent transversions among the somatic mutations. We confirmed 48 somatic mutations in the primary site and 49 in the peritoneal site. Additionally, 25 non-synonymous somatic variations (single-nucleotide variants, SNVs) and 2 somatic insertions/deletions (INDELs) were confirmed in the primary tumor, and 30 SNVs and 5 INDELs were verified in the peritoneal metastasis. Approximately 59% of the somatic mutations were shared between the primary and metastatic site. Five genes (TP53, BAI1, THSD1, ARID2, and KIAA2022) verified in our study were also mutated at a frequency greater than 5% in the COSMIC database. We also identified 9 genes (ERBB4, ZNF721, NT5E, PDE10A, CA1, NUMB, NBN, ZFYVE16, and NCAM1) that were only mutated in metastasis and are expected to become treatment targets. In conclusion, we observed that the majority of the somatic mutations in the primary site persisted in metastasis, whereas several single-nucleotide polymorphisms occurred de novo at the second site. PMID:27270314

  6. Multilocus Sequence Typing Identifies Epidemic Clones of Flavobacterium psychrophilum in Nordic Countries

    PubMed Central

    Duchaud, Eric; Nicolas, Pierre; Dalsgaard, Inger; Madsen, Lone; Aspán, Anna; Jansson, Eva; Colquhoun, Duncan J.; Wiklund, Tom

    2014-01-01

    Flavobacterium psychrophilum is the causative agent of bacterial cold water disease (BCWD), which affects a variety of freshwater-reared salmonid species. A large-scale study was performed to investigate the genetic diversity of F. psychrophilum in the four Nordic countries: Denmark, Finland, Norway, and Sweden. Multilocus sequence typing of 560 geographically and temporally disparate F. psychrophilum isolates collected from various sources between 1983 and 2012 revealed 81 different sequence types (STs) belonging to 12 clonal complexes (CCs) and 30 singleton STs. The largest CC, CC-ST10, which represented almost exclusively isolates from rainbow trout and included the most predominant genotype, ST2, comprised 65% of all isolates examined. In Norway, with a shorter history (<10 years) of BCWD in rainbow trout, ST2 was the only isolated CC-ST10 genotype, suggesting a recent introduction of an epidemic clone. The study identified five additional CCs shared between countries and five country-specific CCs, some with apparent host specificity. Almost 80% of the singleton STs were isolated from non-rainbow trout species or the environment. The present study reveals a simultaneous presence of genetically distinct CCs in the Nordic countries and points out specific F. psychrophilum STs posing a threat to the salmonid production. The study provides a significant contribution toward mapping the genetic diversity of F. psychrophilum globally and support for the existence of an epidemic population structure where recombination is a significant driver in F. psychrophilum evolution. Evidence indicating dissemination of a putatively virulent clonal complex (CC-ST10) with commercial movement of fish or fish products is strengthened. PMID:24561585

  7. Exome sequencing coupled with mRNA analysis identifies NDUFAF6 as a Leigh gene.

    PubMed

    Bianciardi, Laura; Imperatore, Valentina; Fernandez-Vizarra, Erika; Lopomo, Angela; Falabella, Micol; Furini, Simone; Galluzzi, Paolo; Grosso, Salvatore; Zeviani, Massimo; Renieri, Alessandra; Mari, Francesca; Frullanti, Elisa

    2016-11-01

    We report here the case of a young male who started to show verbal fluency disturbance, clumsiness and gait anomalies at the age of 3.5 years and presented bilateral striatal necrosis. Clinically, the diagnosis was compatible with Leigh syndrome but the underlying molecular defect remained elusive even after exome analysis using autosomal/X-linked recessive or de novo models. Dosage of respiratory chain activity on fibroblasts, but not in muscle, underlined a deficit in complex I. Re-analysis of heterozygous probably pathogenic variants, inherited from one healthy parent, identified the p.Ala178Pro in NDUFAF6, a complex I assembly factor. RNA analysis showed an almost mono-allelic expression of the mutated allele in blood and fibroblasts, and puromycin treatment of cultured fibroblasts did not rescue expression of the maternal allele, which does not support the involvement of a nonsense-mediated RNA decay mechanism. Complementation assay underlined a recovery of complex I activity after transduction of the wild-type gene. Since the second mutation was not detected and promoter methylation analysis was normal, we hypothesized a non-exonic event in the maternal allele affecting a regulatory element that, in conjunction with the paternal mutation, leads to the autosomal recessive disorder and the different allele expression in various tissues. This paper confirms NDUFAF6 as a genuine morbid gene and proposes the coupling of exome sequencing with mRNA analysis as a method useful for enhancing the exome sequencing detection rate when the simple application of classical inheritance models fails.

  8. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, 3H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
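
    The residue labels quoted in this record (for example G9 or W30) use one-letter amino acid codes with 1-based positions, so they can be checked directly against the printed sequence. The sketch below does only that. The helper names are hypothetical, and the check says nothing about folding or receptor recognition.

```python
# Sh-NI sequence exactly as printed in the abstract (one-letter codes).
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

# Residue labels quoted in the abstract: one-letter code plus 1-based position.
NAMED_RESIDUES = ["G9", "P10", "R13", "G19", "G29", "W30"]


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


def label_matches(sequence: str, label: str) -> bool:
    """True if a label such as 'G9' agrees with the sequence."""
    return residue_at(sequence, int(label[1:])) == label[0]


length = len(SH_NI)
half_cystines = SH_NI.count("C")
all_labels_match = all(label_matches(SH_NI, lab) for lab in NAMED_RESIDUES)
```

    For the sequence as printed, the count of C residues and all six labelled positions agree with the abstract.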

  9. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.

  10. Complete Genome Sequence of Seneca Valley Virus CH-01-2015 Identified in China.

    PubMed

    Wu, Qiwen; Zhao, Xiaoya; Chen, Yanshan; He, Xiaoming; Zhang, Guanqun; Ma, Jingyun

    2016-01-21

    The complete genome sequence of Seneca Valley virus (SVV), a single-stranded RNA virus that causes porcine vesicular disease in China, has been sequenced and analyzed. This Chinese isolate shares 94.4 to 97.1% sequence identity with 8 other strains from Canada, Brazil, and the United States. This is the first report of SVV infecting swine in China.

  11. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library

    PubMed Central

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Aim: Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and third-generation PacBio gene sequencing. Methods: After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Results: Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. Significance: This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes

  12. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  13. Identify biosorption effects of Thiobacillus towards perfluorooctanoic acid (PFOA): Pilot study from field to laboratory.

    PubMed

    Li, Lei; Wang, Tieyu; Sun, Yajun; Wang, Pei; Yvette, Baninla; Meng, Jing; Li, Qifeng; Zhou, Yunqiao

    2017-03-01

    The concentration of perfluoroalkyl acids (PFAAs) and the bacterial community composition along the Xiaoqing River were explored with HPLC-MS/MS and Illumina high-throughput sequencing in the present study. The results showed that perfluorooctanoic acid (PFOA) was the predominant PFAA in all sediment samples, and that a high level of PFOA could lead to an evident increase in the abundance of Thiobacillus. Thiobacillus was accordingly identified as able to survive in high concentrations of PFOA. Therefore, Thiobacillus thioparus and Thiobacillus denitrificans were selected as receptors to design an indoor biosorption experiment. The growth curves under different PFOA concentrations and the residual rates of PFOA during cultivation were analyzed. The results showed that increasing concentrations of PFOA below 5000 ng/L led to an obvious increase in the growth rate of T. thioparus, whereas PFOA promoted the growth of T. denitrificans only in a relatively limited range of concentrations, and the effect was not obvious. The addition of different concentrations of PFOA had no apparent effects on pH values in the media of either T. thioparus or T. denitrificans. The concentrations of PFOA in liquid media were reduced after bacterial culturing. The PFOA removal rates of T. thioparus and T. denitrificans were 21.1-26.8% and 13.5-18.4%, respectively. The current findings indicate that T. thioparus could play a significant role as a potential biosorbent with the ability to eliminate PFOA effectively in the aquatic environment, which would provide novel information for PFOA ecological decontamination and remediation.

  14. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  15. A 25-Amino Acid Sequence of the Arabidopsis TGD2 Protein Is Sufficient for Specific Binding of Phosphatidic Acid

    PubMed Central

    Lu, Binbin; Benning, Christoph

    2009-01-01

    Genetic analysis suggests that the TGD2 protein of Arabidopsis is required for the biosynthesis of endoplasmic reticulum derived thylakoid lipids. TGD2 is proposed to be the substrate-binding protein of a presumed lipid transporter consisting of the TGD1 (permease) and TGD3 (ATPase) proteins. The TGD1, -2, and -3 proteins are localized in the inner chloroplast envelope membrane. TGD2 appears to be anchored with an N-terminal membrane-spanning domain into the inner envelope membrane, whereas the C-terminal domain faces the intermembrane space. It was previously shown that the C-terminal domain of TGD2 binds phosphatidic acid (PtdOH). To investigate the PtdOH binding site of TGD2 in detail, the C-terminal domain of the TGD2 sequence lacking the transit peptide and transmembrane sequences was fused to the C terminus of the Discosoma sp. red fluorescent protein (DR). This greatly improved the solubility of the resulting DR-TGD2C fusion protein following production in Escherichia coli. The DR-TGD2C protein bound PtdOH with high specificity, as demonstrated by membrane lipid-protein overlay and liposome association assays. Internal deletion and truncation mutagenesis identified a previously undescribed minimal 25-amino acid fragment in the C-terminal domain of TGD2 that is sufficient for PtdOH binding. Binding characteristics of this 25-mer were distinctly different from those of TGD2C, suggesting that additional sequences of TGD2 providing the proper context for this 25-mer are needed for wild type-like PtdOH binding. PMID:19416982

  16. Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

    PubMed

    Chen, Geng; Wang, Charles; Shi, Leming; Tong, Weida; Qu, Xiongfei; Chen, Jiwei; Yang, Jianmin; Shi, Caiping; Chen, Long; Zhou, Peiying; Lu, Bingxin; Shi, Tieliu

    2013-08-01

    The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions remain little explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaca or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

  17. High density genome wide genotyping-by-sequencing and association identifies common and low frequency SNPs, and novel candidate genes influencing cow milk traits

    PubMed Central

    Ibeagha-Awemu, Eveline M.; Peters, Sunday O.; Akwanji, Kingsley A.; Imumorin, Ikhide G.; Zhao, Xin

    2016-01-01

    High-throughput sequencing technologies have increased the ability to detect sequence variations for complex trait improvement. A high throughput genome wide genotyping-by-sequencing (GBS) method was used to generate 515,787 single nucleotide polymorphisms (SNPs), from which 76,355 SNPs with call rates >85% and minor allele frequency ≥1.5% were used in genome wide association study (GWAS) of 44 milk traits in 1,246 Canadian Holstein cows. GWAS was accomplished with a mixed linear model procedure implementing the additive and dominant models. A strong signal within the centromeric region of bovine chromosome 14 was associated with test day fat percentage. Several SNPs were associated with eicosapentaenoic acid, docosapentaenoic acid, arachidonic acid, CLA:9c11t and gamma linolenic acid. Most of the significant SNPs for 44 traits studied are novel and located in intergenic regions or introns of genes. Novel potential candidate genes for milk traits or mammary gland functions include ERCC6, TONSL, NPAS2, ACER3, ITGB4, GGT6, ACOX3, MECR, ADAM12, ACHE, LRRC14, FUK, NPRL3, EVL, SLCO3A1, PSMA4, FTO, ADCK5, PP1R16A and TEP1. Our study further demonstrates the utility of the GBS approach for identifying population-specific SNPs for use in improvement of complex dairy traits. PMID:27506634
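
The SNP quality filter described in this abstract (call rate >85%, minor allele frequency ≥1.5%) can be illustrated with a minimal sketch. The genotype encoding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the function names are assumptions made for illustration; the abstract does not say which software or data format the authors used.

```python
from typing import List, Optional

Genotypes = List[Optional[int]]  # assumed coding: 0/1/2 alternate alleles, None = missing


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the rarer allele among called (diploid) genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes: Genotypes,
              min_call_rate: float = 0.85,
              min_maf: float = 0.015) -> bool:
    """Apply the thresholds quoted in the abstract: call rate > 85%, MAF >= 1.5%."""
    return (call_rate(genotypes) > min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical toy SNPs, ten samples each.
snp_a = [0, 1, 2, 1, 0, 0, 1, None, 0, 0]        # call rate 0.9, polymorphic
snp_b = [0] * 10                                 # monomorphic, MAF = 0
snp_c = [1, None, None, 0, 0, 0, 0, 0, 0, 0]     # call rate 0.8
kept = [name for name, g in [("snp_a", snp_a), ("snp_b", snp_b), ("snp_c", snp_c)]
        if passes_qc(g)]
```

In this toy example only `snp_a` survives both thresholds; `snp_b` fails the MAF threshold and `snp_c` fails the call-rate threshold.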

  18. Whole exome sequencing identifies a recurrent RQCD1 P131L mutation in cutaneous melanoma

    PubMed Central

    Wong, Stephen Q.; Behren, Andreas; Mar, Victoria J.; Woods, Katherine; Li, Jason; Martin, Claire; Sheppard, Karen E.; Wolfe, Rory; Kelly, John; Cebon, Jonathan; Dobrovic, Alexander; McArthur, Grant A.

    2015-01-01

    Melanoma is often caused by mutations due to exposure to ultraviolet radiation. This study reports a recurrent somatic C > T change causing a P131L mutation in the RQCD1 (Required for Cell Differentiation1 Homolog) gene identified through whole exome sequencing of 20 metastatic melanomas. Screening in 715 additional primary melanomas revealed a prevalence of ~4%. This represents the first reported recurrent mutation in a member of the CCR4-NOT complex in cancer. Compared to tumors without the mutation, the P131L mutant positive tumors were associated with increased thickness (p = 0.02), head and neck (p = 0.009) and upper limb (p = 0.03) location, lentigo maligna melanoma subtype (p = 0.02) and BRAF V600K (p = 0.04) but not V600E or NRAS codon 61 mutations. There was no association with nodal disease (p = 0.3). Mutually exclusive mutations of other members of the CCR4-NOT complex were found in ~20% of the TCGA melanoma dataset suggesting the complex may play an important role in melanoma biology. Mutant RQCD1 was predicted to bind strongly to HLA-A0201 and HLA-Cw3 MHC1 complexes. From thirteen patients with mutant RQCD1, an anti-tumor CD8+ T cell response was observed from a single patient's peripheral blood mononuclear cell population stimulated with mutated peptide compared to wildtype indicating a neoantigen may be formed. PMID:25544760

  19. Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process

    PubMed Central

    Erzurumluoglu, A. Mesut; Shihab, Hashem A.; Baird, Denis; Richardson, Tom G.; Day, Ian N. M.; Gaunt, Tom R.

    2015-01-01

    Recent technological advances have created challenges for geneticists and a need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases, and software). This wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that although many possess “just enough” knowledge to analyse their data, they do not make full use of the tools and databases that are available and also do not fully understand how their data was created. The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data is obtained from consanguineous individuals as opposed to nonconsanguineous; and when Mendelian disorders are analysed as opposed to common-complex disorders. PMID:26106619

  20. Distinct genetic architectures for syndromic and nonsyndromic congenital heart defects identified by exome sequencing.

    PubMed

    Sifrim, Alejandro; Hitz, Marc-Phillip; Wilsdon, Anna; Breckpot, Jeroen; Turki, Saeed H Al; Thienpont, Bernard; McRae, Jeremy; Fitzgerald, Tomas W; Singh, Tarjinder; Swaminathan, Ganesh Jawahar; Prigmore, Elena; Rajan, Diana; Abdul-Khaliq, Hashim; Banka, Siddharth; Bauer, Ulrike M M; Bentham, Jamie; Berger, Felix; Bhattacharya, Shoumo; Bu'Lock, Frances; Canham, Natalie; Colgiu, Irina-Gabriela; Cosgrove, Catherine; Cox, Helen; Daehnert, Ingo; Daly, Allan; Danesh, John; Fryer, Alan; Gewillig, Marc; Hobson, Emma; Hoff, Kirstin; Homfray, Tessa; Kahlert, Anne-Karin; Ketley, Ami; Kramer, Hans-Heiner; Lachlan, Katherine; Lampe, Anne Katrin; Louw, Jacoba J; Manickara, Ashok Kumar; Manase, Dorin; McCarthy, Karen P; Metcalfe, Kay; Moore, Carmel; Newbury-Ecob, Ruth; Omer, Seham Osman; Ouwehand, Willem H; Park, Soo-Mi; Parker, Michael J; Pickardt, Thomas; Pollard, Martin O; Robert, Leema; Roberts, David J; Sambrook, Jennifer; Setchfield, Kerry; Stiller, Brigitte; Thornborough, Chris; Toka, Okan; Watkins, Hugh; Williams, Denise; Wright, Michael; Mital, Seema; Daubeney, Piers E F; Keavney, Bernard; Goodship, Judith; Abu-Sulaiman, Riyadh Mahdi; Klaassen, Sabine; Wright, Caroline F; Firth, Helen V; Barrett, Jeffrey C; Devriendt, Koenraad; FitzPatrick, David R; Brook, J David; Hurles, Matthew E

    2016-09-01

    Congenital heart defects (CHDs) have a neonatal incidence of 0.8-1% (refs. 1,2). Despite abundant examples of monogenic CHD in humans and mice, CHD has a low absolute sibling recurrence risk (∼2.7%), suggesting a considerable role for de novo mutations (DNMs) and/or incomplete penetrance. De novo protein-truncating variants (PTVs) have been shown to be enriched among the 10% of 'syndromic' patients with extra-cardiac manifestations. We exome sequenced 1,891 probands, including both syndromic CHD (S-CHD, n = 610) and nonsyndromic CHD (NS-CHD, n = 1,281). In S-CHD, we confirmed a significant enrichment of de novo PTVs but not inherited PTVs in known CHD-associated genes, consistent with recent findings. Conversely, in NS-CHD we observed significant enrichment of PTVs inherited from unaffected parents in CHD-associated genes. We identified three genome-wide significant S-CHD disorders caused by DNMs in CHD4, CDK13 and PRKD1. Our study finds evidence for distinct genetic architectures underlying the low sibling recurrence risk in S-CHD and NS-CHD.

  1. Exome sequencing identifies a novel homozygous CLN8 mutation in a Turkish family with Northern epilepsy.

    PubMed

    Sahin, Yavuz; Güngör, Olcay; Gormez, Zeliha; Demirci, Huseyin; Ergüner, Bekir; Güngör, Gülay; Dilber, Cengiz

    2017-03-01

    Neuronal ceroid lipofuscinosis (NCL), one of the most common neurodegenerative childhood-onset disorders, is characterized by autosomal-recessive inheritance, epileptic seizures, progressive psychomotor deterioration, visual impairment, and premature death. Based on the country of origin of the patients, the clinical features/courses, and the molecular genetics background of the disorder, 14 distinct NCL subtypes have been described to date. CLN8 mutation was first identified in Finnish patients, and the condition was named Northern Epilepsy (NE); however, the severe phenotype of the CLN8 gene was subsequently found outside Finland and named 'variant late-infantile' NCL. In this study, five patients and their six healthy relatives from a large Turkish consanguineous family were enrolled. The study involved detailed clinical, radiological and molecular genetic evaluations. Whole-exome sequencing and homozygosity mapping revealed a novel homozygous CLN8 mutation, c.677T>C (p.Leu226Pro). We defined NE cases in Turkey, caused by a novel mutation in CLN8. WES can be an important diagnostic method in rare cases with atypical courses.

  2. Evidence for SNP-SNP interaction identified through targeted sequencing of cleft case-parent trios.

    PubMed

    Xiao, Yanzi; Taub, Margaret A; Ruczinski, Ingo; Begum, Ferdouse; Hetmanski, Jacqueline B; Schwender, Holger; Leslie, Elizabeth J; Koboldt, Daniel C; Murray, Jeffrey C; Marazita, Mary L; Beaty, Terri H

    2017-04-01

    Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is the most common craniofacial birth defect in humans, affecting 1 in 700 live births. This malformation has a complex etiology where multiple genes and several environmental factors influence risk. At least a dozen different genes have been confirmed to be associated with risk of NSCL/P in previous studies. However, all the known genetic risk factors cannot fully explain the observed heritability of NSCL/P, and several authors have suggested gene-gene (G × G) interaction may be important in the etiology of this complex and heterogeneous malformation. We tested for G × G interactions using common single nucleotide polymorphic (SNP) markers from targeted sequencing in 13 regions identified by previous studies spanning 6.3 Mb of the genome in a study of 1,498 NSCL/P case-parent trios. We used the R-package trio to assess interactions between polymorphic markers in different genes, using a 1 degree of freedom (1df) test for screening, and a 4 degree of freedom (4df) test to assess statistical significance of epistatic interactions. To adjust for multiple comparisons, we performed permutation tests. The most significant interaction was observed between rs6029315 in MAFB and rs6681355 in IRF6 (4df P = 3.8 × 10⁻⁸) in case-parent trios of European ancestry, which remained significant after correcting for multiple comparisons. However, no significant interaction was detected in trios of Asian ancestry.

  3. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer.

    PubMed

    Morrison, Carl D; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C; Johnson, Candace S; Trump, Donald L

    2014-02-11

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as "stitchers," to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication-licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer.

  4. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    PubMed Central

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  5. Exome sequencing identifies KIAA1377 and C5orf42 as susceptibility genes for monomelic amyotrophy.

    PubMed

    Lim, Young-Min; Koh, Insong; Park, Young-Mi; Kim, Jae-Jung; Kim, Dae-Seong; Kim, Hyo-Jin; Baik, Kyu-Heum; Choi, Hye-Yeon; Yang, Gap-Seok; Also-Rallo, Eva; Tizzano, Eduardo F; Gamez, Josep; Park, Kiejung; Yoo, Han-Wook; Lee, Jong-Keuk; Kim, Kwang-Kuk

    2012-05-01

    Precise topographic localization, predominance in males mostly of Asian origin, and existence of some familial cases suggest a genetic background for monomelic amyotrophy. To identify susceptibility genes for monomelic amyotrophy, we performed whole-exome sequencing of four unrelated patients with monomelic amyotrophy and detected a total of 45 novel nonsynonymous single-nucleotide polymorphisms as unique variants to monomelic amyotrophy compared to control exomes. Genetic association analysis showed significant association with monomelic amyotrophy in the Gly668Ser variant of the KIAA1377 gene (odds ratio=4.62, P-value=0.0040) and the Pro1794Leu variant of the C5orf42 gene (odds ratio=4.63, P-value=0.0040). Moreover, the combination of two variants increased the risk of monomelic amyotrophy (P=1.4×10⁻⁵, OR=61.69, 95% confidence interval=9.62-394.94, in case of combination of two heterozygotes). These data suggest that KIAA1377 and C5orf42 synergistically play a role as susceptibility genes for monomelic amyotrophy.
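
The abstract reports odds ratios with a 95% confidence interval but gives neither the underlying counts nor the estimation method. As a minimal sketch of how such figures are commonly derived, the following uses the standard log-odds (Woolf) approximation on a 2×2 table; the method choice and the counts are assumptions for illustration, not the study's data.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a = cases carrying the variant,    b = cases without it
    c = controls carrying the variant, d = controls without it
    Uses the log-odds (Woolf) approximation; every cell must be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
or_, lo, hi = odds_ratio_ci(a=10, b=20, c=5, d=40)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

With small cell counts the interval becomes very wide, which is consistent with the broad interval (9.62-394.94) quoted for the combined heterozygote genotype.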

  6. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3′ UTRs and coding sequences

    PubMed Central

    Šulc, Miroslav; Marín, Ray M.; Robins, Harlan S.; Vaníček, Jiří

    2015-01-01

    The purpose of the proposed web server, publicly available at http://paccmit.epfl.ch, is to provide a user-friendly interface to two algorithms for predicting messenger RNA (mRNA) molecules regulated by microRNAs: (i) PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets), which identifies primarily mRNA transcripts targeted in their 3′ untranslated regions (3′ UTRs), and (ii) PACCMIT-CDS, designed to find mRNAs targeted within their coding sequences (CDSs). While PACCMIT belongs among the accurate algorithms for predicting conserved microRNA targets in the 3′ UTRs, the main contribution of the web server is 2-fold: PACCMIT provides an accurate tool for predicting targets also of weakly conserved or non-conserved microRNAs, whereas PACCMIT-CDS addresses the lack of similar portals adapted specifically for targets in CDS. The web server asks the user for microRNAs and mRNAs to be analyzed, accesses the precomputed P-values for all microRNA–mRNA pairs from a database for all mRNAs and microRNAs in a given species, ranks the predicted microRNA–mRNA pairs, evaluates their significance according to the false discovery rate and finally displays the predictions in a tabular form. The results are also available for download in several standard formats. PMID:25948580
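
The abstract states that the server ranks microRNA–mRNA pairs by precomputed P-values and evaluates significance by false discovery rate, without naming the procedure. A minimal sketch of that step, assuming the widely used Benjamini-Hochberg adjustment (a swapped-in technique, not necessarily what PACCMIT implements) and hypothetical pair names and P-values:

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def rank_pairs(pair_p: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Rank (pair, p, q) tuples by ascending p-value."""
    names = list(pair_p)
    q = benjamini_hochberg([pair_p[n] for n in names])
    return sorted(((n, pair_p[n], qv) for n, qv in zip(names, q)), key=lambda t: t[1])


# Hypothetical microRNA-mRNA pairs and P-values, for illustration only.
example = {
    "miR-A/geneW": 0.01,
    "miR-A/geneX": 0.04,
    "miR-B/geneY": 0.03,
    "miR-B/geneZ": 0.005,
}
ranked = rank_pairs(example)
significant = [name for name, _, q in ranked if q <= 0.05]
```

Each q-value estimates the false discovery rate incurred if that pair and all higher-ranked pairs are reported as targets.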

  7. Respiratory Syncytial Virus whole-genome sequencing identifies convergent evolution of sequence duplication in the C-terminus of the G gene

    PubMed Central

    Schobel, Seth A.; Stucker, Karla M.; Moore, Martin L.; Anderson, Larry J.; Larkin, Emma K.; Shankar, Jyoti; Bera, Jayati; Puri, Vinita; Shilts, Meghan H.; Rosas-Salazar, Christian; Halpin, Rebecca A.; Fedorova, Nadia; Shrivastava, Susmita; Stockwell, Timothy B.; Peebles, R. Stokes; Hartert, Tina V.; Das, Suman R.

    2016-01-01

    Respiratory Syncytial Virus (RSV) is responsible for considerable morbidity and mortality worldwide and is the most important respiratory viral pathogen in infants. Extensive sequence variability within and between RSV group A and B viruses and the ability of multiple clades and sub-clades of RSV to co-circulate are likely mechanisms contributing to the evasion of herd immunity. Surveillance and large-scale whole-genome sequencing of RSV is currently limited but would help identify its evolutionary dynamics and sites of selective immune evasion. In this study, we performed complete-genome next-generation sequencing of 92 RSV isolates from infants in central Tennessee during the 2012–2014 RSV seasons. We identified multiple co-circulating clades of RSV from both the A and B groups. Each clade is defined by signature N- and O-linked glycosylation patterns. Analyses of specific RSV genes revealed high rates of positive selection in the attachment (G) gene. We identified RSV-A viruses in circulation with and without a recently reported 72-nucleotide G gene sequence duplication. Furthermore, we show evidence of convergent evolution of G gene sequence duplication and fixation over time, which suggests a potential fitness advantage of RSV with the G sequence duplication. PMID:27212633

  8. Novel classical MHC class I alleles identified in horses by sequencing clones of reverse transcription-PCR products.

    PubMed

    Chung, C; Leib, S R; Fraser, D G; Ellis, S A; McGuire, T C

    2003-12-01

    Improved typing of horse classical MHC class I is required to more accurately define these molecules and to extend the number identified further than current serological assays. Defining classical MHC class I allelic polymorphism is important in evaluating cytotoxic T lymphocyte (CTL) responses in horses. In this study, horse classical MHC class I genes were analyzed based on reverse transcription (RT)-PCR amplification of sequences encoding the polymorphic peptide binding region and the more conserved alpha 3, transmembrane and cytoplasmic regions followed by cloning and sequencing. Primer sets included a horse classical MHC class I-specific reverse primer and a forward primer conserved in all known horse MHC class I genes. Sequencing at least 25 clones containing MHC class I sequences from each of 13 horses identified 25 novel sequences and three others which had been described. Of these, nine alleles were identified from different horses or different RT-PCR and 19 putative alleles were identified in multiple clones from the same RT-PCR. The primer pairs did not amplify putative non-classical MHC class I genes as only classical MHC class I and related pseudogenes were found in 462 clones. This method also identified classical MHC class I alleles shared between horses by descent, and defined differences in alleles between horses varying in equine leukocyte antigen (ELA)-A haplotype as determined by serology. However, horses sharing ELA-A haplotypes defined by serotyping did not always share cDNA sequences, suggesting subhaplotypic variations within serologically defined ELA-A haplotypes. The 13 horses in this study had two to five classical MHC class I sequences, indicating that multiple loci code for these genes. Sequencing clones from RT-PCR with classical MHC class I-specific primers should be useful for selection of haplotype matched and mismatched horses for CTL studies, and provides sequence information needed to develop easier and more discriminating

  9. [Sequence analysis for genes encoding nucleoprotein and envelope protein of a new human coronavirus NL63 identified from a pediatric patient in Beijing by bioinformatics].

    PubMed

    Xing, Jiang-feng; Zhu, Ru-nan; Qian, Yuan; Zhao, Lin-qing; Deng, Jie; Wang, Fang; Sun, Yu

    2007-07-01

    The aim of this study was to characterize the N and E protein encoding genes of a new human coronavirus (HCoV-NL63) which was identified from one of the clinical specimens (BJ8081) collected from a 12-year-old patient with acute respiratory infection in Beijing. The complete N and E gene sequences of HCoV-NL63 were amplified from the clinical sample by RT-PCR, then cloned into the pCF-T and pUCm-T vectors, respectively, and sequenced. The complete sequences of the N and E genes were submitted to GenBank by Sequin and compared with the N and E genes of prototype HCoV-NL63 and the other coronaviruses published in GenBank. The secondary structure and the characteristics of the sample BJ8081 N and E proteins were predicted by bioinformatics. The N and E genes amplified from sample BJ8081 were 1134 bp and 234 bp in length, and the predicted proteins comprised 377 amino acids and 77 amino acids, respectively. The data suggested that the region of amino acids 78-85 within the N protein was probably the conserved region for all coronaviruses identified so far, including HCoV-NL63. The region of amino acids 15-37 of the E protein was probably the transmembrane domain. In conclusion, the recombinant plasmids pCF-T-8081 N and pUCm-T-8081 E were successfully constructed and sequenced, and the data predicted by bioinformatics are helpful for the further analysis of HCoV-NL63.
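
The gene and protein lengths quoted in this abstract are mutually consistent if the reported gene lengths include the stop codon, which is an assumption here rather than something the abstract states. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame of orf_bp bases.

    Assumes the length is a whole number of codons; if the stop codon is
    counted in orf_bp, it is subtracted because it encodes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


n_protein = protein_length(1134)  # N gene: 1134 / 3 = 378 codons, minus the stop codon
e_protein = protein_length(234)   # E gene: 234 / 3 = 78 codons, minus the stop codon
print(n_protein, e_protein)
```

Under that assumption the sketch reproduces the 377- and 77-residue proteins reported for the N and E genes.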

  10. Genome Analysis Identified Novel Candidate Genes for Ascochyta Blight Resistance in Chickpea Using Whole Genome Re-sequencing Data

    PubMed Central

    Li, Yongle; Ruperao, Pradeep; Batley, Jacqueline; Edwards, David; Davidson, Jenny; Hobson, Kristy; Sutton, Tim

    2017-01-01

    Ascochyta blight (AB) is a fungal disease that can significantly reduce chickpea production in Australia and other regions of the world. In this study, 69 chickpea genotypes were sequenced using whole genome re-sequencing (WGRS) methods. They included 48 Australian varieties differing in their resistance ranking to AB, 16 advanced breeding lines from the Australian chickpea breeding program, four landraces, and one accession representing the wild chickpea species Cicer reticulatum. More than 800,000 single nucleotide polymorphisms (SNPs) were identified. Population structure analysis revealed relatively narrow genetic diversity amongst recently released Australian varieties and two groups of varieties separated by the level of AB resistance. Several regions of the chickpea genome were under positive selection based on Tajima’s D test. Both Fst genome-scan and genome-wide association studies (GWAS) identified a 100 kb region (AB4.1) on chromosome 4 that was significantly associated with AB resistance. The AB4.1 region co-located to a large QTL interval of 7 Mb∼30 Mb identified previously in three different mapping populations which were genotyped at relatively low density with SSR or SNP markers. The AB4.1 region was validated by GWAS in an additional collection of 132 advanced breeding lines from the Australian chickpea breeding program, genotyped with approximately 144,000 SNPs. The reduced level of nucleotide diversity and long extent of linkage disequilibrium also suggested the AB4.1 region may have gone through selective sweeps probably caused by selection of the AB resistance trait in breeding. In total, 12 predicted genes were located in the AB4.1 QTL region, including those annotated as: NBS-LRR receptor-like kinase, wall-associated kinase, zinc finger protein, and serine/threonine protein kinases. One significant SNP located in the conserved catalytic domain of an NBS-LRR receptor-like kinase led to amino acid substitution. Transcriptional analysis

  11. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    PubMed Central

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T.; Rosenqvist Lund, Birthe S.; Ameh, James A.; Ambali, Abdul G.; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M.; Hendriksen, Rene S.

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections. PMID:27228329

  12. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed Central

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-01-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses. PMID:3025846
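
    The ORF and protein lengths quoted above can be cross-checked with simple arithmetic. The sketch below assumes the 1896-base figure includes the stop codon, which the abstract does not state explicitly; the function name is illustrative and not taken from the paper.

```python
def protein_length_from_orf(orf_bases: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of the given length."""
    if orf_bases <= 0 or orf_bases % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bases // 3
    # Assumption: when includes_stop is True, the final codon is a stop
    # codon and contributes no residue to the protein.
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # 1896 bases -> 632 codons -> 631 residues plus one stop codon
    print(protein_length_from_orf(1896))
```

    Under that assumption, 1896 bases correspond to 632 codons, that is, 631 residues plus one stop codon, which agrees with the reported 631-amino acid protein.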

  13. Next generation sequencing identifies ‘interactome’ signatures in relapsed and refractory metastatic colorectal cancer

    PubMed Central

    Cooke, Laurence; Mahadevan, Daruka

    2017-01-01

    Background In the management of metastatic colorectal cancer (mCRC), KRAS, NRAS and BRAF mutational status individualizes therapeutic options and identifies a cohort of patients (pts) with an aggressive clinical course. We hypothesized that relapsed and refractory mCRC pts develop unique mutational signatures that may guide therapy, predict for a response and highlight key signaling pathways important for clinical decision making. Methods Relapsed and refractory mCRC pts (N=32) were molecularly profiled utilizing commercially available next generation sequencing (NGS) platforms. Web-based bioinformatics tools (Reactome/Enrichr) were utilized to elucidate mutational profile linked pathways-networks that have the potential to guide therapy. Results Pts had progressed on fluoropyrimidines, oxaliplatin, irinotecan, bevacizumab, cetuximab and/or panitumumab. Most common histology was adenocarcinoma (colon N=29; rectal N=3). Of the mutations, TP53 was the most common, followed by APC, KRAS, PIK3CA, BRAF, SMAD4, SPTA1, FAT1, PDGFRA, ATM, ROS1, ALK, CDKN2A, FBXW7, TGFBR2, NOTCH1 and HER3. Pts had on average ≥5 unique mutations. The most frequent activated signaling pathways were: HER2, fibroblast growth factor receptor (FGFR), p38 through BRAF-MEK cascade via RIT and RIN, ARMS-mediated activation of MAPK cascade, and VEGFR2. Conclusions Dominant driver oncogene mutations do not always equate to oncogenic dependence, hence understanding pathogenic ‘interactome(s)’ in individual pts is key to both clinically relevant targets and in choosing the next best therapy. Mutational signatures derived from corresponding ‘pathway-networks’ represent a meaningful tool to (I) evaluate functional investigation in the laboratory; (II) predict response to drug therapy; and (III) guide rational drug combinations in relapsed and refractory mCRC pts. PMID:28280605

  14. Exome sequencing of Pakistani consanguineous families identifies 30 novel candidate genes for recessive intellectual disability.

    PubMed

    Riazuddin, S; Hussain, M; Razzaq, A; Iqbal, Z; Shahzad, M; Polla, D L; Song, Y; van Beusekom, E; Khan, A A; Tomas-Roca, L; Rashid, M; Zahoor, M Y; Wissink-Lindhout, W M; Basra, M A R; Ansar, M; Agha, Z; van Heeswijk, K; Rasheed, F; Van de Vorst, M; Veltman, J A; Gilissen, C; Akram, J; Kleefstra, T; Assir, M Z; Grozeva, D; Carss, K; Raymond, F L; O'Connor, T D; Riazuddin, S A; Khan, S N; Ahmed, Z M; de Brouwer, A P M; van Bokhoven, H; Riazuddin, S

    2016-07-26

    Intellectual disability (ID) is a clinically and genetically heterogeneous disorder, affecting 1-3% of the general population. Although research into the genetic causes of ID has recently gained momentum, identification of pathogenic mutations that cause autosomal recessive ID (ARID) has lagged behind, predominantly due to non-availability of sizeable families. Here we present the results of exome sequencing in 121 large consanguineous Pakistani ID families. In 60 families, we identified homozygous or compound heterozygous DNA variants in a single gene, 30 affecting reported ID genes and 30 affecting novel candidate ID genes. Potential pathogenicity of these alleles was supported by co-segregation with the phenotype, low frequency in control populations and the application of stringent bioinformatics analyses. In another eight families segregation of multiple pathogenic variants was observed, affecting 19 genes that were either known or are novel candidates for ID. Transcriptome profiles of normal human brain tissues showed that the novel candidate ID genes formed a network significantly enriched for transcriptional co-expression (P<0.0001) in the frontal cortex during fetal development and in the temporal-parietal and sub-cortex during infancy through adulthood. In addition, proteins encoded by 12 novel ID genes directly interact with previously reported ID proteins in six known pathways essential for cognitive function (P<0.0001). These results suggest that disruptions of temporal parietal and sub-cortical neurogenesis during infancy are critical to the pathophysiology of ID. These findings further expand the existing repertoire of genes involved in ARID, and provide new insights into the molecular mechanisms and the transcriptome map of ID. Molecular Psychiatry advance online publication, 26 July 2016; doi:10.1038/mp.2016.109.

  15. Purification, characterization, gene cloning and nucleotide sequencing of D-stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

    PubMed

    Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

    2005-12-01

    The D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa, on sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 degrees C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 degrees C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+. The NH2 terminal and internal amino acid sequences of the enzyme were determined; and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da); and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.

  16. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  17. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  18. Proteomic analysis of cerebrospinal fluid in California sea lions (Zalophus californianus) with domoic acid toxicosis identifies proteins associated with neurodegeneration.

    PubMed

    Neely, Benjamin A; Soper, Jennifer L; Gulland, Frances M D; Bell, P Darwin; Kindy, Mark; Arthur, John M; Janech, Michael G

    2015-12-01

    Proteomic studies including marine mammals are rare, largely due to the lack of fully sequenced genomes. This has hampered the application of these techniques toward biomarker discovery efforts for monitoring of health and disease in these animals. We conducted a pilot label-free LC-MS/MS study to profile and compare the cerebrospinal fluid from California sea lions with domoic acid toxicosis (DAT) and without DAT. Across 11 samples, a total of 206 proteins were identified (FDR<0.1) using a composite mammalian database. Several peptide identifications were validated using stable isotope labeled peptides. Comparison of spectral counts revealed seven proteins that were elevated in the cerebrospinal fluid from sea lions with DAT: complement C3, complement factor B, dickkopf-3, malate dehydrogenase 1, neuron cell adhesion molecule 1, gelsolin, and neuronal cell adhesion molecule. Immunoblot analysis found reelin to be depressed in the cerebrospinal fluid from California sea lions with DAT. Mice administered domoic acid also had lower hippocampal reelin protein levels suggesting that domoic acid depresses reelin similar to kainic acid. In summary, proteomic analysis of cerebrospinal fluid in marine mammals is a useful tool to characterize the underlying molecular pathology of neurodegenerative disease. All MS data have been deposited in the ProteomeXchange with identifier PXD002105 (http://proteomecentral.proteomexchange.org/dataset/PXD002105).

  19. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities.

  20. The amino acid sequences of the Fd fragments of two human γ heavy chains

    PubMed Central

    Press, E. M.; Hogg, N. M.

    1970-01-01

    The amino acid sequences of the Fd fragments of two human pathological immunoglobulins of the immunoglobulin G1 class are reported. Comparison of the two sequences shows that the heavy-chain variable regions are similar in length to those of the light chains. The existence of heavy chain variable region subgroups is also deduced, from a comparison of these two sequences with those of another γ 1 chain, Eu, a μ chain, Ou, and the partial sequence of a fourth γ 1 chain, Ste. Carbohydrate has been found to be linked to an aspartic acid residue in the variable region of one of the γ 1 chains, Cor. PMID:5449120

  1. Amino acid sequence of neurotoxin III of the scorpion Androctonus australis Hector.

    PubMed

    Kopeyan, C; Martinez, G; Rochat, H

    1979-03-01

    The amino acid sequence of neurotoxin III, purified from the venom of the North African scorpion Androctonus australis Hector, has been determined by Edman degradation using a liquid-phase sequencer. Carboxypeptidase A hydrolyses confirmed not only the sequence of the five last residues but also the presence of a free alpha-carboxylic group at the C-terminus. Edman degradation was conducted on one hand with the Quadrol [N,N,N',N'-tetrakis(2-hydroxypropyl)ethylene diamine] program and S-alkylated protein before or after coupling with sulfophenylisothiocyanate (the first 34 residues were thus identified), on the other hand on tryptic and chymotryptic peptides with a dimethylbenzylamine program (residues 1-23 and 31-34 were confirmed, the positions of residues 35-64 were established). Neurotoxin III was found to belong to the same group of scorpion toxins active on mammals as neurotoxin I purified from the same venom (50 homologous positions exist in the two proteins).

  2. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum; this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  3. [Exome sequencing: an efficient strategy for identifying the causative genes of monogenic disorders].

    PubMed

    Rebiya, Nuli; Patamu, Mohemaiti

    2011-10-01

    The development of new generation sequencing technologies has brought new opportunities for the study of diseases. Exome sequencing has been shown to be an effective, rapid, high-performance technique that has already been used in research on inherited diseases such as monogenic disorders. It has already been accepted by researchers in the field of monogenic disorders, and will become widely used. This approach will accelerate discovery of the causative genes of Mendelian disorders. This article reviews some recent applications of exome sequencing in the study of gene-related diseases.

  4. Exome Sequencing of Cell-Free DNA from Metastatic Cancer Patients Identifies Clinically Actionable Mutations Distinct from Primary Disease

    PubMed Central

    Butler, Timothy M.; Johnson-Camacho, Katherine; Peto, Myron; Wang, Nicholas J.; Macey, Tara A.; Korkola, James E.; Koppie, Theresa M.; Corless, Christopher L.; Gray, Joe W.; Spellman, Paul T.

    2015-01-01

    The identification of the molecular drivers of cancer by sequencing is the backbone of precision medicine and the basis of personalized therapy; however, biopsies of primary tumors provide only a snapshot of the evolution of the disease and may miss potential therapeutic targets, especially in the metastatic setting. A liquid biopsy, in the form of cell-free DNA (cfDNA) sequencing, has the potential to capture the inter- and intra-tumoral heterogeneity present in metastatic disease, and, through serial blood draws, track the evolution of the tumor genome. In order to determine the clinical utility of cfDNA sequencing we performed whole-exome sequencing on cfDNA and tumor DNA from two patients with metastatic disease; only minor modifications to our sequencing and analysis pipelines were required for sequencing and mutation calling of cfDNA. The first patient had metastatic sarcoma and 47 of 48 mutations present in the primary tumor were also found in the cell-free DNA. The second patient had metastatic breast cancer and sequencing identified an ESR1 mutation in the cfDNA and metastatic site, but not in the primary tumor. This likely explains tumor progression on Anastrozole. Significant heterogeneity between the primary and metastatic tumors, with cfDNA reflecting the metastases, suggested separation from the primary lesion early in tumor evolution. This is best illustrated by an activating PIK3CA mutation (H1047R) which was clonal in the primary tumor, but completely absent from either the metastasis or cfDNA. Here we show that cfDNA sequencing supplies clinically actionable information with minimal risks compared to metastatic biopsies. This study demonstrates the utility of whole-exome sequencing of cell-free DNA from patients with metastatic disease. cfDNA sequencing identified an ESR1 mutation, potentially explaining a patient’s resistance to aromatase inhibition, and gave insight into how metastatic lesions differ from the primary tumor. PMID:26317216

  5. Whole-Genome Sequencing Identifies Emergence of a Quinolone Resistance Mutation in a Case of Stenotrophomonas maltophilia Bacteremia

    PubMed Central

    Altman, Deena R.; Attie, Oliver; Sebra, Robert; Hamula, Camille L.; Lewis, Martha; Deikus, Gintaras; Newman, Leah C.; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E.; Huprikar, Shirish; van Bakel, Harm; Bashir, Ali

    2015-01-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy. PMID:26324280

  6. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    DOE PAGES

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; ...

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.

  7. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    SciTech Connect

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; Shaw, George M.; Hahn, Beatrice H.; Montefiori, David C.; Alam, S. Munir; Bonsignori, Mattia; Moody, M. Anthony; Liao, Hua-Xin; Gao, Feng; Haynes, Barton

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.

  8. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) Identifies Immune-Selected HIV Variants

    PubMed Central

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena E.; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; Shaw, George M.; Hahn, Beatrice H.; Montefiori, David C.; Alam, S. Munir; Bonsignori, Mattia; Moody, M. Anthony; Liao, Hua-Xin; Gao, Feng; Haynes, Barton F.

    2015-01-01

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. With well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines. PMID:26506369
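
    The idea of using transmitted founder (TF) loss to flag candidate sites can be illustrated with a small sketch. This is not the LASSIE implementation: it assumes pre-aligned, equal-length sequences grouped by time point, defines TF loss at a site as the fraction of sequences whose residue differs from the TF residue, and uses a hypothetical cutoff of 0.8. All names and toy data below are illustrative.

```python
from typing import Dict, List


def tf_loss(tf_seq: str, samples: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Per time point, the fraction of sequences differing from the TF at each site."""
    loss: Dict[str, List[float]] = {}
    for timepoint, seqs in samples.items():
        if not seqs:
            raise ValueError(f"no sequences for time point {timepoint!r}")
        if any(len(s) != len(tf_seq) for s in seqs):
            raise ValueError("all sequences must be aligned to the TF length")
        n = len(seqs)
        loss[timepoint] = [
            sum(s[i] != tf_seq[i] for s in seqs) / n for i in range(len(tf_seq))
        ]
    return loss


def selected_sites(loss: Dict[str, List[float]], cutoff: float = 0.8) -> List[int]:
    """Zero-based sites whose peak TF loss over all time points reaches the cutoff."""
    n_sites = len(next(iter(loss.values())))
    return [
        i for i in range(n_sites)
        if max(per_tp[i] for per_tp in loss.values()) >= cutoff
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical data): site 1 drifts from K to R over time.
    tf = "MKVA"
    samples = {
        "week4": ["MKVA", "MKVA", "MRVA"],
        "week24": ["MRVA", "MRVA", "MRVT"],
    }
    print(selected_sites(tf_loss(tf, samples)))
```

    In the toy data only site 1 loses the TF residue in every late sequence, so it is the only site reported. Choosing representative sequences for the selected sites, as the abstract describes, would be a separate step that this sketch does not attempt.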

  9. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    SciTech Connect

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.; Loots, Gabriela G.; Houston, Kathryn A.; Dubchak, Inna; Speed, Terence P.; Rubin, Edward M.

    2002-01-01

    Genome-wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits, and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs, is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  10. Complete Genome Sequence of a Genotype G23P[37] Pheasant Rotavirus Strain Identified in Hungary

    PubMed Central

    Gál, János; Marton, Szilvia; Ihász, Katalin; Papp, Hajnalka; Jakab, Ferenc; Malik, Yashpal S.; Bányai, Krisztián

    2016-01-01

    We investigated the genomic properties of a rotavirus A strain isolated from diarrheic pheasant poults in Hungary in 2015. Sequence analyses revealed a shared genomic constellation (G23-P[37]-I4-R4-C4-M4-A16-N10-T4-E4-H4) and close relationship (range of nucleotide sequence similarity: VP2, 88%; VP1 and NSP4, 98%) with another pheasant rotavirus strain isolated previously in Germany. PMID:27034484

  11. Complete Genome Sequence of a Genotype G23P[37] Pheasant Rotavirus Strain Identified in Hungary.

    PubMed

    Gál, János; Marton, Szilvia; Ihász, Katalin; Papp, Hajnalka; Jakab, Ferenc; Malik, Yashpal S; Bányai, Krisztián; Farkas, Szilvia L

    2016-03-31

    We investigated the genomic properties of a rotavirus A strain isolated from diarrheic pheasant poults in Hungary in 2015. Sequence analyses revealed a shared genomic constellation (G23-P[37]-I4-R4-C4-M4-A16-N10-T4-E4-H4) and close relationship (range of nucleotide sequence similarity: VP2, 88%; VP1 and NSP4, 98%) with another pheasant rotavirus strain isolated previously in Germany.

  12. The amino acid sequence of goat beta-lactoglobulin.

    PubMed

    Préaux, G; Braunitzer, G; Schrank, B; Stangl, A

    1979-11-01

    The isolation of beta-lactoglobulin from milk of the goat is described. The purified protein was checked for purity and has been characterized by its gross composition and end groups. The native or the modified protein was then degraded by tryptic and cyanogen bromide cleavage. The cleavage products were isolated and sequenced in the sequenator using a Quadrol and propyne program. These data provide the complete sequence of beta-lactoglobulin of the goat. The results are discussed and compared particularly with bovine beta-lactoglobulin components AB. Some biological aspects are described.

  13. Layered materials with coexisting acidic and basic sites for catalytic one-pot reaction sequences.

    PubMed

    Motokura, Ken; Tada, Mizuki; Iwasawa, Yasuhiro

    2009-06-17

    Acidic montmorillonite-immobilized primary amines (H-mont-NH(2)) were found to be excellent acid-base bifunctional catalysts for one-pot reaction sequences, which are the first materials with coexisting acid and base sites active for acid-base tandem reactions. For example, tandem deacetalization-Knoevenagel condensation proceeded successfully with the H-mont-NH(2), affording the corresponding condensation product in a quantitative yield. The acidity of the H-mont-NH(2) was strongly influenced by the preparation solvent, and the base-catalyzed reactions were enhanced by interlayer acid sites.

  14. Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout.

    PubMed

    Hohenlohe, Paul A; Amish, Stephen J; Catchen, Julian M; Allendorf, Fred W; Luikart, Gordon

    2011-03-01

    The increased numbers of genetic markers produced by genomic techniques have the potential to both identify hybrid individuals and localize chromosomal regions responding to selection and contributing to introgression. We used restriction-site-associated DNA sequencing to identify a dense set of candidate SNP loci with fixed allelic differences between introduced rainbow trout (Oncorhynchus mykiss) and native westslope cutthroat trout (Oncorhynchus clarkii lewisi). We distinguished candidate SNPs from homeologs (paralogs resulting from whole-genome duplication) by detecting excessively high observed heterozygosity and deviations from Hardy-Weinberg proportions. We identified 2923 candidate species-specific SNPs from a single Illumina sequencing lane containing 24 barcode-labelled individuals. Published sequence data and ongoing genome sequencing of rainbow trout will allow physical mapping of SNP loci for genome-wide scans and will also provide flanking sequence for design of qPCR-based TaqMan(®) assays for high-throughput, low-cost hybrid identification using a subset of 50-100 loci. This study demonstrates that it is now feasible to identify thousands of informative SNPs in nonmodel species quickly and at reasonable cost, even if no prior genomic information is available.
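
    The filtering step described above, flagging loci with excess heterozygosity and deviation from Hardy-Weinberg proportions, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes biallelic genotype counts per locus, uses a one-degree-of-freedom chi-square test with the conventional 0.05 critical value of 3.84, and all function names are hypothetical.

```python
from typing import Tuple


def hwe_stats(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float, float]:
    """Return (observed heterozygosity, expected heterozygosity, chi-square)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)  # Hardy-Weinberg counts
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return n_ab / n, 2 * p * q, chi2


def looks_like_paralog(n_aa: int, n_ab: int, n_bb: int,
                       chi2_cutoff: float = 3.84) -> bool:
    """Flag loci with excess heterozygotes and a significant HWE deviation."""
    h_obs, h_exp, chi2 = hwe_stats(n_aa, n_ab, n_bb)
    return h_obs > h_exp and chi2 > chi2_cutoff


if __name__ == "__main__":
    # 24 individuals, all heterozygous: a pattern expected when reads from
    # two duplicated loci collapse into one apparent SNP.
    print(looks_like_paralog(0, 24, 0))
    # Genotype counts that match Hardy-Weinberg proportions exactly.
    print(looks_like_paralog(6, 12, 6))
```

    The first call flags the locus and the second does not. With 24 individuals per lane, as in the study, a chi-square approximation is rough; an exact test may be preferable in practice.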

  15. K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets

    PubMed Central

    Grad, Yonatan; Cobey, Sarah; Puranen, Juha Santeri; Corander, Jukka

    2015-01-01

    The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel using a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results shows that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool to uncover patterns of protein evolution. PMID:28348810

  16. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 degrees C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro...COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II.

  17. Unraveling the origin of Cladocera by identifying heterochrony in the developmental sequences of Branchiopoda

    PubMed Central

    2013-01-01

    Introduction One of the most interesting riddles within crustaceans is the origin of Cladocera (water fleas). Cladocerans are morphologically diverse and in terms of size and body segmentation differ considerably from other branchiopod taxa (Anostraca, Notostraca, Laevicaudata, Spinicaudata and Cyclestherida). In 1876, the famous zoologist Carl Claus proposed with regard to their origin that cladocerans might have evolved from a precociously maturing larva of a clam shrimp-like ancestor which was able to reproduce at this early stage of development. In order to shed light on this shift in organogenesis and to identify (potential) changes in the chronology of development (heterochrony), we investigated the external and internal development of the ctenopod Penilia avirostris and compared it to development in representatives of Anostraca, Notostraca, Laevicaudata, Spinicaudata and Cyclestherida. The development of the nervous system was investigated using immunohistochemical labeling and confocal microscopy. External morphological development was followed using a scanning electron microscope and confocal microscopy to detect the autofluorescence of the external cuticle. Results In Anostraca, Notostraca, Laevicaudata and Spinicaudata development is indirect and a free-swimming nauplius hatches from resting eggs. In contrast, development in Cyclestherida and Cladocera, in which non-swimming embryo-like larvae hatch from subitaneous eggs (without a resting phase) is defined herein as pseudo-direct and differs considerably from that of the other groups. Both external and internal development in Anostraca, Notostraca, Laevicaudata and Spinicaudata is directed from anterior to posterior, whereas in Cyclestherida and Cladocera differentiation is more synchronous. Conclusions In this study, developmental sequences from representatives of all branchiopod taxa are compared and analyzed using a Parsimov event-pairing approach. The analysis reveals clear evolutionary

  18. Synthesis of gamma,delta-unsaturated glycolic acids via sequenced Brook and Ireland-Claisen rearrangements.

    PubMed

    Schmitt, Daniel C; Johnson, Jeffrey S

    2010-03-05

    Organozinc, -magnesium, and -lithium nucleophiles initiate a Brook/Ireland-Claisen rearrangement sequence of allylic silyl glyoxylates resulting in the formation of gamma,delta-unsaturated alpha-silyloxy acids.

  19. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)
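
    The first step of such a program, generating a random amino acid string, can be sketched as follows. This is not the program described in the record: the function names are hypothetical, and the amino acid composition is included only as an assumed example of the kind of analytic data a student might be given.

```python
import random
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, seed=None):
    """Return a random peptide; a seed makes the exercise repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def composition(peptide):
    """Residue counts, i.e. what a total amino acid analysis would report."""
    return dict(sorted(Counter(peptide).items()))


if __name__ == "__main__":
    hidden = random_peptide(12, seed=1)
    # The student sees the composition and must deduce the order.
    print(composition(hidden))
```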

  20. Real-Time Nucleic Acid Sequence-Based Amplification Assay for Detection of Hepatitis A Virus

    PubMed Central

    Abd El Galil, Khaled H.; El Sokkary, M. A.; Kheira, S. M.; Salazar, Andre M.; Yates, Marylynn V.; Chen, Wilfred; Mulchandani, Ashok

    2005-01-01

    A nucleic acid sequence-based amplification (NASBA) assay in combination with a molecular beacon was developed for the real-time detection and quantification of hepatitis A virus (HAV). A 202-bp, highly conserved 5′ noncoding region of HAV was targeted. The sensitivity of the real-time NASBA assay was tested with 10-fold dilutions of viral RNA, and a detection limit of 1 PFU was obtained. The specificity of the assay was demonstrated by testing with other environmental pathogens and indicator microorganisms, with only HAV positively identified. When combined with immunomagnetic separation, the NASBA assay successfully detected as few as 10 PFU from seeded lake water samples. Due to its isothermal nature, its speed, and its similar sensitivity compared to the real-time RT-PCR assay, this newly reported real-time NASBA method will have broad applications for the rapid detection of HAV in contaminated food or water. PMID:16269748

  1. De Novo Transcriptome Analysis of Warburgia ugandensis to Identify Genes Involved in Terpenoids and Unsaturated Fatty Acids Biosynthesis

    PubMed Central

    Wang, Xin; Zhou, Chen; Yang, Xianpeng; Miao, Di; Zhang, Yansheng

    2015-01-01

    The bark of Warburgia ugandensis (Canellaceae family) has long been used as a medicinal source in many African countries. The presence of diverse terpenoids and abundant polyunsaturated fatty acids (PUFAs) in this organ contributes to its broad range of pharmacological properties. Despite its medicinal and economic importance, the biosynthesis of terpenoids and unsaturated fatty acids in W. ugandensis bark remains largely unknown. Therefore, it is necessary to construct a genomic and/or transcriptomic database for the functional genomics study of W. ugandensis. The chemical profiles of terpenoids and fatty acids in the bark and leaves of W. ugandensis were compared by gas chromatography-mass spectrometry (GC-MS) analysis. Meanwhile, the transcriptome database derived from both tissues was created using Illumina sequencing technology. In total, about 17.1 G clean nucleotides were obtained, and de novo assembled into 72,591 unigenes, of which about 38.06% can be aligned to the NCBI non-redundant protein database. Many candidate genes in the biosynthetic pathways of terpenoids and unsaturated fatty acids were identified, including 14 unigenes for terpene synthases. Furthermore, 2,324 unigenes were discovered to be differentially expressed between the two tissues; the functions of those differentially expressed genes (DEGs) were predicted by gene ontology enrichment and metabolic pathway enrichment analyses. In addition, the expression of 12 DEGs with putative roles in terpenoid and unsaturated fatty acid metabolic pathways was confirmed by qRT-PCR, which was consistent with the RNA-sequencing data. In conclusion, we constructed a comprehensive transcriptome dataset derived from the bark and leaf of W. ugandensis, which forms the basis for functional genomics studies on this plant species. Particularly, the comparative analysis of the transcriptome data between the bark and leaf will provide critical clues to reveal the regulatory

  2. De Novo Transcriptome Analysis of Warburgia ugandensis to Identify Genes Involved in Terpenoids and Unsaturated Fatty Acids Biosynthesis.

    PubMed

    Wang, Xin; Zhou, Chen; Yang, Xianpeng; Miao, Di; Zhang, Yansheng

    2015-01-01

    The bark of Warburgia ugandensis (Canellaceae family) has long been used as a medicinal source in many African countries. The presence of diverse terpenoids and abundant polyunsaturated fatty acids (PUFAs) in this organ contributes to its broad range of pharmacological properties. Despite its medicinal and economic importance, the biosynthesis of terpenoids and unsaturated fatty acids in W. ugandensis bark remains largely unknown. Therefore, it is necessary to construct a genomic and/or transcriptomic database for the functional genomics study of W. ugandensis. The chemical profiles of terpenoids and fatty acids in the bark and leaves of W. ugandensis were compared by gas chromatography-mass spectrometry (GC-MS) analysis. Meanwhile, the transcriptome database derived from both tissues was created using Illumina sequencing technology. In total, about 17.1 G clean nucleotides were obtained, and de novo assembled into 72,591 unigenes, of which about 38.06% can be aligned to the NCBI non-redundant protein database. Many candidate genes in the biosynthetic pathways of terpenoids and unsaturated fatty acids were identified, including 14 unigenes for terpene synthases. Furthermore, 2,324 unigenes were discovered to be differentially expressed between the two tissues; the functions of those differentially expressed genes (DEGs) were predicted by gene ontology enrichment and metabolic pathway enrichment analyses. In addition, the expression of 12 DEGs with putative roles in terpenoid and unsaturated fatty acid metabolic pathways was confirmed by qRT-PCR, which was consistent with the RNA-sequencing data. In conclusion, we constructed a comprehensive transcriptome dataset derived from the bark and leaf of W. ugandensis, which forms the basis for functional genomics studies on this plant species. Particularly, the comparative analysis of the transcriptome data between the bark and leaf will provide critical clues to reveal the regulatory

  3. Extensive mutagenesis of a transcriptional activation domain identifies single hydrophobic and acidic amino acids important for activation in vivo.

    PubMed Central

    Sainz, M B; Goff, S A; Chandler, V L

    1997-01-01

    C1 is a transcriptional activator of genes encoding biosynthetic enzymes of the maize anthocyanin pigment pathway. C1 has an amino terminus homologous to Myb DNA-binding domains and an acidic carboxyl terminus that is a transcriptional activation domain in maize and yeast cells. To identify amino acids critical for transcriptional activation, an extensive random mutagenesis of the C1 carboxyl terminus was done. The C1 activation domain is remarkably tolerant of amino acid substitutions, as changes at 34 residues had little or no effect on transcriptional activity. These changes include introduction of helix-incompatible amino acids throughout the C1 activation domain and alteration of most single acidic amino acids, suggesting that a previously postulated amphipathic alpha-helix is not required for activation. Substitutions at two positions revealed amino acids important for transcriptional activation. Replacement of leucine 253 with a proline or glutamine resulted in approximately 10% of wild-type transcriptional activation. Leucine 253 is in a region of C1 in which several hydrophobic residues align with residues important for transcriptional activation by the herpes simplex virus VP16 protein. However, changes at all other hydrophobic residues in C1 indicate that none are critical for C1 transcriptional activation. The other important amino acid in C1 is aspartate 262, as a change to valine resulted in only 24% of wild-type transcriptional activation. Comparison of our C1 results with those from VP16 reveal substantial differences in which amino acids are required for transcriptional activation in vivo by these two acidic activation domains. PMID:8972191

  4. The indicator amino acid oxidation method identified limiting amino acids in two parenteral nutrition solutions in neonatal piglets.

    PubMed

    Brunton, Janet A; Shoveller, Anna K; Pencharz, Paul B; Ball, Ronald O

    2007-05-01

    Recent studies using the indicator amino acid oxidation (IAAO) technique in TPN-fed piglets and infants have been instrumental in defining parenteral amino acid requirements. None of the commercial products in use are ideal when assessed against these new data. Our objectives were to determine whether the oxidation of an indicator amino acid would decline with the addition of amino acids that were limiting in the diets of TPN-fed piglets, and to use this technique to identify limiting amino acids in a new amino acid profile. Piglets (n = 26) were randomized to receive TPN with amino acids provided by Vaminolact (VM) or by a new profile (NP). After 5 d of TPN administration, lysine oxidation was measured using a constant infusion of L-[1-(14)C]-lysine. Immediately following the first IAAO study, the piglets were further randomized within diet group to receive either 1) supplemental aromatic amino acids (AAA), 2) sulfur amino acids (SAA) or 3) both (AAA+SAA) (n = 4-5 per treatment group). A second IAAO study was carried out 18 h later. In the first IAAO study, lysine oxidation was high for both groups (18 vs. 21% for VM and NP, respectively, P = 0.055). The addition of AAA to VM induced a 30% decline in lysine oxidation compared with baseline (P < 0.01). Similarly, SAA added to NP lowered lysine oxidation by approximately 30% (P < 0.01). The application of the IAAO technique facilitates rapid evaluation of the amino acids that are limiting to protein synthesis in parenteral solutions.

  5. Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues

    PubMed Central

    Schwartz, Russell; Istrail, Sorin; King, Jonathan

    2001-01-01

    Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20–22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content. We report here on an analysis of the frequencies of different lengths of contiguous blocks of hydrophobic residues in a database of amino acid sequences of proteins of known structure. Sequences of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. The result may reflect selection against long blocks of hydrophobic residues within globular proteins relative to what would be expected if residue hydrophobicities were independent of those of nearby residues in the sequence. PMID:11316883
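
    The comparison described in this abstract, observed blocks of consecutive hydrophobic residues versus what independent residue selection would predict, can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: the hydrophobic alphabet, the function names, and the use of all-hydrophobic windows of length k as the counted event are all choices made here for brevity.

```python
from itertools import groupby

# Assumed hydrophobic residue set; the abstract does not give the
# classification used in the paper.
HYDROPHOBIC = set("AVLIMFWC")


def run_lengths(seq):
    """Lengths of maximal blocks of consecutive hydrophobic residues."""
    return [
        sum(1 for _ in block)
        for is_hydrophobic, block in groupby(seq, key=lambda r: r in HYDROPHOBIC)
        if is_hydrophobic
    ]


def observed_windows(seq, k):
    """Number of length-k windows made up entirely of hydrophobic residues."""
    return sum(
        all(r in HYDROPHOBIC for r in seq[i:i + k])
        for i in range(len(seq) - k + 1)
    )


def expected_windows(seq, k):
    """Expected count if each residue were hydrophobic independently.

    Uses the sequence's own hydrophobic fraction p, giving (n - k + 1) * p**k.
    """
    n = len(seq)
    if n < k:
        return 0.0
    p = sum(r in HYDROPHOBIC for r in seq) / n
    return (n - k + 1) * p ** k


if __name__ == "__main__":
    toy = "AAKAAAKK"  # toy sequence, not real protein data
    print(run_lengths(toy), observed_windows(toy, 3), expected_windows(toy, 3))
```

    An observed count consistently below the expected count across a structure database would correspond to the suppression the abstract reports; a single short sequence such as the toy one above carries no statistical weight.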

  6. Genome sequence of the acid-tolerant strain Rhizobium sp. LPU83.

    PubMed

    Wibberg, Daniel; Tejerizo, Gonzalo Torres; Del Papa, María Florencia; Martini, Carla; Pühler, Alfred; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2014-04-20

    Rhizobia are important members of the soil microbiome since they enter into nitrogen-fixing symbiosis with different legume host plants. Rhizobium sp. LPU83 is an acid-tolerant Rhizobium strain featuring a broad-host-range. However, it is ineffective in nitrogen fixation. Here, the improved draft genome sequence of this strain is reported. Genome sequence information provides the basis for analysis of its acid tolerance, symbiotic properties and taxonomic classification.

  7. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly).

  8. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  9. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation.

    PubMed

    Michaelson, Jacob J; Shi, Yujian; Gujral, Madhusudan; Zheng, Hancheng; Malhotra, Dheeraj; Jin, Xin; Jian, Minghan; Liu, Guangming; Greer, Douglas; Bhandari, Abhishek; Wu, Wenting; Corominas, Roser; Peoples, Aine; Koren, Amnon; Gore, Athurva; Kang, Shuli; Lin, Guan Ning; Estabillo, Jasper; Gadomski, Therese; Singh, Balvindar; Zhang, Kun; Akshoomoff, Natacha; Corsello, Christina; McCarroll, Steven; Iakoucheva, Lilia M; Li, Yingrui; Wang, Jun; Sebat, Jonathan

    2012-12-21

    De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our findings suggest that regional hypermutation is a significant factor shaping patterns of genetic variation and disease risk in humans.

  10. Universal and domain-specific sequences in 23S–28S ribosomal RNA identified by computational phylogenetics

    PubMed Central

    Doris, Stephen M.; Smith, Deborah R.; Beamesderfer, Julia N.; Raphael, Benjamin J.; Nathanson, Judith A.; Gerbi, Susan A.

    2015-01-01

    Comparative analysis of ribosomal RNA (rRNA) sequences has elucidated phylogenetic relationships. However, this powerful approach has not been fully exploited to address ribosome function. Here we identify stretches of evolutionarily conserved sequences, which correspond with regions of high functional importance. For this, we developed a structurally aligned database, FLORA (full-length organismal rRNA alignment) to identify highly conserved nucleotide elements (CNEs) in 23S–28S rRNA from each phylogenetic domain (Eukarya, Bacteria, and Archaea). Universal CNEs (uCNEs) are conserved in sequence and structural position in all three domains. Those in regions known to be essential for translation validate our approach. Importantly, some uCNEs reside in areas of unknown function, thus identifying novel sequences of likely great importance. In contrast to uCNEs, domain-specific CNEs (dsCNEs) are conserved in just one phylogenetic domain. This is the first report of conserved sequence elements in rRNA that are domain-specific; they are largely a eukaryotic phenomenon. The locations of the eukaryotic dsCNEs within the structure of the ribosome suggest they may function in nascent polypeptide transit through the ribosome tunnel and in tRNA exit from the ribosome. Our findings provide insights and a resource for ribosome function studies. PMID:26283689
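
    The idea of scanning an alignment for stretches of conserved sequence can be sketched as follows. This is not the FLORA method: the identity threshold, the minimum stretch length, the treatment of gaps, and the function name are assumptions made for illustration, and the input is assumed to be already aligned.

```python
from collections import Counter


def conserved_stretches(alignment, min_identity=0.9, min_length=6):
    """Return half-open (start, end) column ranges of conserved stretches.

    A column counts as conserved when its most common character is not a
    gap and is shared by at least min_identity of the sequences.
    """
    conserved = []
    for column in zip(*alignment):
        base, count = Counter(column).most_common(1)[0]
        conserved.append(base != "-" and count / len(column) >= min_identity)

    stretches = []
    start = None
    # A trailing False closes any stretch that runs to the last column.
    for i, ok in enumerate(conserved + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_length:
                stretches.append((start, i))
            start = None
    return stretches


if __name__ == "__main__":
    toy_alignment = [  # toy data, not real rRNA
        "ACGTACGTAA",
        "ACGTACCTAA",
        "ACGTACGTTA",
    ]
    print(conserved_stretches(toy_alignment, min_identity=1.0, min_length=3))
```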

  11. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  12. Benefits and Challenges with Applying Unique Molecular Identifiers in Next Generation Sequencing to Detect Low Frequency Mutations.

    PubMed

    Kou, Ruqin; Lam, Ham; Duan, Hairong; Ye, Li; Jongkam, Narisra; Chen, Weizhi; Zhang, Shifang; Li, Shihong

    2016-01-01

    Indexing individual template molecules with a unique identifier (UID) before PCR and deep sequencing is promising for detecting low frequency mutations, as true mutations could be distinguished from PCR errors or sequencing errors based on consensus among reads sharing the same index. In an effort to develop a robust assay to detect, from urine, low-abundance bladder cancer cells carrying well-documented mutations, we first tested the idea on a set of mock templates, with wild type and known mutants mixed at defined ratios. We have measured the combined error rate for PCR and Illumina sequencing at each nucleotide position of three exons, and demonstrated the power of a UID in distinguishing and correcting errors. In addition, we have demonstrated that PCR sampling bias, rather than PCR errors, challenges the UID-deep sequencing method in faithfully detecting low frequency mutations.
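
    The consensus step described in this abstract, calling a base only when reads sharing a UID agree, can be sketched as below. This is a minimal illustration rather than the authors' assay: the function name, the minimum family size, and the agreement threshold are hypothetical, and reads are assumed to be pre-aligned and of equal length within a family.

```python
from collections import Counter, defaultdict


def consensus_by_uid(reads, min_family_size=3, min_agreement=0.8):
    """Collapse (uid, sequence) reads into one consensus sequence per UID.

    Positions where the most common base falls below min_agreement are
    reported as 'N' instead of being called.
    """
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)

    consensus = {}
    for uid, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to separate errors from real variants
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue  # this sketch does not handle indels or trimming
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[uid] = "".join(bases)
    return consensus


if __name__ == "__main__":
    toy_reads = [  # toy data
        ("U1", "ACGT"), ("U1", "ACGT"), ("U1", "ACTT"),  # one discordant read
        ("U2", "ACGA"), ("U2", "ACGA"), ("U2", "ACGA"),  # consistent variant
        ("U3", "ACGT"),                                  # singleton, dropped
    ]
    print(consensus_by_uid(toy_reads, min_agreement=0.6))
```

    A variant supported by every read in a family (as in U2) survives the consensus, while an isolated mismatch (as in U1) is outvoted or masked. Note that consensus calling does not address the PCR sampling bias the abstract identifies as the main limitation.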

  13. Exploring the genetic architecture of inflammatory bowel disease by whole-genome sequencing identifies association at ADCY7.

    PubMed

    Luo, Yang; de Lange, Katrina M; Jostins, Luke; Moutsianas, Loukas; Randall, Joshua; Kennedy, Nicholas A; Lamb, Christopher A; McCarthy, Shane; Ahmad, Tariq; Edwards, Cathryn; Serra, Eva Goncalves; Hart, Ailsa; Hawkey, Chris; Mansfield, John C; Mowat, Craig; Newman, William G; Nichols, Sam; Pollard, Martin; Satsangi, Jack; Simmons, Alison; Tremelling, Mark; Uhlig, Holm; Wilson, David C; Lee, James C; Prescott, Natalie J; Lees, Charlie W; Mathew, Christopher G; Parkes, Miles; Barrett, Jeffrey C; Anderson, Carl A

    2017-02-01

    To further resolve the genetic architecture of the inflammatory bowel diseases ulcerative colitis and Crohn's disease, we sequenced the whole genomes of 4,280 patients at low coverage and compared them to 3,652 previously sequenced population controls across 73.5 million variants. We then imputed from these sequences into new and existing genome-wide association study cohorts and tested for association at ∼12 million variants in a total of 16,432 cases and 18,843 controls. We discovered a 0.6% frequency missense variant in ADCY7 that doubles the risk of ulcerative colitis. Despite good statistical power, we did not identify any other new low-frequency risk variants and found that such variants explained little heritability. We detected a burden of very rare, damaging missense variants in known Crohn's disease risk genes, suggesting that more comprehensive sequencing studies will continue to improve understanding of the biology of complex diseases.

  14. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples.

    PubMed

    Scolnick, Jonathan A; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C; Amorese, Douglas A

    2015-01-01

    Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.

  15. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples

    PubMed Central

    Scolnick, Jonathan A.; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C.; Amorese, Douglas A.

    2015-01-01

    Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells. PMID:26132974

  16. Newly Identified Enterovirus C Genotypes, Identified in the Netherlands through Routine Sequencing of All Enteroviruses Detected in Clinical Materials from 2008 to 2015

    PubMed Central

    Poelman, Randy; Borger, Renze; Niesters, Hubert G. M.

    2016-01-01

    Enteroviruses (EVs) are a group of human and animal viruses that are capable of causing a variety of clinical syndromes. Different genotypes classified into species can be distinguished on the basis of sequence divergence in the VP1 capsid-coding region. Apparently new genotypes are discovered regularly, often as incidental findings in studies investigating respiratory syndromes or as part of poliovirus surveillance. Recently, some EVs have become recognized as significant respiratory pathogens, and a number of new genotypes belonging to species C have been identified. The circulation of these newly identified species C EVs, such as EV-C104, EV-C105, EV-C109, and EV-C117, nevertheless appears to be limited. In this report, we show the results of routine genotyping of all enteroviruses detected in our tertiary care hospital between January 2008 and April 2015. We detected 365 EVs belonging to 40 genotypes. Interestingly, several newly identified species C EVs were detected during the study period. Sequencing of the 5′ untranslated region (5′ UTR) of these viruses shows divergence in this region, which is a target region in many detection assays. PMID:27358467

  17. Newly Identified Enterovirus C Genotypes, Identified in the Netherlands through Routine Sequencing of All Enteroviruses Detected in Clinical Materials from 2008 to 2015.

    PubMed

    Van Leer-Buter, Coretta C; Poelman, Randy; Borger, Renze; Niesters, Hubert G M

    2016-09-01

    Enteroviruses (EVs) are a group of human and animal viruses that are capable of causing a variety of clinical syndromes. Different genotypes classified into species can be distinguished on the basis of sequence divergence in the VP1 capsid-coding region. Apparently new genotypes are discovered regularly, often as incidental findings in studies investigating respiratory syndromes or as part of poliovirus surveillance. Recently, some EVs have become recognized as significant respiratory pathogens, and a number of new genotypes belonging to species C have been identified. The circulation of these newly identified species C EVs, such as EV-C104, EV-C105, EV-C109, and EV-C117, nevertheless appears to be limited. In this report, we show the results of routine genotyping of all enteroviruses detected in our tertiary care hospital between January 2008 and April 2015. We detected 365 EVs belonging to 40 genotypes. Interestingly, several newly identified species C EVs were detected during the study period. Sequencing of the 5' untranslated region (5' UTR) of these viruses shows divergence in this region, which is a target region in many detection assays.

  18. Genome Sequence of Cauliflower Mosaic Virus Identified in Earwigs (Doru luteipes) through a Metagenomic Approach

    PubMed Central

    Godinho, Márcio Tadeu; Paula, Débora Pires; Varsani, Arvind

    2017-01-01

    ABSTRACT Here we report the first complete genome sequence of a cauliflower mosaic virus from Brazil, obtained from the gut content of the predator earwig (Doru luteipes). This virus has a genome of 8,030 nucleotides (nt) and shares 97% genome-wide identity with an isolate from Argentina. PMID:28302781

  19. Draft Genome Sequence of "Candidatus Mycoplasma haemobos," a Hemotropic Mycoplasma Identified in Cattle in Mexico.

    PubMed

    Martínez-Ocampo, Fernando; Rodríguez-Camarillo, Sergio D; Amaro-Estrada, Itzel; Quiroz-Castañeda, Rosa Estela

    2016-07-07

    We present here the draft genome sequence of the first "Candidatus Mycoplasma haemobos" strain found in cattle in Mexico. This hemotropic mycoplasma causes acute and chronic disease in animals. This genome is a starting point for studying the role of this mycoplasma in coinfections and synergistic mechanisms associated with the disease.

  20. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    ERIC Educational Resources Information Center

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  1. Identifying microbial fitness determinants by insertion sequencing using genome-wide transposon mutant libraries.

    PubMed

    Goodman, Andrew L; Wu, Meng; Gordon, Jeffrey I

    2011-11-17

    Insertion sequencing (INSeq) is a method for determining the insertion site and relative abundance of large numbers of transposon mutants in a mixed population of isogenic mutants of a sequenced microbial species. INSeq is based on a modified mariner transposon containing MmeI sites at its ends, allowing cleavage at chromosomal sites 16-17 bp from the inserted transposon. Genomic regions adjacent to the transposons are amplified by linear PCR with a biotinylated primer. Products are bound to magnetic beads, digested with MmeI and barcoded with sample-specific linkers appended to each restriction fragment. After limited PCR amplification, fragments are sequenced using a high-throughput instrument. The sequence of each read can be used to map the location of a transposon in the genome. Read count measures the relative abundance of that mutant in the population. Solid-phase library preparation makes this protocol rapid (18 h), easy to scale up, amenable to automation and useful for a variety of samples. A protocol for characterizing libraries of transposon mutant strains clonally arrayed in a multiwell format is provided.
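
    The last analysis step described above, where read count stands in for the relative abundance of each mutant, can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical list of genome coordinates, one per mapped read, and the function name and toy data are not part of the published protocol.

```python
from collections import Counter


def relative_abundance(insertion_sites):
    """Return {site: fraction of reads} for a list of mapped insertion sites.

    Each element is the genome coordinate one read mapped to. The mapping
    step itself is not shown; this input format is an assumption.
    """
    counts = Counter(insertion_sites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in counts.items()}


# Toy data (hypothetical): six reads mapping to three insertion sites.
reads = [1042, 1042, 1042, 58810, 58810, 230077]
abundance = relative_abundance(reads)
```

    Comparing such fractions between an input and an output population would be the natural next step, but that comparison is not specified in the abstract and is left out here.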

  2. Genome Sequence of Cauliflower Mosaic Virus Identified in Earwigs (Doru luteipes) through a Metagenomic Approach.

    PubMed

    Godinho, Márcio Tadeu; Paula, Débora Pires; Varsani, Arvind; Ribeiro, Simone Graça

    2017-03-16

    Here we report the first complete genome sequence of a cauliflower mosaic virus from Brazil, obtained from the gut content of the predator earwig (Doru luteipes). This virus has a genome of 8,030 nucleotides (nt) and shares 97% genome-wide identity with an isolate from Argentina.

  3. Identifying, Sequencing and Managing Intellectual Risks to Students: Discussion in the Foreign Language Literature Course.

    ERIC Educational Resources Information Center

    Nance, Kimberly A.

    Student apprehension about discussing intellectually "risky" ideas in the foreign language literature class can be addressed through construction of a classroom environment in which students gain confidence. The governing principle is the sequencing of risk. Students perceive risks to be in: (1) making a linguistic error; (2) making an error of…

  4. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics

    PubMed Central

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C.

    2015-01-01

    The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together, HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. IMPORTANCE Insects have the most species of all animals, and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. PMID:26676774

  5. Genome sequencing identifies Listeria fleischmannii subsp. coloradonensis subsp. nov., isolated from a ranch.

    PubMed

    den Bakker, Henk C; Manuel, Clyde S; Fortes, Esther D; Wiedmann, Martin; Nightingale, Kendra K

    2013-09-01

    Twenty Listeria-like isolates were obtained from environmental samples collected on a cattle ranch in northern Colorado; all of these isolates were found to share an identical partial sigB sequence, suggesting close relatedness. The isolates were similar to members of the genus Listeria in that they were Gram-stain-positive, short rods, oxidase-negative and catalase-positive; the isolates were similar to Listeria fleischmannii because they were non-motile at 25 °C. 16S rRNA gene sequencing for representative isolates and whole genome sequencing for one isolate was performed. The genome of the type strain of Listeria fleischmannii (strain LU2006-1(T)) was also sequenced. The draft genomes were very similar in size and the average MUMmer nucleotide identity across 91% of the genomes was 95.16%. Genome sequence data were used to design primers for a six-gene multi-locus sequence analysis (MLSA) scheme. Phylogenies based on (i) the near-complete 16S rRNA gene, (ii) 31 core genes and (iii) six housekeeping genes illustrated the close relationship of these Listeria-like isolates to Listeria fleischmannii LU2006-1(T). Sufficient genetic divergence of the Listeria-like isolates from the type strain of Listeria fleischmannii and differing phenotypic characteristics warrant these isolates to be classified as members of a distinct infraspecific taxon, for which the name Listeria fleischmannii subsp. coloradonensis subsp. nov. is proposed. The type strain is TTU M1-001(T) ( =BAA-2414(T) =DSM 25391(T)). The isolates of Listeria fleischmannii subsp. coloradonensis subsp. nov. differ from the nominate subspecies by the inability to utilize melezitose, turanose and sucrose, and the ability to utilize inositol. The results also demonstrate the utility of whole genome sequencing to facilitate identification of novel taxa within a well-described genus. The genomes of both subspecies of Listeria fleischmannii contained putative enhancin genes; the Listeria fleischmannii subsp

  6. Amino acid positions subject to multiple co-evolutionary constraints can be robustly identified by their eigenvector network centrality scores

    PubMed Central

    Parente, Daniel J.; Ray, J. Christian J.; Swint-Kruse, Liskin

    2015-01-01

    As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for co-evolution between pairs of positions. Co-evolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of co-evolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded co-evolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; “central” positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise co-evolution scores: Instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints – detectable by divergent algorithms – that occur at key protein locations. Finally, we discuss the fact that multiple patterns co-exist in evolutionary data that, together, give rise to emergent protein functions. PMID:26503808
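
    The idea of treating co-evolution scores as a weighted network and ranking positions by eigenvector centrality can be sketched with a plain power iteration. This is a minimal illustration, not the authors' implementation: the 4x4 score matrix is invented, the five scoring algorithms are not reproduced, and the function name is hypothetical.

```python
def eigenvector_centrality(weights, max_iter=500, tol=1e-12):
    """Power iteration on a symmetric, non-negative weight matrix.

    weights[i][j] is the (hypothetical) co-evolution score between
    alignment positions i and j. Returns a unit-length score vector.
    """
    n = len(weights)
    scores = [1.0 / n ** 0.5] * n
    for _ in range(max_iter):
        # Multiply by (W + I): the identity shift avoids oscillation
        # and leaves the eigenvectors of W unchanged.
        updated = [
            scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in updated) ** 0.5
        updated = [v / norm for v in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Made-up pairwise scores for four alignment positions.
pair_scores = [
    [0.0, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.9],
    [0.5, 0.0, 0.9, 0.0],
]
centrality = eigenvector_centrality(pair_scores)
```

    Because a position's score depends on the scores of all its neighbours, the ranking reflects the whole network rather than only the single largest pairwise value.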

  7. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

  8. Draft Genome Sequence of Cyanobacterium sp. Strain IPPAS B-1200 with a Unique Fatty Acid Composition

    PubMed Central

    Starikov, Alexander Y.; Usserbaeva, Aizhan A.; Sinetova, Maria A.; Sarsekeyeva, Fariza K.; Zayadan, Bolatkhan K.; Ustinova, Vera V.; Kupriyanova, Elena V.; Los, Dmitry A.

    2016-01-01

    Here, we report the draft genome of Cyanobacterium sp. IPPAS strain B-1200, isolated from Lake Balkhash, Kazakhstan, and characterized by the unique fatty acid composition of its membrane lipids, which are enriched with myristic and myristoleic acids. The approximate genome size is 3.4 Mb, and the predicted number of coding sequences is 3,119. PMID:27856596

  9. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...

  10. CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences

    PubMed Central

    Xia, En-Hua; Yao, Qiu-Yang; Zhang, Hai-Bin; Jiang, Jian-Jun; Zhang, Li-Ping; Gao, Li-Zhi

    2016-01-01

    Simple sequence repeats (SSRs), also known as microsatellites, are ubiquitous short tandem duplications commonly found in genomes and/or transcriptomes of diverse organisms. They represent one of the most powerful molecular markers for genetic analysis and breeding programs because of their high mutation rate and neutral evolution. However, traditional experimental screening of the polymorphic status of SSRs and of their subsequent applicability to genetic studies is extremely labor-intensive and time-consuming. Thankfully, the recently decreased costs of next generation sequencing and the increasing availability of large genome and/or transcriptome sequences have provided an excellent opportunity and source for large-scale mining of this type of molecular marker. However, current tools are limited. Thus we here developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) based on multiple assembled sequences. The pipeline allows users to identify putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. In addition, two confidence metrics, the standard deviation and the missing rate of the SSR repetitions, are provided to systematically assess the feasibility of the detected PolySSRs for subsequent application to genetic characterization. Meanwhile, primer pairs for each identified PolySSR are also automatically designed and further evaluated by the global sequence similarities of the primer-binding region, ensuring the success rate of marker development. Screening rice genomes with CandiSSR and subsequent experimental validation showed an accuracy rate of over 90%. In addition, the application of CandiSSR has successfully identified a large number of PolySSRs in the Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker sources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html. PMID:26779212
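
    The two confidence metrics named in the abstract, the standard deviation and the missing rate of SSR repeat counts across assemblies, can be illustrated with a small sketch. It is not CandiSSR's code: the function names, the toy sequences, the use of the population standard deviation, and the shortcut of scanning each sequence for a known motif (rather than matching loci by flanking sequence) are all assumptions made for the example.

```python
import re
from statistics import pstdev


def count_repeats(sequence, motif):
    """Length, in motif copies, of the longest tandem run of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def ssr_metrics(repeat_counts):
    """Return (standard deviation, missing rate) for one SSR locus.

    repeat_counts holds one entry per assembly; None marks an assembly
    in which the locus was not found.
    """
    observed = [c for c in repeat_counts if c is not None]
    missing_rate = 1 - len(observed) / len(repeat_counts)
    sd = pstdev(observed) if observed else 0.0
    return sd, missing_rate


# Toy locus (hypothetical): a GA repeat in three assemblies, one missing.
locus = {"asm_A": "TTGAGAGAGAGACC", "asm_B": "TTGAGAGACC", "asm_C": None}
counts = [
    count_repeats(seq, "GA") if seq is not None else None
    for seq in locus.values()
]
sd, missing_rate = ssr_metrics(counts)
```

    A locus with a non-zero standard deviation and a low missing rate would be the more promising polymorphic candidate under this reading of the two metrics.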

  11. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  12. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  13. Analysis of cloned cDNA and genomic sequences for phytochrome: complete amino acid sequences for two gene products expressed in etiolated Avena.

    PubMed Central

    Hershey, H P; Barker, R F; Idler, K B; Lissemore, J L; Quail, P H

    1985-01-01

    Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long corresponding to a molecular mass of 125 kdaltons. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule. PMID:3001642
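
    The correspondence between 1128 residues and roughly 125 kDa can be checked with a back-of-the-envelope estimate. The sketch below assumes the common rule-of-thumb average of about 110 Da per amino acid residue; that figure and the function name are assumptions, not values taken from the paper, and the exact mass depends on the actual composition.

```python
# Assumption: rule-of-thumb average mass of one residue in a polypeptide.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Rough polypeptide mass in kDa from its residue count."""
    return n_residues * residue_mass_da / 1000.0


phytochrome_kda = approx_mass_kda(1128)  # about 124 kDa
```

    The estimate lands within about 1 kDa of the reported 125 kDa, which is as close as this approximation can be expected to get.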

  14. Purification, characterization and partial amino acid sequence of glycogen synthase from Saccharomyces cerevisiae.

    PubMed Central

    Carabaza, A; Arino, J; Fox, J W; Villar-Palasi, C; Guinovart, J J

    1990-01-01

    Glycogen synthase from Saccharomyces cerevisiae was purified to homogeneity. The enzyme showed a subunit molecular mass of 80 kDa. The holoenzyme appears to be a tetramer. Antibodies developed against purified yeast glycogen synthase inactivated the enzyme in yeast extracts and allowed the detection of the protein in Western blots. Amino acid analysis showed that the enzyme is very rich in glutamate and/or glutamine residues. The N-terminal sequence (11 amino acid residues) was determined. In addition, selected tryptic-digest peptides were purified by reverse-phase h.p.l.c. and submitted to gas-phase sequencing. Up to eight sequences (79 amino acid residues) could be aligned with the human muscle enzyme sequence. Levels of identity range between 37 and 100%, indicating that, although human and yeast glycogen synthases probably share some conserved regions, significant differences in their primary structure should be expected. PMID:2114092

  15. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  16. Systems-level metabolic flux profiling identifies fatty acid synthesis as a target for antiviral therapy

    PubMed Central

    Munger, Joshua; Bennett, Bryson D; Parikh, Anuraag; Feng, Xiao-Jiang; McArdle, Jessica; Rabitz, Herschel A; Shenk, Thomas; Rabinowitz, Joshua D

    2010-01-01

    Viruses rely on the metabolic network of their cellular hosts to provide energy and building blocks for viral replication. We developed a flux measurement approach based on liquid chromatography–tandem mass spectrometry to quantify changes in metabolic activity induced by human cytomegalovirus (HCMV). This approach reliably elucidated fluxes in cultured mammalian cells by monitoring metabolome labeling kinetics after feeding cells 13C-labeled forms of glucose and glutamine. Infection with HCMV markedly upregulated flux through much of the central carbon metabolism, including glycolysis. Particularly notable increases occurred in flux through the tricarboxylic acid cycle and its efflux to the fatty acid biosynthesis pathway. Pharmacological inhibition of fatty acid biosynthesis suppressed the replication of both HCMV and influenza A, another enveloped virus. These results show that fatty acid synthesis is essential for the replication of two divergent enveloped viruses and that systems-level metabolic flux profiling can identify metabolic targets for antiviral therapy. PMID:18820684

  17. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
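
    The core idea, aligning at the amino acid level and then threading the original codons back onto that alignment, can be sketched as below. This is a simplified illustration and not TranslatorX's code: the function name and toy sequences are hypothetical, the protein alignment is assumed to be already computed, and genetic-code handling, ambiguous codons, and GBlocks cleaning are not covered.

```python
def backtranslate_alignment(aligned_protein, nucleotides):
    """Thread codons from an ungapped coding sequence onto a gapped protein.

    aligned_protein uses '-' for gaps; nucleotides is the matching
    in-frame coding sequence without a stop codon.
    """
    if len(nucleotides) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and nucleotide lengths do not match")
    codon_iter = iter(codons)
    # Each residue consumes one codon; each protein gap becomes '---'.
    return "".join(
        "---" if aa == "-" else next(codon_iter) for aa in aligned_protein
    )


# Toy protein alignment (hypothetical) and the corresponding coding sequences.
aligned = {"seq1": "MK-L", "seq2": "MKTL"}
coding = {"seq1": "ATGAAACTG", "seq2": "ATGAAGACCCTG"}
codon_alignment = {
    name: backtranslate_alignment(aligned[name], coding[name])
    for name in aligned
}
```

    Gaps therefore always fall between codons, which keeps the reading frame intact in the nucleotide alignment.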

  18. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

    PubMed Central

    Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen

    2016-01-01

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814

  19. Genome Sequence of Erythromelalgia-Related Poxvirus Identifies it as an Ectromelia Virus Strain

    PubMed Central

    Mendez-Rios, Jorge D.; Martens, Craig A.; Bruno, Daniel P.; Porcella, Stephen F.; Zheng, Zhi-Ming; Moss, Bernard

    2012-01-01

    Erythromelalgia is a condition characterized by attacks of burning pain and inflammation in the extremities. An epidemic form of this syndrome occurs in secondary students in rural China and a virus referred to as erythromelalgia-associated poxvirus (ERPV) was reported to have been recovered from throat swabs in 1987. Studies performed at the time suggested that ERPV belongs to the orthopoxvirus genus and has similarities with ectromelia virus, the causative agent of mousepox. We have determined the complete genome sequence of ERPV and demonstrated that it has 99.8% identity to the Naval strain of ectromelia virus and a slightly lower identity to the Moscow strain. Small DNA deletions in the Naval genome that are absent from ERPV may suggest that the sequenced strain of Naval was not the immediate progenitor of ERPV. PMID:22558090

  20. Complete genome sequence of a novel monopartite geminivirus identified in mulberry (Morus alba L.).

    PubMed

    Lu, Quan-You; Wu, Zu-Jian; Xia, Zhi-Song; Xie, Lian-Hui

    2015-08-01

    The genome sequence of a novel geminivirus from mulberry samples exhibiting crinkle leaf symptoms is reported. The sequence consisted of 2952 nt, containing four open reading frames (ORFs) in the viral-sense strand and two ORFs in the complementary-sense strand. The size of the genome and the conserved origin of replication are similar to those of members of the family Geminiviridae, but the genomic organization, number of ORFs, and especially five contiguous GAAAAA repeats positioned upstream of ORF1 distinguish it from other geminiviruses. Phylogenetic analysis coupled with ORF analysis suggests that this is a novel virus that does not fit into the established seven genera of the family Geminiviridae. The virus, found in Zhenjiang, Jiangsu province, China, is tentatively named mulberry crinkle leaf virus isolate Jiangsu (MCLV-js).

  1. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.

    PubMed

    Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen

    2016-02-02

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.

  2. Genome sequence of erythromelalgia-related poxvirus identifies it as an ectromelia virus strain.

    PubMed

    Mendez-Rios, Jorge D; Martens, Craig A; Bruno, Daniel P; Porcella, Stephen F; Zheng, Zhi-Ming; Moss, Bernard

    2012-01-01

    Erythromelalgia is a condition characterized by attacks of burning pain and inflammation in the extremities. An epidemic form of this syndrome occurs in secondary students in rural China and a virus referred to as erythromelalgia-associated poxvirus (ERPV) was reported to have been recovered from throat swabs in 1987. Studies performed at the time suggested that ERPV belongs to the orthopoxvirus genus and has similarities with ectromelia virus, the causative agent of mousepox. We have determined the complete genome sequence of ERPV and demonstrated that it has 99.8% identity to the Naval strain of ectromelia virus and a slightly lower identity to the Moscow strain. Small DNA deletions in the Naval genome that are absent from ERPV may suggest that the sequenced strain of Naval was not the immediate progenitor of ERPV.

  3. Amino acid sequence of homologous rat atrial peptides: natriuretic activity of native and synthetic forms.

    PubMed Central

    Seidah, N G; Lazure, C; Chrétien, M; Thibault, G; Garcia, R; Cantin, M; Genest, J; Nutt, R F; Brady, S F; Lyle, T A

    1984-01-01

    A substance called atrial natriuretic factor (ANF), localized in secretory granules of atrial cardiocytes, was isolated as four homologous natriuretic peptides from homogenates of rat atria. The complete sequence of the longest form showed that it is composed of 33 amino acids. The three other shorter forms (2-33, 3-33, and 8-33) represent amino-terminally truncated versions of the 33 amino acid parent molecule as shown by analysis of sequence, amino acid composition, or both. The proposed primary structure agrees entirely with the amino acid composition and reveals no significant sequence homology with any known protein or segment of protein. The short form ANF-(8-33) was synthesized by a multi-fragment condensation approach and the synthetic product was shown to exhibit specific activity comparable to that of the natural ANF-(3-33). PMID:6232612

  4. Nucleotide and deduced amino acid sequences of a new subtilisin from an alkaliphilic Bacillus isolate.

    PubMed

    Saeki, Katsuhisa; Magallones, Marietta V; Takimura, Yasushi; Hatada, Yuji; Kobayashi, Tohru; Kawai, Shuji; Ito, Susumu

    2003-10-01

    The gene for a new subtilisin from the alkaliphilic Bacillus sp. KSM-LD1 was cloned and sequenced. The open reading frame of the gene encoded a 97 amino-acid prepro-peptide plus a 307 amino-acid mature enzyme that contained a possible catalytic triad of residues, Asp32, His66, and Ser224. The deduced amino acid sequence of the mature enzyme (LD1) showed approximately 65% identity to those of subtilisins SprC and SprD from alkaliphilic Bacillus sp. LG12. The amino acid sequence identities of LD1 to those of previously reported true subtilisins and high-alkaline proteases were below 60%. LD1 was characteristically stable during incubation with surfactants and chemical oxidants. Interestingly, an oxidizable Met residue is located next to the catalytic Ser224 of the enzyme as in the cases of the oxidation-susceptible subtilisins reported to date.

  5. Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles

    PubMed Central

    Nandi, Tannistha; Holden, Matthew T.G.; Didelot, Xavier; Mehershahi, Kurosh; Boddey, Justin A.; Beacham, Ifor; Peak, Ian; Harting, John; Baybayan, Primo; Guo, Yan; Wang, Susana; How, Lee Chee; Sim, Bernice; Essex-Lopresti, Angela; Sarkar-Tyson, Mitali; Nelson, Michelle; Smither, Sophie; Ong, Catherine; Aw, Lay Tin; Hoon, Chua Hui; Michell, Stephen; Studholme, David J.; Titball, Richard; Chen, Swaine L.; Parkhill, Julian

    2015-01-01

    Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity. PMID:25236617

  6. Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles.

    PubMed

    Nandi, Tannistha; Holden, Matthew T G; Didelot, Xavier; Mehershahi, Kurosh; Boddey, Justin A; Beacham, Ifor; Peak, Ian; Harting, John; Baybayan, Primo; Guo, Yan; Wang, Susana; How, Lee Chee; Sim, Bernice; Essex-Lopresti, Angela; Sarkar-Tyson, Mitali; Nelson, Michelle; Smither, Sophie; Ong, Catherine; Aw, Lay Tin; Hoon, Chua Hui; Michell, Stephen; Studholme, David J; Titball, Richard; Chen, Swaine L; Parkhill, Julian; Tan, Patrick

    2015-01-01

    Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity.

  7. Whole genome sequencing to identify host genetic risk factors for severe outcomes of hepatitis A virus infection.

    PubMed

    Long, Dustin; Fix, Oren K; Deng, Xutao; Seielstad, Mark; Lauring, Adam S

    2014-10-01

    Acute liver failure is a severe, but rare, outcome of hepatitis A virus infection. Unusual presentations of prevalent infections have often been attributed to pathogen-specific immune deficits that exhibit Mendelian inheritance. Genome-wide resequencing of unrelated cases has proven to be a powerful approach for identifying highly penetrant risk alleles that underlie such syndromes. Rare mutations likely to affect protein expression or function can be identified from sequence data, and their association with a similarly rare phenotype rests on their existence in multiple affected individuals. A rare or novel sequence variant that is enriched to a significant degree in a genetically diverse cohort suggests a candidate susceptibility allele. Whole genome sequencing of ten individuals from ethnically diverse backgrounds with HAV-associated acute liver failure was performed. A set of rational filtering criteria was used to identify genetic variants that are rare in the population, but enriched in this cohort. Single nucleotide polymorphisms, insertions, and deletions were considered and autosomal dominant, autosomal recessive, and polygenic models were applied. Analysis of the protein-coding exome identified no single gene with putatively deleterious mutations shared by multiple individuals, arguing against a simple Mendelian model of inheritance. A number of rare variants were significantly enriched in this cohort, consistent with a complex and genetically heterogeneous trait. Several of the variants identified in this genome-wide study lie within genes important to hepatic pathophysiology and are candidate susceptibility alleles for hepatitis A virus infection.

  8. Shark myelin basic protein: amino acid sequence, secondary structure, and self-association.

    PubMed

    Milne, T J; Atkins, A R; Warren, J A; Auton, W P; Smith, R

    1990-09-01

    Myelin basic protein (MBP) from the Whaler shark (Carcharhinus obscurus) has been purified from acid extracts of a chloroform/methanol pellet from whole brains. The amino acid sequence of the majority of the protein has been determined and compared with the sequences of other MBPs. The shark protein has only 44% homology with the bovine protein, but, in common with other MBPs, it has basic residues distributed throughout the sequence and no extensive segments that are predicted to have an ordered secondary structure in solution. Shark MBP lacks the triproline sequence previously postulated to form a hairpin bend in the molecule. The region containing the putative consensus sequence for encephalitogenicity in the guinea pig contains several substitutions, thus accounting for the lack of activity of the shark protein. Studies of the secondary structure and self-association have shown that shark MBP possesses solution properties similar to those of the bovine protein, despite the extensive differences in primary structure.

  9. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

    PubMed

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called the "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and variety of end uses. Despite its huge economic importance, genomic resources in mango are scarce and the genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. A total of 5,465 SSR loci were identified in 4,912 unigenes, with 288 type I SSRs (n ≥ 20 bp). One hundred type I SSR markers were randomly selected, of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides a useful genomic resource for genetic improvement of mango.
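
    The SSR screening step described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function name `find_ssrs`, the 1 to 6 bp motif range, and the regular expression are assumptions chosen for illustration. Only the 20 bp length threshold for type I SSRs comes from the abstract.

```python
import re

def find_ssrs(seq, min_len=20, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_len=20 mirrors the 'type I SSR (n >= 20 bp)' threshold in the abstract;
    max_motif=6 is an assumed upper bound on motif size.
    """
    hits = []
    # Lazy quantifier: the shortest repeating unit is tried first.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1+" % max_motif)
    for m in pattern.finditer(seq.upper()):
        span = m.group(0)
        if len(span) >= min_len:
            motif = m.group(1)
            hits.append((m.start(), motif, len(span) // len(motif)))
    return hits

# Toy input (hypothetical, not mango data): an (AG)11 repeat with short flanks.
example = "TTC" + "AG" * 11 + "CCA"
print(find_ssrs(example))
```

    The sketch reports only perfect repeats. Dedicated SSR tools typically also handle compound and interrupted repeats, which are out of scope here.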

  10. Complete genome sequence of Enterococcus mundtii QU 25, an efficient L-(+)-lactic acid-producing bacterium.

    PubMed

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-08-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified: one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci.

  11. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  12. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    PubMed Central

    2013-01-01

    Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509
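
    The contig and scaffold N50 values quoted in this abstract are summary statistics of an assembly's length distribution. As a minimal sketch of how such a statistic is commonly computed (the function name `n50` and the toy lengths are illustrative assumptions, not data from the cacao assembly):

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy contig lengths (hypothetical): total 20, half is 10, and 6 + 5 >= 10.
print(n50([2, 3, 4, 5, 6]))
```

    Applying the same function to contig lengths and to scaffold lengths gives the two figures that assembly reports usually list side by side.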

  13. Exome sequencing identifies a DNAJB6 mutation in a family with dominantly-inherited limb-girdle muscular dystrophy.

    PubMed

    Couthouis, Julien; Raphael, Alya R; Siskind, Carly; Findlay, Andrew R; Buenrostro, Jason D; Greenleaf, William J; Vogel, Hannes; Day, John W; Flanigan, Kevin M; Gitler, Aaron D

    2014-05-01

    Limb-girdle muscular dystrophy primarily affects the muscles of the hips and shoulders (the "limb-girdle" muscles), although it is a heterogeneous disorder that can present with varying symptoms. There is currently no cure. We sought to identify the genetic basis of limb-girdle muscular dystrophy type 1 in an American family of Northern European descent using exome sequencing. Exome sequencing was performed on DNA samples from two affected siblings and one unaffected sibling and resulted in the identification of eleven candidate mutations that co-segregated with the disease. Notably, this list included a previously reported mutation in DNAJB6, p.Phe89Ile, which was recently identified as a cause of limb-girdle muscular dystrophy type 1D. Additional family members were Sanger sequenced and the mutation in DNAJB6 was only found in affected individuals. Subsequent haplotype analysis indicated that this DNAJB6 p.Phe89Ile mutation likely arose independently of the previously reported mutation. Since other published mutations are located close by in the G/F domain of DNAJB6, this suggests that the area may represent a mutational hotspot. Exome sequencing provided an unbiased and effective method for identifying the genetic etiology of limb-girdle muscular dystrophy type 1 in a previously genetically uncharacterized family. This work further confirms the causative role of DNAJB6 mutations in limb-girdle muscular dystrophy type 1D.

  14. KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns.

    PubMed

    Wong, Yung-Hao; Lee, Tzong-Yi; Liang, Han-Kuen; Huang, Chia-Mao; Wang, Ting-Yuan; Yang, Yi-Huan; Chu, Chia-Huei; Huang, Hsien-Da; Ko, Ming-Tat; Hwang, Jenn-Kang

    2007-07-01

    Due to the importance of protein phosphorylation in cellular control, many studies have been undertaken to predict kinase-specific phosphorylation sites. Our previous work, KinasePhos 1.0, incorporated profile hidden Markov models (HMMs) with flanking residues of the kinase-specific phosphorylation sites. Herein, a new web server, KinasePhos 2.0, incorporates support vector machines (SVMs) with the protein sequence profile and the protein coupling pattern, which is a novel feature used for identifying phosphorylation sites. The coupling pattern [XdZ] denotes the coupling of amino acid types X and Z that are separated by d amino acids. The differences or quotients of coupling strength C(XdZ) between the positive set of phosphorylation sites and the background set of whole protein sequences from Swiss-Prot are computed to determine the number of coupling patterns for training SVM models. After evaluation based on k-fold cross-validation and jackknife cross-validation, the average predictive accuracies for phosphorylated serine, threonine, tyrosine and histidine are 90, 93, 88 and 93%, respectively. KinasePhos 2.0 performs better than other tools previously developed. The proposed web server is freely available at http://KinasePhos2.mbc.nctu.edu.tw/.
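
    The [XdZ] coupling pattern defined in this abstract pairs a residue X with a residue Z located d positions further along, with d residues in between. The sketch below only enumerates and counts such patterns in one sequence. The abstract does not give the formula for the coupling strength C(XdZ), so none is attempted. The function name `coupling_counts` and the toy peptide are assumptions for illustration.

```python
from collections import Counter

def coupling_counts(seq, d):
    """Count every pattern (X, d, Z): residues X and Z separated by d residues."""
    counts = Counter()
    # Z sits d + 1 positions after X, so the last valid X index is len(seq) - d - 2.
    for i in range(len(seq) - d - 1):
        counts[(seq[i], d, seq[i + d + 1])] += 1
    return counts

# Toy peptide (hypothetical): S and K occur twice with one residue between them.
counts = coupling_counts("SAKSPK", d=1)
print(counts[("S", 1, "K")])
```

    Counts like these, collected separately for a positive set and a background set, are the kind of raw tallies from which a difference or quotient could then be formed.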

  15. An analysis of amino acid sequences surrounding archaeal glycoprotein sequons.

    PubMed

    Abu-Qarn, Mehtap; Eichler, Jerry

    2007-05-01

    Despite having provided the first example of a prokaryal glycoprotein, little is known of the rules governing the N-glycosylation process in Archaea. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. In the following, the surroundings of Asn residues confirmed by experiment as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.
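
    Locating Asn-X-Ser/Thr sequons is the first step before the residues around them can be analyzed. A minimal sketch follows, with the function name `find_sequons` and the toy sequence as illustrative assumptions. The `exclude_pro` switch reflects the convention, commonly applied to eukaryotic sequons, that X is not proline. The abstract itself states only Asn-X-Ser/Thr, so the switch is off by default.

```python
import re

def find_sequons(protein, exclude_pro=False):
    """Return 1-based positions of Asn residues in N-X-[S/T] sequons."""
    x = "[^P]" if exclude_pro else "."
    # Lookahead keeps matches zero-width past the Asn, so overlapping sequons are found.
    pattern = re.compile(r"N(?=%s[ST])" % x)
    return [m.start() + 1 for m in pattern.finditer(protein)]

# Toy sequence (hypothetical): sequons start at Asn2, Asn3 and Asn7 (N-P-S).
print(find_sequons("MNNSTANPSK"))
print(find_sequons("MNNSTANPSK", exclude_pro=True))
```

    From each reported position, the flanking residues can be sliced out of the sequence to tabulate what surrounds modified versus unmodified sites.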

  16. Genomic Profile of Chronic Lymphocytic Leukemia in Korea Identified by Targeted Sequencing

    PubMed Central

    Park, Si Nae; Huh, Sunghoon; Im, Kyongok; Choi, Sungbin; Chung, Hye Yoon; Huh, JooRyung; Seo, Eul-Ju; Lee, Je-Hwan; Bang, Duhee; Lee, Dong Soon

    2016-01-01

    Chronic lymphocytic leukemia (CLL) is extremely rare in Asian countries, and there has been one report on genetic changes in 5 genes (TP53, SF3B1, NOTCH1, MYD88, and BIRC3) by Sanger sequencing in Chinese CLL. Yet studies of CLL in Asian countries using next-generation sequencing have not been reported. We aimed to characterize the genomic profiles of Korean CLL and to identify ethnic differences in somatic mutations with prognostic implications. We performed targeted sequencing of an 87-gene panel using next-generation sequencing along with G-banding and fluorescent in situ hybridization (FISH) for chromosome 12, 13q14.3 deletion, 17p13 deletion, and 11q22 deletion. Overall, 36 out of 48 patients (75%) harbored at least one mutation, and the mean number of mutations per patient was 1.6 (range 0–6). Aberrant karyotypes were observed in 30.4% by G-banding and 66.7% by FISH. The most recurrent mutation (>10% frequency) was ATM (20.8%), followed by TP53 (14.6%), SF3B1 (10.4%), KLHL6 (8.3%), and BCOR (6.25%). Mutations of MYD88 were associated with moderate adverse prognosis by multiple comparisons (P = 0.055). Mutation frequencies of MYD88, SAMHD1, EGR2, DDX3X, ZMYM3, and MED12 were similar to those in Caucasians, while mutation frequencies of ATM, TP53, KLHL6, BCOR and CDKN2A tended to be higher in Koreans than in Caucasians. In particular, ATM mutation showed a 1.5-fold higher incidence than in Caucasians, while mutation frequencies of SF3B1, NOTCH1, CHD2 and POT1 tended to be lower in Koreans than in Caucasians. However, mutation frequencies in Caucasians and Koreans were not significantly different statistically, probably due to the low number of patients. Collectively, the mutational profile and adverse prognostic genes in Korean CLL were different from those of Caucasians, suggesting an ethnic difference, while the profile of cytogenetic aberrations was similar to that of Caucasians. PMID:27959900

  17. A simple ligation-based method to increase the information density in sequencing reactions used to deconvolute nucleic acid selections

    PubMed Central

    Childs-Disney, Jessica L.; Disney, Matthew D.

    2008-01-01

    Herein, a method is described to increase the information density of sequencing experiments used to deconvolute nucleic acid selections. The method is facile and should be applicable to any selection experiment. A critical feature of this method is the use of biotinylated primers to amplify and encode a BamHI restriction site on both ends of a PCR product. After amplification, the PCR reaction is captured onto streptavidin resin, washed, and digested directly on the resin. Resin-based digestion affords clean product that is devoid of partially digested products and unincorporated PCR primers. The product's complementary ends are annealed and ligated together with T4 DNA ligase. Analysis of ligation products shows formation of concatemers of different length and little detectable monomer. Sequencing results produced data that routinely contained three to four copies of the library. This method allows for more efficient formulation of structure-activity relationships since multiple active sequences are identified from a single clone. PMID:18065718

  18. A new HLA-B*51 variant, B*5158, identified by sequence-based typing in a Korean individual.

    PubMed

    Roh, E Y; Shin, S; Yoon, J H; Ahn, B M; Chang, J Y

    2009-04-01

    A novel human leukocyte antigen (HLA)-B*51 allele, officially named HLA-B*5158, was identified in cord blood from a Korean individual. The HLA-B*5158 allele shows a single nucleotide difference from B*510101 in exon 2 at nucleotide position 214 (C/T), resulting in an amino acid substitution, Trp48Arg.

  19. Disease-targeted sequencing of ion channel genes identifies de novo mutations in patients with non-familial Brugada syndrome.

    PubMed

    Juang, Jyh-Ming Jimmy; Lu, Tzu-Pin; Lai, Liang-Chuan; Ho, Chia-Chuan; Liu, Yen-Bin; Tsai, Chia-Ti; Lin, Lian-Yu; Yu, Chih-Chieh; Chen, Wen-Jone; Chiang, Fu-Tien; Yeh, Shih-Fan Sherri; Lai, Ling-Ping; Chuang, Eric Y; Lin, Jiunn-Lee

    2014-10-23

    Brugada syndrome (BrS) is one of the ion channelopathies associated with sudden cardiac death (SCD). The most common BrS-associated gene (SCN5A) only accounts for approximately 20-25% of BrS patients. This study aims to identify novel mutations across human ion channels in non-familial BrS patients without SCN5A variants through disease-targeted sequencing. We performed disease-targeted multi-gene sequencing across 133 human ion channel genes and 12 reported BrS-associated genes in 15 unrelated, non-familial BrS patients without SCN5A variants. Candidate variants were validated by mass spectrometry and Sanger sequencing. Five de novo mutations were identified in four genes (SCNN1A, KCNJ16, KCNB2, and KCNT1) in three BrS patients (20%). Two of the three patients presented with SCD and one had syncope. Interestingly, the two patients who presented with SCD had compound mutations (SCNN1A:Arg350Gln and KCNB2:Glu522Lys; SCNN1A:Arg597* and KCNJ16:Ser261Gly). Importantly, two SCNN1A mutations were identified from different families. The KCNT1:Arg1106Gln mutation was identified in a patient with syncope. Bioinformatics algorithms predicted severe functional interruptions in these four mutation loci, suggesting their pivotal roles in BrS. This study identified four novel BrS-associated genes and indicated the effectiveness of this disease-targeted sequencing across ion channel genes for non-familial BrS patients without SCN5A variants.

  20. Transcriptome characterization by RNA sequencing identifies a major molecular and clinical subdivision in chronic lymphocytic leukemia

    PubMed Central

    Ferreira, Pedro G.; Jares, Pedro; Rico, Daniel; Gómez-López, Gonzalo; Martínez-Trillos, Alejandra; Villamor, Neus; Ecker, Simone; González-Pérez, Abel; Knowles, David G.; Monlong, Jean; Johnson, Rory; Quesada, Victor; Djebali, Sarah; Papasaikas, Panagiotis; López-Guerra, Mónica; Colomer, Dolors; Royo, Cristina; Cazorla, Maite; Pinyol, Magda; Clot, Guillem; Aymerich, Marta; Rozman, Maria; Kulis, Marta; Tamborero, David; Gouin, Anaïs; Blanc, Julie; Gut, Marta; Gut, Ivo; Puente, Xose S.; Pisano, David G.; Martin-Subero, José Ignacio; López-Bigas, Nuria; López-Guillermo, Armando; Valencia, Alfonso; López-Otín, Carlos; Campo, Elías; Guigó, Roderic

    2014-01-01

    Chronic lymphocytic leukemia (CLL) has heterogeneous clinical and biological behavior. Whole-genome and -exome sequencing has contributed to the characterization of the mutational spectrum of the disease, but the underlying transcriptional profile is still poorly understood. We have performed deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients, and characterized the CLL transcriptional landscape with unprecedented resolution. We detected thousands of transcriptional elements differentially expressed between the CLL and normal B cells, including protein-coding genes, noncoding RNAs, and pseudogenes. Transposable elements are globally derepressed in CLL cells. In addition, two thousand genes—most of which are not differentially expressed—exhibit CLL-specific splicing patterns. Genes involved in metabolic pathways showed higher expression in CLL, while genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL. Clustering of the CLL samples according to RNA-seq derived gene expression levels unveiled two robust molecular subgroups, C1 and C2. C1/C2 subgroups and the mutational status of the immunoglobulin heavy variable (IGHV) region were the only independent variables in predicting time to treatment in a multivariate analysis with main clinico-biological features. This subdivision was validated in an independent cohort of patients monitored through DNA microarrays. Further analysis shows that B-cell receptor (BCR) activation in the microenvironment of the lymph node may be at the origin of the C1/C2 differences. PMID:24265505

  1. Exome sequencing identifies highly recurrent MED12 somatic mutations in breast fibroadenoma.

    PubMed

    Lim, Weng Khong; Ong, Choon Kiat; Tan, Jing; Thike, Aye Aye; Ng, Cedric Chuan Young; Rajasegaran, Vikneswari; Myint, Swe Swe; Nagarajan, Sanjanaa; Nasir, Nur Diyana Md; McPherson, John R; Cutcutache, Ioana; Poore, Gregory; Tay, Su Ting; Ooi, Wei Siong; Tan, Veronique Kiak Mien; Hartman, Mikael; Ong, Kong Wee; Tan, Benita K T; Rozen, Steven G; Tan, Puay Hoon; Tan, Patrick; Teh, Bin Tean

    2014-08-01

    Fibroadenomas are the most common breast tumors in women under 30 (refs. 1,2). Exome sequencing of eight fibroadenomas with matching whole-blood samples revealed recurrent somatic mutations solely in MED12, which encodes a Mediator complex subunit. Targeted sequencing of an additional 90 fibroadenomas confirmed highly frequent MED12 exon 2 mutations (58/98, 59%) that are probably somatic, with 71% of mutations occurring in codon 44. Using laser capture microdissection, we show that MED12 fibroadenoma mutations are present in stromal but not epithelial mammary cells. Expression profiling of MED12-mutated and wild-type fibroadenomas revealed that MED12 mutations are associated with dysregulated estrogen signaling and extracellular matrix organization. The fibroadenoma MED12 mutation spectrum is nearly identical to that of previously reported MED12 lesions in uterine leiomyoma but not those of other tumors. Benign tumors of the breast and uterus, both of which are key target tissues of estrogen, may thus share a common genetic basis underpinned by highly frequent and specific MED12 mutations.

  2. Transcriptome sequencing identifies novel persistent viruses in herbicide resistant wild-grasses.

    PubMed

    Sabbadin, Federico; Glover, Rachel; Stafford, Rebecca; Rozado-Aguirre, Zuriñe; Boonham, Neil; Adams, Ian; Mumford, Rick; Edwards, Robert

    2017-02-06

    Herbicide resistance in wild grasses is widespread in the UK, with non-target site resistance (NTSR) to multiple chemistries being particularly problematic in weed control. As a complex trait, NTSR is driven by complex evolutionary pressures, and the growing awareness of the role of the phytobiome in plant abiotic stress tolerance led us to sequence the transcriptomes of herbicide-resistant and susceptible populations of black-grass and annual rye-grass for the presence of endophytes. Black-grass (Alopecurus myosuroides; Am) populations, displaying no overt disease symptoms, contained three previously undescribed viruses belonging to the Partitiviridae (AMPV1 and AMPV2) and Rhabdoviridae (AMVV1) families. These infections were widespread in UK black-grass populations and evidence was obtained for similar viruses being present in annual rye-grass (Lolium rigidum), perennial rye-grass (Lolium perenne) and meadow fescue (Festuca pratensis). In black-grass, while no direct causative link was established between viral infection and herbicide resistance, transcriptome sequencing showed a high incidence of infection in the NTSR Peldon population. The widespread infection of these weeds by little characterised and persistent viruses and their potential evolutionary role in enhancing plant stress tolerance mechanisms including NTSR warrants further investigation.

  3. Transcriptome sequencing identifies novel persistent viruses in herbicide resistant wild-grasses

    PubMed Central

    Sabbadin, Federico; Glover, Rachel; Stafford, Rebecca; Rozado-Aguirre, Zuriñe; Boonham, Neil; Adams, Ian; Mumford, Rick; Edwards, Robert

    2017-01-01

    Herbicide resistance in wild grasses is widespread in the UK, with non-target site resistance (NTSR) to multiple chemistries being particularly problematic in weed control. NTSR is a complex trait driven by complex evolutionary pressures, and the growing awareness of the role of the phytobiome in plant abiotic stress tolerance led us to sequence the transcriptomes of herbicide resistant and susceptible populations of black-grass and annual rye-grass for the presence of endophytes. Black-grass (Alopecurus myosuroides; Am) populations, displaying no overt disease symptoms, contained three previously undescribed viruses belonging to the Partitiviridae (AMPV1 and AMPV2) and Rhabdoviridae (AMVV1) families. These infections were widespread in UK black-grass populations and evidence was obtained for similar viruses being present in annual rye-grass (Lolium rigidum), perennial rye-grass (Lolium perenne) and meadow fescue (Festuca pratensis). In black-grass, while no direct causative link was established between viral infection and herbicide resistance, transcriptome sequencing showed a high incidence of infection in the NTSR Peldon population. The widespread infection of these weeds by little characterised and persistent viruses and their potential evolutionary role in enhancing plant stress tolerance mechanisms including NTSR warrant further investigation. PMID:28165016

  4. Peptide array on cellulose support--a screening tool to identify peptides with dipeptidyl-peptidase IV inhibitory activity within the sequence of α-lactalbumin.

    PubMed

    Lacroix, Isabelle M E; Li-Chan, Eunice C Y

    2014-11-13

    The inhibition of the enzyme dipeptidyl-peptidase IV (DPP-IV) is an effective pharmacotherapeutic approach for the management of type 2 diabetes. Recent findings have suggested that dietary proteins, including bovine α-lactalbumin, could be precursors of peptides able to inhibit DPP-IV. However, information on the location of active peptide sequences within the proteins is far from being comprehensive. Moreover, the traditional approach to identify bioactive peptides from foods can be tedious and long. Therefore, the objective of this study was to use peptide arrays to screen α-lactalbumin-derived peptides for their interaction with DPP-IV. Deca-peptides spanning the entire α-lactalbumin sequence, with a frame shift of 1 amino acid between successive sequences, were synthesized on cellulose membranes using "SPOT" technology, and their binding to and inhibition of DPP-IV was studied. Among the 114 α-lactalbumin-derived decamers investigated, the peptides 60WCKDDQNPHS69 (αK(i) = 76 µM), 105LAHKALCSEK114 (K(i) = 217 µM) and 110LCSEKLDQWL119 (K(i) = 217 µM) were among the strongest DPP-IV inhibitors. While the SPOT- and traditionally-synthesized peptides showed consistent trends in DPP-IV inhibitory activity, the cellulose-bound peptides' binding behavior was not correlated to their ability to inhibit the enzyme. This research showed, for the first time, that peptide arrays are useful screening tools to identify DPP-IV inhibitory peptides from dietary proteins.
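
    The overlapping-decamer design described above (window of 10 residues, frame shift of 1) can be sketched as a sliding window. This is only an illustration: the function name is hypothetical, the input string is a placeholder rather than the real α-lactalbumin sequence, and the 123-residue length is inferred from the abstract's figure of 114 decamers (123 - 10 + 1 = 114).

```python
def overlapping_peptides(sequence, length=10, step=1):
    """Return (start, end, peptide) tuples with 1-based residue positions."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    return [
        (i + 1, i + length, sequence[i:i + length])
        for i in range(0, last_start + 1, step)
    ]


# Placeholder 123-residue string; NOT the real alpha-lactalbumin sequence.
placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 7)[:123]

peptides = overlapping_peptides(placeholder)
print(len(peptides))              # number of decamers in the array
print(peptides[0], peptides[-1])  # first and last window with positions
```

    With 1-based positions, a window starting at residue 110 ends at residue 119, which matches the style of labels such as 110LCSEKLDQWL119 in the abstract.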

  5. Classification of mouse VK groups based on the partial amino acid sequence to the first invariant tryptophan: impact of 14 new sequences from IgG myeloma proteins.

    PubMed

    Potter, M; Newell, J B; Rudikoff, S; Haber, E

    1982-12-01

    Fourteen new VK sequences derived from BALB/c IgG myeloma proteins were determined to the first invariant tryptophan (Trp 35). These partial sequences were compared with 65 other published VK sequences using a computer program. The 79 sequences were organized according to the length of the sequence from the amino terminus to the first invariant tryptophan (Trp 35), into seven groups (33, 34, 35, 36, 39, 40 and 41aa). A distance matrix of all 79 sequences was then computed, i.e. the number of amino acid substitutions necessary to convert one sequence to another was determined. From these data a dendrogram was constructed. Most of the VK sequences fell into clusters or closely related groups. The definition of a sequence group is arbitrary but facilitates the classification of VK proteins. We used 12 substitutions as the basis for defining a sequence group based on the known number of substitutions that are found in the VK21 proteins. By this criterion there were 18 groups in the Trp 35 dendrogram. Twelve of the 14 new sequences fell into one of these sequence groups; two formed new sequence groups. Collective amino acid sequencing is still encountering new VK structures indicating more sequences will be required to attain an accurate estimate of the total number of VK groups. Updated dendrograms can be quickly generated to include newly generated sequences.
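
    The distance-matrix and grouping steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: distances are plain substitution counts between equal-length sequences, groups are formed by single linkage (any pair within the threshold joins two groups), and the four short sequences and the toy threshold of 2 are made up. The abstract's own threshold of 12 substitutions applies to sequences of 33 to 41 residues.

```python
from itertools import combinations


def substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise substitution counts."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = substitutions(seqs[i], seqs[j])
    return dist


def group_by_threshold(dist, max_subs):
    """Single-linkage groups: merge any two sequences within max_subs."""
    parent = list(range(len(dist)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] <= max_subs:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up 10-residue sequences, for illustration only.
seqs = ["DIVMTQSPAS", "DIVMTQSPSS", "EIVLTQSPAT", "QAVVTQESAL"]
dist = distance_matrix(seqs)
print(group_by_threshold(dist, max_subs=2))
```

    Raising the threshold merges more sequences into fewer groups, which is why the abstract calls the definition of a sequence group arbitrary.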

  6. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  7. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  8. Amino acid sequence around the active-site serine residue in the acyltransferase domain of goat mammary fatty acid synthetase.

    PubMed Central

    Mikkelsen, J; Højrup, P; Rasmussen, M M; Roepstorff, P; Knudsen, J

    1985-01-01

    Goat mammary fatty acid synthetase was labelled in the acyltransferase domain by formation of O-ester intermediates by incubation with [1-14C]acetyl-CoA and [2-14C]malonyl-CoA. Tryptic-digest and CNBr-cleavage peptides were isolated and purified by high-performance reverse-phase and ion-exchange liquid chromatography. The sequences of the malonyl- and acetyl-labelled peptides were shown to be identical. The results confirm the hypothesis that both acetyl and malonyl groups are transferred to the mammalian fatty acid synthetase complex by the same transferase. The sequence is compared with those of other fatty acid synthetase transferases. PMID:3922356

  9. Whole exome sequencing identifies three recessive FIG4 mutations in an apparently dominant pedigree with Charcot-Marie-Tooth disease.

    PubMed

    Menezes, Manoj P; Waddell, Leigh; Lenk, Guy M; Kaur, Simranpreet; MacArthur, Daniel G; Meisler, Miriam H; Clarke, Nigel F

    2014-08-01

    Charcot-Marie-Tooth disease (CMT) is genetically heterogeneous and classification based on motor nerve conduction velocity and inheritance is used to direct genetic testing. With the less common genetic forms of CMT, identifying the causative genetic mutation by Sanger sequencing of individual genes can be time-consuming and costly. Next-generation sequencing technologies show promise for clinical testing in diseases where a similar phenotype is caused by different genes. We report the unusual occurrence of CMT4J, caused by mutations in FIG4, in an apparently dominant pedigree. The affected proband and her mother exhibit different disease severities associated with different combinations of compound heterozygous FIG4 mutations, identified by whole exome sequencing. The proband was also shown to carry a de novo nonsense mutation in the dystrophin gene, which may contribute to her more severe phenotype. This study is a cautionary reminder that in families with two generations affected, explanations other than dominant inheritance are possible, such as recessive inheritance due to three mutations segregating in the family. It also emphasises the advantages of next-generation sequencing approaches that screen multiple CMT genes at once for patients in whom the common genes have been excluded.

  10. Exome sequencing identifies a spectrum of mutation frequencies in advanced and lethal prostate cancers

    PubMed Central

    Kumar, Akash; White, Thomas A.; MacKenzie, Alexandra P.; Clegg, Nigel; Lee, Choli; Dumpit, Ruth F.; Coleman, Ilsa; Ng, Sarah B.; Salipante, Stephen J.; Rieder, Mark J.; Nickerson, Deborah A.; Corey, Eva; Lange, Paul H.; Morrissey, Colm; Vessella, Robert L.; Nelson, Peter S.; Shendure, Jay

    2011-01-01

    To catalog protein-altering mutations that may drive the development of prostate cancers and their progression to metastatic disease systematically, we performed whole-exome sequencing of 23 prostate cancers derived from 16 different lethal metastatic tumors and three high-grade primary carcinomas. All tumors were propagated in mice as xenografts, designated the LuCaP series, to model phenotypic variation, such as responses to cancer-directed therapeutics. Although corresponding normal tissue was not available for most tumors, we were able to take advantage of increasingly deep catalogs of human genetic variation to remove most germline variants. On average, each tumor genome contained ∼200 novel nonsynonymous variants, of which the vast majority was specific to individual carcinomas. A subset of genes was recurrently altered across tumors derived from different individuals, including TP53, DLK2, GPC6, and SDF4. Unexpectedly, three prostate cancer genomes exhibited substantially higher mutation frequencies, with 2,000–4,000 novel coding variants per exome. A comparison of castration-resistant and castration-sensitive pairs of tumor lines derived from the same prostate cancer highlights mutations in the Wnt pathway as potentially contributing to the development of castration resistance. Collectively, our results indicate that point mutations arising in coding regions of advanced prostate cancers are common but, with notable exceptions, very few genes are mutated in a substantial fraction of tumors. We also report a previously undescribed subtype of prostate cancers exhibiting “hypermutated” genomes, with potential implications for resistance to cancer therapeutics. Our results also suggest that increasingly deep catalogs of human germline variation may challenge the necessity of sequencing matched tumor-normal pairs. PMID:21949389

  11. Fatty acids identified in the Burmese python promote beneficial cardiac growth.

    PubMed

    Riquelme, Cecilia A; Magida, Jason A; Harrison, Brooke C; Wall, Christopher E; Marr, Thomas G; Secor, Stephen M; Leinwand, Leslie A

    2011-10-28

    Burmese pythons display a marked increase in heart mass after a large meal. We investigated the molecular mechanisms of this physiological heart growth with the goal of applying this knowledge to the mammalian heart. We found that heart growth in pythons is characterized by myocyte hypertrophy in the absence of cell proliferation and by activation of physiological signal transduction pathways. Despite high levels of circulating lipids, the postprandial python heart does not accumulate triglycerides or fatty acids. Instead, there is robust activation of pathways of fatty acid transport and oxidation combined with increased expression and activity of superoxide dismutase, a cardioprotective enzyme. We also identified a combination of fatty acids in python plasma that promotes physiological heart growth when injected into either pythons or mice.

  12. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays.

  13. Multilocus sequence typing of Mycoplasma hyorhinis strains identified by a real-time TaqMan PCR assay.

    PubMed

    Tocqueville, Véronique; Ferré, Séverine; Nguyen, Ngoc Hong Phuc; Kempf, Isabelle; Marois-Créhan, Corinne

    2014-05-01

    A real-time TaqMan PCR assay based on the gene encoding the protein p37 was developed to detect Mycoplasma hyorhinis. Its specificity was validated with 29 epidemiologically unrelated M. hyorhinis strains (28 field strains and one reference strain) and other mycoplasma species or with other microorganisms commonly found in pigs. The estimated detection limit of this qPCR assay was 125 microorganism equivalents/μl. The same 29 epidemiologically unrelated M. hyorhinis strains and four previously fully sequenced strains were typed by two portable typing methods, the sequencing of the p37 gene and a multilocus sequence typing (MLST) scheme. The first method revealed 18 distinct nucleotide sequences and insufficient discriminatory power (0.934). The MLST scheme was developed with the sequenced genomes of the M. hyorhinis strains HUB-1, GDL-1, MCLD, and SK76 and based on the genes dnaA, rpoB, gyrB, gltX, adk, and gmk. In total, 2,304 bp of sequence was analyzed for each strain. MLST was capable of subdividing the 33 strains into 29 distinct sequence types. The discriminatory power of the method was >0.95, which is the threshold value for interpreting typing results with confidence (D=0.989). Population analysis showed that recombination in M. hyorhinis occurs and that strains are diverse but with a certain clonality (one unique clonal complex was identified). The new qPCR assay and the robust MLST scheme are available for the acquisition of new knowledge on M. hyorhinis epidemiology. A web-accessible database has been set up for the M. hyorhinis MLST scheme at http://pubmlst.org/mhyorhinis/.
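
    The abstract does not say how discriminatory power was computed. A common choice for typing schemes is the Hunter-Gaston index (Simpson's index of diversity), D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of strains and n_j the number of strains of type j. The sketch below assumes that index. The split of 33 strains into 29 types is hypothetical (the abstract gives only the totals), chosen as one distribution that happens to reproduce the reported D = 0.989.

```python
from collections import Counter


def discriminatory_index(type_assignments):
    """Hunter-Gaston discriminatory index for a list of type labels."""
    counts = Counter(type_assignments).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two strains")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical distribution: 27 singleton types plus two types of 3 strains.
assignments = [f"ST{i}" for i in range(1, 28)] + ["ST28"] * 3 + ["ST29"] * 3

print(len(assignments), len(set(assignments)))     # 33 strains, 29 types
print(round(discriminatory_index(assignments), 3))  # 0.989
```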

  14. Identifying T Cell Receptors from High-Throughput Sequencing: Dealing with Promiscuity in TCRα and TCRβ Pairing

    PubMed Central

    Thomas, Paul G.; Mold, Jeff E.

    2017-01-01

    Characterisation of the T cell receptors (TCR) involved in immune responses is important for the design of vaccines and immunotherapies for cancer and autoimmune disease. The specificity of the interaction between the TCR heterodimer and its peptide-MHC ligand derives largely from the juxtaposed hypervariable CDR3 regions on the TCRα and TCRβ chains, and obtaining the paired sequences of these regions is a standard for functionally defining the TCR. A brute force approach to identifying the TCRs in a population of T cells is to use high-throughput single-cell sequencing, but currently this process remains costly and risks missing small clones. Alternatively, CDR3α and CDR3β sequences can be associated using their frequency of co-occurrence in independent samples, but this approach can be confounded by the sharing of CDR3α and CDR3β across clones, commonly observed within epitope-specific T cell populations. The accurate, exhaustive, and economical recovery of TCR sequences from such populations therefore remains a challenging problem. Here we describe an algorithm for performing frequency-based pairing (alphabetr) that accommodates CDR3α- and CDR3β-sharing, cells expressing two TCRα chains, and multiple forms of sequencing error. The algorithm also yields accurate estimates of clonal frequencies. PMID:28103239

  15. Nucleolar targeting of proteins by the tandem array of basic amino acid stretches identified in the RNA polymerase I-associated factor PAF49

    SciTech Connect

    Ushijima, Ryujiro; Matsuyama, Toshifumi; Nagata, Izumi; Yamamoto, Kazuo

    2008-05-16

    There is accumulating evidence to indicate that the regulation of subnuclear compartmentalization plays important roles in cellular processes. The RNA polymerase I-associated factor PAF49 has been shown to accumulate in the nucleolus in growing cells, but disperse into the nucleoplasm in growth-arrested cells. Serial deletion analysis revealed that amino acids 199-338 were necessary for the nucleolar localization of PAF49. Combinatorial point mutation analysis indicated that the individual basic amino acid stretches (BS) within the central (BS1-4) and the C-terminal (BS5 and 6) regions may cooperatively confer the nucleolar localization of PAF49. Addition of the basic stretches in tandem to a heterologous protein, such as the interferon regulatory factor-3, translocated the tagged protein into the nucleolus, even in the presence of an intrinsic nuclear export sequence. Thus, tandem array of the basic amino acid stretches identified here functions as a dominant nucleolar targeting sequence.

  16. Identity-by-descent filtering of exome sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome.

    PubMed

    Krawitz, Peter M; Schweiger, Michal R; Rödelsperger, Christian; Marcelis, Carlo; Kölsch, Uwe; Meisel, Christian; Stephani, Friederike; Kinoshita, Taroh; Murakami, Yoshiko; Bauer, Sebastian; Isau, Melanie; Fischer, Axel; Dahl, Andreas; Kerick, Martin; Hecht, Jochen; Köhler, Sebastian; Jäger, Marten; Grünhagen, Johannes; de Condor, Birgit Jonske; Doelken, Sandra; Brunner, Han G; Meinecke, Peter; Passarge, Eberhard; Thompson, Miles D; Cole, David E; Horn, Denise; Roscioli, Tony; Mundlos, Stefan; Robinson, Peter N

    2010-10-01

    Hyperphosphatasia mental retardation (HPMR) syndrome is an autosomal recessive form of mental retardation with distinct facial features and elevated serum alkaline phosphatase. We performed whole-exome sequencing in three siblings of a nonconsanguineous union with HPMR and performed computational inference of regions identical by descent in all siblings to establish PIGV, encoding a member of the GPI-anchor biosynthesis pathway, as the gene mutated in HPMR. We identified homozygous or compound heterozygous mutations in PIGV in three additional families.

  17. Novel Exons and Splice Variants in the Human Antibody Heavy Chain Identified by Single Cell and Single Molecule Sequencing

    PubMed Central

    Vollmers, Christopher; Penland, Lolita; Kanbar, Jad N.; Quake, Stephen R.

    2015-01-01

    Antibody heavy chains contain a variable and a constant region. The constant region of the antibody heavy chain is encoded by multiple groups of exons which define the isotype and therefore many functional characteristics of the antibody. We performed both single B cell RNAseq and long read single molecule sequencing of antibody heavy chain transcripts and were able to identify novel exons for IGHA1 and IGHA2 as well as novel isoforms for IGHM antibody heavy chain. PMID:25611855

  18. Thin-film technology for direct visual detection of nucleic acid sequences: applications in clinical research.

    PubMed

    Jenison, Robert D; Bucala, Richard; Maul, Diana; Ward, David C

    2006-01-01

    Certain optical conditions permit the unaided eye to detect thickness changes on surfaces on the order of 20 Å, which are of similar dimensions to monomolecular interactions between proteins or hybridization of complementary nucleic acid sequences. Such detection exploits specific interference of reflected white light, wherein thickness changes are perceived as surface color changes. This technology, termed thin-film detection, allows for the visualization of subattomole amounts of nucleic acid targets, even in complex clinical samples. Thin-film technology has been applied to a broad range of clinically relevant indications, including the detection of pathogenic bacterial and viral nucleic acid sequences and the discrimination of sequence variations in human genes causally related to susceptibility or severity of disease.

  19. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
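
    As a rough illustration of the kind of quantities involved, the sketch below computes standard Shannon measures for a sequence: the entropy of the residue composition, the entropy of a residue given its predecessor, and a redundancy R = 1 - H(next | current) / log2(20). It also shuffles the sequence repeatedly as a simple Monte Carlo baseline. These definitions and all function names are assumptions made for this example; the report's exact parameters are not given in the abstract.

```python
import math
import random
from collections import Counter

ALPHABET_SIZE = 20  # standard amino acids


def entropy(counts):
    """Shannon entropy in bits of a collection of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def composition_entropy(seq):
    """Entropy of the single-residue frequency distribution."""
    return entropy(Counter(seq).values())


def conditional_entropy(seq):
    """H(next | current) = H(adjacent pairs) - H(first residue of pair)."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return entropy(pairs.values()) - entropy(firsts.values())


def redundancy(seq):
    """1 minus the conditional entropy relative to its maximum, log2(20)."""
    return 1 - conditional_entropy(seq) / math.log2(ALPHABET_SIZE)


def shuffled_redundancies(seq, trials=200, seed=0):
    """Redundancy of random permutations with the same composition."""
    rng = random.Random(seed)
    residues = list(seq)
    values = []
    for _ in range(trials):
        rng.shuffle(residues)
        values.append(redundancy("".join(residues)))
    return values


periodic = "AB" * 20  # toy, perfectly periodic chain
baseline = shuffled_redundancies(periodic)
print(redundancy(periodic), sum(baseline) / len(baseline))
```

    Estimates from short chains are biased because few residue pairs are observed, which is in line with the abstract's remark about constraints due to short chain length.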

  20. RNA internal standard synthesis by nucleic acid sequence-based amplification for competitive quantitative amplification reactions.

    PubMed

    Lo, Wan-Yu; Baeumner, Antje J

    2007-02-15

    Nucleic acid sequence-based amplification (NASBA) reactions have been demonstrated to successfully synthesize new sequences based on deletion and insertion reactions. Two RNA internal standards were synthesized for use in competitive amplification reactions in which quantitative analysis can be achieved by coamplifying the internal standard with the wild type sample. The sequences were created in two consecutive NASBA reactions using the E. coli clpB mRNA sequence as model analyte. The primer sequences of the wild type sequence were maintained, and a 20-nt-long segment inside the amplicon region was exchanged for a new segment of similar GC content and melting temperature. The new RNA sequence was thus amplifiable using the wild type primers and detectable via a new inserted sequence. In the first reaction, the forwarding primer and an additional 20-nt-long sequence was deleted and replaced by a new 20-nt-long sequence. In the second reaction, a forwarding primer containing as 5' overhang sequence the wild type primer sequence was used. The presence of pure internal standard was verified using electrochemiluminescence and RNA lateral-flow biosensor analysis. Additional sequence deletion in order to shorten the internal standard amplicons and thus generate higher detection signals was found not to be required. Finally, a competitive NASBA reaction between one internal standard and the wild type sequence was carried out proving its functionality. This new rapid construction method via NASBA provides advantages over the traditional techniques since it requires no traditional cloning procedures, no thermocyclers, and can be completed in less than 4 h.

  1. Whole exome sequencing identified novel CRB1 mutations in Chinese and Indian populations with autosomal recessive retinitis pigmentosa

    PubMed Central

    Yang, Yin; Yang, Yeming; Huang, Lulin; Zhai, Yaru; Li, Jie; Jiang, Zhilin; Gong, Bo; Fang, Hao; Kim, Ramasamy; Yang, Zhenglin; Sundaresan, Periasamy; Zhu, Xianjun; Zhou, Yu

    2016-01-01

    Retinitis pigmentosa (RP) is a leading cause of inherited blindness characterized by progressive degeneration of the retinal photoreceptor cells. This study aims to identify genetic mutations in a Chinese family RP-2236, an Indian family RP-IC-90 and 100 sporadic Indian individuals with autosomal recessive RP (arRP). Whole exome sequencing was performed on the index patients of RP-2236, RP-IC-90 and all of the 100 sporadic Indian patients. Direct Sanger sequencing was used to validate the mutations identified. Four novel mutations and one reported mutation in the crumbs homolog 1 (CRB1) gene, which has been known to cause severe retinal dystrophies, were identified. A novel homozygous splicing mutation c.2129-1G>C was found in the three patients in family RP-2236. A homozygous point mutation p.R664C was found in RP-IC-90. A novel homozygous mutation p.G1310C was identified in patient I-44, while novel compound heterozygous mutations p.N629D and p.A593T were found in patient I-7. None of the mutations described above was present in the 1000 normal controls. In conclusion, we identified four novel mutations in CRB1 in a cohort of RP patients from the Chinese and Indian populations. Our data enlarge the CRB1 mutation spectrum and may provide new target loci for RP diagnosis and treatment. PMID:27670293

  2. Whole genome sequencing identifies a deletion in protein phosphatase 2A that affects its stability and localization in Chlamydomonas reinhardtii.

    PubMed

    Lin, Huawen; Miller, Michelle L; Granas, David M; Dutcher, Susan K

    2013-01-01

    Whole genome sequencing is a powerful tool in the discovery of single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) among mutant strains, which simplifies forward genetics approaches. However, identification of the causative mutation among a large number of non-causative SNPs in a mutant strain remains a big challenge. In the unicellular biflagellate green alga Chlamydomonas reinhardtii, we generated a SNP/indel library that contains over 2 million polymorphisms from four wild-type strains, one highly polymorphic strain that is frequently used in meiotic mapping, ten mutant strains that have flagellar assembly or motility defects, and one mutant strain, imp3, which has a mating defect. A comparison of polymorphisms in the imp3 strain and the other 15 strains allowed us to identify a deletion of the last three amino acids, Y313F314L315, in a protein phosphatase 2A catalytic subunit (PP2A3) in the imp3 strain. Introduction of a wild-type HA-tagged PP2A3 rescues the mutant phenotype, but mutant HA-PP2A3 at Y313 or L315 fail to rescue. Our immunoprecipitation results indicate that the Y313, L315, or YFLΔ mutations do not affect the binding of PP2A3 to the scaffold subunit, PP2A-2r. In contrast, the Y313, L315, or YFLΔ mutations affect both the stability and the localization of PP2A3. The PP2A3 protein is less abundant in these mutants and fails to accumulate in the basal body area as observed in transformants with either wild-type HA-PP2A3 or a HA-PP2A3 with a V310T change. The accumulation of HA-PP2A3 in the basal body region disappears in mated dikaryons, which suggests that the localization of PP2A3 may be essential to the mating process. Overall, our results demonstrate that the terminal YFL tail of PP2A3 is important in the regulation of Chlamydomonas mating.
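
    The strain-comparison step, in which polymorphisms shared with other strains are discarded to leave candidates specific to the mutant of interest, can be sketched as a set difference. This is a simplified illustration, not the authors' pipeline: variants are modelled as (chromosome, position, reference, alternate) tuples, and every strain name other than imp3, along with all coordinates and alleles, is made up.

```python
def strain_specific_variants(target, others):
    """Variants present in the target strain and in none of the others."""
    shared_background = set().union(*others.values()) if others else set()
    return sorted(target - shared_background)


# Made-up variant calls: (chromosome, position, reference, alternate).
imp3 = {
    ("chr1", 1200, "A", "G"),
    ("chr2", 5400, "C", "T"),
    ("chr3", 880, "TACTTCCTG", "T"),  # hypothetical 8-bp deletion
}
other_strains = {
    "wild_type_1": {("chr1", 1200, "A", "G")},
    "mapping_strain": {("chr2", 5400, "C", "T"), ("chr4", 77, "G", "A")},
}

candidates = strain_specific_variants(imp3, other_strains)
print(candidates)  # only the variant not seen in any other strain
```

    The more strains are included in the background library, the fewer non-causative polymorphisms survive the filter.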

  3. Clinically Relevant Variants Identified in Thoracic Aortic Aneurysm Patients by Research Exome Sequencing

    PubMed Central

    Schubert, Jeffrey A.; Landis, Benjamin J.; Shikany, Amy R.; Hinton, Robert B.; Ware, Stephanie M.

    2016-01-01

    Thoracic aortic aneurysm (TAA) is a genetically heterogeneous disease involving subclinical and progressive dilation of the thoracic aorta, which can lead to life-threatening complications such as dissection or rupture. Genetic testing is important for risk stratification and identification of at risk family members, and clinically available genetic testing panels have been expanding rapidly. However, when past testing results are normal, there is little evidence to guide decision-making about the indications and timing to pursue additional clinical genetic testing. Results from research based genetic testing can help inform this process. Here we present 10 TAA patients who have a family history of disease and who enrolled in research-based exome testing. Nine of these ten patients had previous clinical genetic testing that did not identify the cause of disease. We sought to determine the number of rare variants in 23 known TAA associated genes identified by research-based exome testing. In total, we found 10 rare variants in six patients. Likely pathogenic variants included a TGFB2 variant in one patient and a SMAD3 variant in another. These variants have been reported previously in individuals with similar phenotypes. Variants of uncertain significance of particular interest included novel variants in MYLK and MFAP5, which were identified in a third patient. In total, clinically reportable rare variants were found in 6/10 (60%) patients, with at least 2/10 (20%) patients having likely pathogenic variants identified. These data indicate that consideration of re-testing is important in TAA patients with previous negative or inconclusive results. PMID:26854089

  4. De novo transcriptome sequencing to identify the sex-determination genes in Hyriopsis schlegelii.

    PubMed

    Shi, Jianwu; Hong, Yijiang; Sheng, Junqing; Peng, Kou; Wang, Junhua

    2015-01-01

    This study presents the first analysis of expressed transcripts in the spermary and ovary of Hyriopsis schlegelii (H. schlegelii). A total of 132,055 unigenes were obtained and 31,781 of these genes were annotated. In addition, 19,511 upregulated and 25,911 downregulated unigenes were identified in the spermary. Ten sex-determination genes were selected and further analyzed by real-time PCR. In addition, mammalian genes reported to govern sex-determination pathways, including Sry, Dmrt1, Dmrt2, Sox9, GATA4, and WT1 in males and Wnt4, Rspo1, Foxl2, and β-catenin in females, were also identified in H. schlegelii. These results suggest that H. schlegelii and mammals use similar gene regulatory mechanisms to control sex determination. Moreover, genes associated with dosage compensation mechanisms, such as Msl1, Msl2, and Msl3, and hermaphrodite phenotypes, such as Tra-1, Tra-2α, Tra-2β, Fem1A, Fem1B, and Fem1C, were also identified in H. schlegelii. The identification of these genes indicates that diverse regulatory mechanisms regulate sexual polymorphism in H. schlegelii.

  5. Detection of DBD-carbamoyl amino acids in amino acid sequence and D/L configuration determination of peptides with fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate.

    PubMed

    Huang, Y; Matsunaga, H; Toriba, A; Santa, T; Fukushima, T; Imai, K

    1999-06-01

    A method for amino acid sequence and D/L configuration identification of peptides using the fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate (DBD-NCS) has been developed. This method is based on the Edman degradation principle with some modifications. A peptide or protein was coupled with DBD-NCS under basic conditions and then cyclized/cleaved to produce a DBD-thiazolinone (TZ) derivative by BF3, a Lewis acid, which could significantly suppress amino acid racemization. The liberated DBD-TZ amino acid was hydrolyzed to the DBD-thiocarbamoyl (TC) amino acid under weakly acidic conditions and then oxidized by NaNO2/H+ to the DBD-carbamoyl (CA) amino acid, which was stable and had a strong fluorescence intensity. The individual DBD-CA amino acids were separated by reversed-phase high-performance liquid chromatography (RP-HPLC) for amino acid sequencing, and their enantiomers were resolved by chiral stationary-phase HPLC to identify their D/L configurations. By combining the two HPLC systems, the amino acid sequence and D/L configuration of peptides could be determined. This method will be useful for searching for D-amino-acid-containing peptides in animals.

  6. Whole Exome Sequencing Identifies CRB1 Defect in an Unusual Maculopathy Phenotype

    PubMed Central

    Tsang, Stephen H.; Burke, Tomas; Oll, Maris; Yzer, Suzanne; Lee, Winston; Xie, Yajing (Angela); Allikmets, Rando

    2014-01-01

    Objective: To report a new phenotype caused by mutations in the CRB1 gene in a family with 2 affected siblings. Design: Molecular genetics and observational case studies. Participants: Two affected siblings and 3 unaffected family members. Methods: Each subject received a complete ophthalmic examination together with color fundus photography, fundus autofluorescence (FAF), and spectral domain optical coherence tomography (SD-OCT). Microperimetry 1 (MP-1) mapping and electroretinogram (ERG) analysis were performed on the proband. Screening for disease-causing mutations was performed by whole exome sequencing in 3 family members followed by segregation analyses in the entire family. Main Outcome Measures: Appearance of the macula as examined by clinical examination, fundus photography, FAF imaging, SD-OCT, and visual function by MP-1 and ERG. Results: The proband and her affected brother exhibited unusual, previously unreported, findings of a macular dystrophy with relative sparing of the retinal periphery beyond the vascular arcades. The FAF imaging showed severely affected areas of hypoautofluorescence that extended nasally beyond the optic disc in both eyes. A central macular patch of retinal pigment epithelium (RPE) sparing was evident in both eyes on FAF, whereas photoreceptor sparing was documented in the right eye only using SD-OCT. The affected brother presented with irregular patterns of autofluorescence in both eyes characterized by concentric rings of alternating hyper- and hypoautofluorescence, and foveal sparing of photoreceptors and RPE, as seen on SD-OCT, bilaterally. After negative results in screening for mutations in candidate genes including ABCA4 and PRPH2, DNA from 3 members of the family, including both affected siblings and their mother, was screened by whole exome sequencing resulting in identification of 2 CRB1 missense mutations, c.C3991T:p.R1331C and c.C4142T:p.P1381L, which segregated with the disease in the family. Of the 2, the p.R1331C CRB1

  7. A biotin enrichment strategy identifies novel carbonylated amino acids in proteins from human plasma.

    PubMed

    Havelund, Jesper F; Wojdyla, Katarzyna; Davies, Michael J; Jensen, Ole N; Møller, Ian Max; Rogowska-Wrzesinska, Adelina

    2017-03-06

    Protein carbonylation is an irreversible protein oxidation correlated with oxidative stress, various diseases and ageing. Here we describe a peptide-centric approach for identification and characterisation of up to 14 different types of carbonylated amino acids in proteins. The modified residues are derivatised with biotin-hydrazide, enriched and characterised by tandem mass spectrometry. The strength of the method lies in an improved elution of biotinylated peptides from monomeric avidin resin using hot water (95°C) and increased sensitivity achieved by reduction of analyte losses during sample preparation and chromatography. For the first time MS/MS data analysis utilising diagnostic biotin fragment ions is used to pinpoint sites of biotin labelling and improve the confidence of carbonyl peptide assignments. We identified a total of 125 carbonylated residues in bovine serum albumin after extensive in vitro metal ion-catalysed oxidation. Furthermore, we assigned 133 carbonylated sites in 36 proteins in native human plasma protein samples. The optimised workflow enabled detection of 10 hitherto undetected types of carbonylated amino acids in proteins: aldehyde and ketone modifications of leucine, valine, alanine, isoleucine, glutamine, lysine and glutamic acid (+14Da), an oxidised form of methionine - aspartate semialdehyde (-32Da) - and decarboxylated glutamic acid and aspartic acid (-30Da).

  8. Transcriptome Sequencing of Gynostemma pentaphyllum to Identify Genes and Enzymes Involved in Triterpenoid Biosynthesis

    PubMed Central

    Ma, Chengtong; Qian, Jieying; Lan, Xiuwan; Chao, Naixia; Sun, Jian

    2016-01-01

    G. pentaphyllum (Gynostemma pentaphyllum), a creeping herbaceous perennial with many important medicinal properties, is widely distributed in Asia. Gypenosides (triterpenoid saponins), the main effective components of G. pentaphyllum, are well studied. FPS (farnesyl pyrophosphate synthase), SS (squalene synthase), and SE (squalene epoxidase) are the main enzymes involved in the synthesis of triterpenoid saponins. Considering the important medicinal functions of G. pentaphyllum, it is necessary to investigate the transcriptomic information of G. pentaphyllum to facilitate future studies of transcriptional regulation. After sequencing G. pentaphyllum, we obtained 50,654,708 unigenes. Next, we used RPKM (reads per kilobases per million reads) to calculate expression of the unigenes and we performed comparison of our data to that contained in five common databases to annotate different aspects of the unigenes. Finally, we noticed that FPS, SS, and SE showed differential expression of enzymes in DESeq. Leaves showed the highest expression of FPS, SS, and SE relative to the other two tissues. Our research provides transcriptomic information of G. pentaphyllum in its natural environment and we found consistency in unigene expression, enzymes expression (FPS, SS, and SE), and the distribution of gypenosides content in G. pentaphyllum. Our results will enable future related studies of G. pentaphyllum. PMID:28097124
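
    The RPKM normalization mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition (read count scaled by transcript length in kilobases and by millions of mapped reads); the function name, the unigene labels, and all numbers below are hypothetical and are not taken from the study.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    # read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical example: (reads mapped to the unigene, unigene length in bp)
unigenes = {"unigene_a": (500, 2000), "unigene_b": (50, 500)}
total_reads = 10_000_000  # assumed library size

expression = {
    name: rpkm(reads, length, total_reads)
    for name, (reads, length) in unigenes.items()
}
print(expression)
```

    Because RPKM divides by length, a short unigene with few reads can score comparably to a long unigene with many reads, which is the point of the normalization.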

  9. Novel variants in MLL confer to bladder cancer recurrence identified by whole-exome sequencing

    PubMed Central

    Wang, Yongqiang; Huang, Yi; Liu, Huan; Li, Feida; He, Luyun; Sun, Da; Yu, Yuan; Li, Qiaoling; Huang, Peide; Zhang, Meng; Zhao, Xin; Bi, Tengteng; Zhuang, Xuehan; Zhang, Liyan; Lu, Jingxiao; Sun, Xiaojuan; Zhou, Fangjian; Liu, Chunxiao; Yang, Guosheng; Hou, Yong; Fan, Zusen; Cai, Zhiming

    2016-01-01

    Bladder cancer (BC) is distinguished by a high rate of recurrence after surgery, but the underlying mechanisms remain poorly understood. Here we performed whole-exome sequencing of 37 BC individuals, including 20 primary and 17 recurrent samples, in which the primary and recurrent samples were not from the same patient. We found that MLL, EP400, PRDM2, ANK3 and CHD5 were altered exclusively in recurrent BCs. Specifically, the recurrent BCs and bladder cancer cells with MLL mutation displayed increased histone H3 tri-methyl K4 (H3K4me3) modification at the tissue and cell levels and showed enhanced expression of the downstream genes GATA4 and ETS1. Moreover, MLL-mutated bladder cancer cells obtained with CRISPR/Cas9 showed greater resistance to epirubicin (a chemotherapy drug for bladder cancer) than wild-type cells. Additionally, BC patients with high expression of GATA4 and ETS1 displayed a significantly shorter lifespan than patients with low expression. Our study provides an overview of the genetic basis of recurrent bladder cancer and shows that genetic alterations of MLL are involved in BC relapse. Increased H3K4me3 modification and expression of GATA4 and ETS1 may be promising targets for the diagnosis and therapy of relapsed bladder cancer. PMID:26625313

  10. Novel variants in MLL confer to bladder cancer recurrence identified by whole-exome sequencing.

    PubMed

    Wu, Song; Yang, Zhao; Ye, Rui; An, Dan; Li, Chong; Wang, Yitian; Wang, Yongqiang; Huang, Yi; Liu, Huan; Li, Feida; He, Luyun; Sun, Da; Yu, Yuan; Li, Qiaoling; Huang, Peide; Zhang, Meng; Zhao, Xin; Bi, Tengteng; Zhuang, Xuehan; Zhang, Liyan; Lu, Jingxiao; Sun, Xiaojuan; Zhou, Fangjian; Liu, Chunxiao; Yang, Guosheng; Hou, Yong; Fan, Zusen; Cai, Zhiming

    2016-01-19

    Bladder cancer (BC) is distinguished by a high rate of recurrence after surgery, but the underlying mechanisms remain poorly understood. Here we performed whole-exome sequencing of 37 BC individuals, including 20 primary and 17 recurrent samples, in which the primary and recurrent samples were not from the same patient. We found that MLL, EP400, PRDM2, ANK3 and CHD5 were altered exclusively in recurrent BCs. Specifically, the recurrent BCs and bladder cancer cells with MLL mutation displayed increased histone H3 tri-methyl K4 (H3K4me3) modification at the tissue and cell levels and showed enhanced expression of the downstream genes GATA4 and ETS1. Moreover, MLL-mutated bladder cancer cells obtained with CRISPR/Cas9 showed greater resistance to epirubicin (a chemotherapy drug for bladder cancer) than wild-type cells. Additionally, BC patients with high expression of GATA4 and ETS1 displayed a significantly shorter lifespan than patients with low expression. Our study provides an overview of the genetic basis of recurrent bladder cancer and shows that genetic alterations of MLL are involved in BC relapse. Increased H3K4me3 modification and expression of GATA4 and ETS1 may be promising targets for the diagnosis and therapy of relapsed bladder cancer.

  11. CDH1 mutations in gastric cancer patients from northern Brazil identified by Next- Generation Sequencing (NGS).

    PubMed

    El-Husny, Antonette; Raiol-Moraes, Milene; Amador, Marcos; Ribeiro-Dos-Santos, André M; Montagnini, André; Barbosa, Silvanira; Silva, Artur; Assumpção, Paulo; Ishak, Geraldo; Santos, Sidney; Pinto, Pablo; Cruz, Aline; Ribeiro-Dos-Santos, Ândrea

    2016-05-13

    Gastric cancer is considered to be the fifth most common tumor worldwide and the third leading cause of cancer deaths. Developing regions report a higher number of sporadic cases, but there are only a few local studies related to hereditary cases of gastric cancer in Brazil to confirm this fact. CDH1 germline mutations have been described in both familial and sporadic cases, but there is only one recent molecular description of individuals from Brazil. In this study we performed Next Generation Sequencing (NGS) to assess CDH1 germline mutations in individuals who match the clinical criteria for Hereditary Diffuse Gastric Cancer (HDGC), or who exhibit very early diagnosis of gastric cancer. Among five probands we detected CDH1 germline mutations in two cases (40%). The mutation c.1023T > G was found in an HDGC family and the mutation c.1849G > A, which is nearly exclusive to African populations, was found in an early-onset case of gastric adenocarcinoma. The mutations described highlight the existence of gastric cancer cases caused by CDH1 germline mutations in northern Brazil, although such information is frequently ignored due to the existence of a large number of environmental factors locally. Our report represents the first CDH1 mutations in HDGC described from Brazil by an NGS platform.

  12. CDH1 mutations in gastric cancer patients from northern Brazil identified by Next- Generation Sequencing (NGS)

    PubMed Central

    El-Husny, Antonette; Raiol-Moraes, Milene; Amador, Marcos; Ribeiro-dos-Santos, André M.; Montagnini, André; Barbosa, Silvanira; Silva, Artur; Assumpção, Paulo; Ishak, Geraldo; Santos, Sidney; Pinto, Pablo; Cruz, Aline; Ribeiro-dos-Santos, Ândrea

    2016-01-01

    Gastric cancer is considered to be the fifth most common tumor worldwide and the third leading cause of cancer deaths. Developing regions report a higher number of sporadic cases, but there are only a few local studies related to hereditary cases of gastric cancer in Brazil to confirm this fact. CDH1 germline mutations have been described in both familial and sporadic cases, but there is only one recent molecular description of individuals from Brazil. In this study we performed Next Generation Sequencing (NGS) to assess CDH1 germline mutations in individuals who match the clinical criteria for Hereditary Diffuse Gastric Cancer (HDGC), or who exhibit very early diagnosis of gastric cancer. Among five probands we detected CDH1 germline mutations in two cases (40%). The mutation c.1023T > G was found in an HDGC family and the mutation c.1849G > A, which is nearly exclusive to African populations, was found in an early-onset case of gastric adenocarcinoma. The mutations described highlight the existence of gastric cancer cases caused by CDH1 germline mutations in northern Brazil, although such information is frequently ignored due to the existence of a large number of environmental factors locally. Our report represents the first CDH1 mutations in HDGC described from Brazil by an NGS platform. PMID:27192129

  13. Single epicardial cell transcriptome sequencing identifies Caveolin 1 as an essential factor in zebrafish heart regeneration

    PubMed Central

    Cao, Jingli; Navis, Adam; Cox, Ben D.; Dickson, Amy L.; Gemberling, Matthew; Karra, Ravi; Bagnat, Michel; Poss, Kenneth D.

    2016-01-01

    In contrast to mammals, adult zebrafish have a high capacity to regenerate damaged or lost myocardium through proliferation of cardiomyocytes spared from damage. The epicardial sheet covering the heart is activated by injury and aids muscle regeneration through paracrine effects and as a multipotent cell source, and has received recent attention as a target in cardiac repair strategies. Although it is recognized that epicardium is required for muscle regeneration and itself has high regenerative potential, the extent of cellular heterogeneity within epicardial tissue is largely unexplored. Here, we performed transcriptome analysis on dozens of epicardial lineage cells purified from zebrafish harboring a transgenic reporter for the pan-epicardial gene tcf21. Hierarchical clustering analysis suggested the presence of at least three epicardial cell subsets defined by expression signatures. We validated many new pan-epicardial and epicardial markers by alternative expression assays. Additionally, we explored the function of the scaffolding protein and main component of caveolae, caveolin 1 (cav1), which was present in each epicardial subset. In BAC transgenic zebrafish, cav1 regulatory sequences drove strong expression in ostensibly all epicardial cells and in coronary vascular endothelial cells. Moreover, cav1 mutant zebrafish generated by genome editing showed grossly normal heart development and adult cardiac anatomy, but displayed profound defects in injury-induced cardiomyocyte proliferation and heart regeneration. Our study defines a new platform for the discovery of epicardial lineage markers, genetic tools, and mechanisms of heart regeneration. PMID:26657776

  14. TCR sequencing facilitates diagnosis and identifies mature T cells as the cell of origin in CTCL

    PubMed Central

    O'Malley, John T.; Williamson, David W.; Scott, Laura-Louise; Elco, Christopher P.; Teague, Jessica E.; Gehad, Ahmed; Lowry, Elizabeth L.; LeBoeuf, Nicole R.; Krueger, James G.; Robins, Harlan S.; Kupper, Thomas S.; Clark, Rachael A.

    2016-01-01

    Early diagnosis of CTCL is difficult and takes on average six years after presentation, in part because the clinical appearance and histopathology of CTCL can resemble that of benign inflammatory skin diseases. Detection of a malignant T cell clone is critical in making the diagnosis of CTCL, but the TCRγ PCR analysis in current clinical use detects clones in only a subset of patients. High-throughput TCR sequencing (HTS) detected T cell clones in 46/46 CTCL patients, was more sensitive and specific than TCRγ PCR, and successfully discriminated CTCL from benign inflammatory diseases. HTS also accurately assessed responses to therapy and facilitated diagnosis of disease recurrence. In patients with new skin lesions and no involvement of blood by flow cytometry, HTS demonstrated hematogenous spread of small numbers of malignant T cells. Analysis of CTCL TCRγ genes demonstrated that CTCL is a malignancy derived from mature T cells. There was a maximal T cell density in skin in benign inflammatory diseases that was exceeded in CTCL, suggesting a niche of finite size may exist for benign T cells in skin. Lastly, immunostaining demonstrated that the malignant T cell clones in mycosis fungoides and leukemic CTCL localized to different anatomic compartments in the skin. In summary, HTS accurately diagnosed CTCL in all stages, discriminated CTCL from benign inflammatory skin diseases and provided insights into the cell of origin and location of malignant CTCL cells in skin. PMID:26446955

  15. Sequence of the canine herpesvirus thymidine kinase gene: taxon-preferred amino acid residues in the alphaherpesviral thymidine kinases.

    PubMed

    Rémond, M; Sheldrick, P; Lebreton, F; Foulon, T

    1995-12-01

    Multiple sequence alignments of evolutionarily related proteins are finding increasing use as indicators of critical amino acid residues necessary for structural stability or involved in functional domains responsible for catalytic activities. In the past, a number of alignments have provided such information for the herpesviral thymidine kinases, for which three-dimensional structures are not yet available. We have sequenced the thymidine kinase gene of a canine herpesvirus, and with a multiple alignment have identified amino acids preferentially conserved in either of two taxons, the genera Varicellovirus and Simplexvirus, of the subfamily Alphaherpesvirinae. Since some regions of the thymidine kinases show otherwise elevated levels of substitutional tolerance, these conserved amino acids are candidates for critical residues which have become fixed through selection during the evolutionary divergence of these enzymes. Several pairs with distinctive patterns of distribution among the various viruses occur in or near highly conserved sequence motifs previously proposed to form the catalytic site, and we speculate that they may represent interacting, co-ordinately variable residues.

  16. Complete amino acid sequence of an acidic, cardiotoxic phospholipase A2 from the venom of Ophiophagus hannah (King Cobra): a novel cobra venom enzyme with "pancreatic loop".

    PubMed

    Huang, M Z; Gopalakrishnakone, P; Chung, M C; Kini, R M

    1997-02-15

    A phospholipase A2 (OHV A-PLA2) from the venom of Ophiophagus hannah (King cobra) is an acidic protein exhibiting cardiotoxicity, myotoxicity, and antiplatelet activity. The complete amino acid sequence of OHV A-PLA2 has been determined using a combination of Edman degradation and mass spectrometric techniques. OHV A-PLA2 is composed of a single chain of 124 amino acid residues with 14 cysteines and a calculated molecular weight of 13719 Da. It contains the loop of residues (62-66) found in pancreatic PLA2s and hence belongs to class IB enzymes. This pancreatic loop is between two proline residues (Pro 59 and Pro 68) and contains several hydrophilic amino acids (Ser and Asp). This region has high degree of conformational flexibility and is on the surface of the molecule, and hence it may be a potential protein-protein interaction site. A relatively low sequence homology is found between OHV A-PLA2 and other known cardiotoxic PLA2s, and hence a contiguous segment could not be identified as a site responsible for the cardiotoxic activity.

  17. Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis.

    PubMed Central

    Gorbalenya, A E; Koonin, E V; Donchenko, A P; Blinov, V M

    1989-01-01

    Amino acid sequences of 2 giant non-structural polyproteins (F1 and F2) of infectious bronchitis virus (IBV), a member of Coronaviridae, were compared, by computer-assisted methods, to sequences of a number of other positive strand RNA viral and cellular proteins. By this approach, juxtaposed putative RNA-dependent RNA polymerase, nucleic acid binding ("finger"-like) and RNA helicase domains were identified in F2. Together, these domains might constitute the core of the protein complex involved in the primer-dependent transcription, replication and recombination of coronaviruses. In F1, two cysteine protease-like domains and a growth factor-like one were revealed. One of the putative proteases of IBV is similar to 3C proteases of picornaviruses and related enzymes of como- nepo- and potyviruses. Search of IBV F1 and F2 sequences for sites similar to those cleaved by the latter proteases and intercomparison of the surrounding sequence stretches revealed 13 dipeptides Q/S(G) which are probably cleaved by the coronavirus 3C-like protease. Based on these observations, a partial tentative scheme for the functional organization and expression strategy of the non-structural polyproteins of IBV was proposed. It implies that, despite the general similarity to other positive strand RNA viruses, and particularly to potyviruses, coronaviruses possess a number of unique structural and functional features. PMID:2526320
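
    The search for candidate Q/S(G) cleavage dipeptides described in this abstract can be sketched as a simple pattern scan. This is an illustration only: it assumes that Q/S(G) denotes a glutamine followed immediately by serine or glycine, it ignores the surrounding sequence context that the authors also compared, and the function name and the toy sequence are hypothetical.

```python
import re


def find_q_sg_sites(protein: str) -> list[int]:
    """Return 1-based positions of each Q that is directly followed by S or G."""
    # The lookahead keeps matches one residue wide, so adjacent sites
    # such as "QSQG" are all reported.
    return [m.start() + 1 for m in re.finditer(r"Q(?=[SG])", protein.upper())]


toy_sequence = "MKLQSAAQGTTQA"  # made-up peptide, not an IBV sequence
print(find_q_sg_sites(toy_sequence))  # → [4, 8]
```

    A scan like this only lists candidates; deciding which dipeptides are plausibly cleaved would still require comparing the flanking residues, as the abstract describes.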

  18. Novel somatic mutations identified by whole-exome sequencing in muscle-invasive transitional cell carcinoma of the bladder.

    PubMed

    Pan, Huixing; Xu, Xiaojian; Wu, Deyao; Qiu, Qiaocheng; Zhou, Shoujun; He, Xuefeng; Zhou, Yunfeng; Qu, Ping; Hou, Jianquan; He, Jun; Zhou, Jian

    2016-02-01

    Transitional cell carcinoma (TCC) is one of the most commonly observed types of cancer globally. The identification of novel disease-associated genes in TCC has had a significant effect on the diagnosis and treatment of bladder cancer; however, there may be a large number of novel genes that have not been identified. In the present study, the exomes of two individuals who were diagnosed with muscle-invasive TCC (MI-TCC) were sequenced to investigate potential variants. Subsequently, following algorithm and filter analysis, Sanger sequencing was used to validate the results of deep sequencing. Immunohistochemistry (IHC) was employed to observe the differences in HECT, C2 and WW domain-containing E3 ubiquitin protein ligase 1 (HECW1) protein expression between tumor tissues and para-carcinoma tissues. A total of 6 genes with nonsynonymous mutations were identified in MI-TCC: copine VII, RNA binding motif protein, X-linked-like 3, acyl-CoA synthetase medium-chain family member 2A, HECW1, zinc finger protein 273 and trichohyalin. Furthermore, 5 of 61 MI-TCC specimens were found to carry a HECW1 gene mutation, and all of these were point mutations located at exon 11 on chromosome 7. The HECW1 mutations comprised 4 missense mutations and 1 nonsense mutation. IHC revealed that HECW1 protein was expressed at significantly increased levels in MI-TCC compared with normal bladder urothelium (P<0.001). The present study provided a novel approach for investigating genetic changes in the MI-TCC exome, and identified the novel mutant gene HECW1, which may have a significant role in the pathogenesis of TCC.

  19. Whole-exome sequencing identifies novel MPL and JAK2 mutations in triple-negative myeloproliferative neoplasms

    PubMed Central

    Milosevic Feenstra, Jelena D.; Nivarthi, Harini; Gisslinger, Heinz; Leroy, Emilie; Rumi, Elisa; Chachoua, Ilyas; Bagienski, Klaudia; Kubesova, Blanka; Pietra, Daniela; Gisslinger, Bettina; Milanesi, Chiara; Jäger, Roland; Chen, Doris; Berg, Tiina; Schalling, Martin; Schuster, Michael; Bock, Christoph; Constantinescu, Stefan N.; Cazzola, Mario

    2016-01-01

    Essential thrombocythemia (ET) and primary myelofibrosis (PMF) are chronic diseases characterized by clonal hematopoiesis and hyperproliferation of terminally differentiated myeloid cells. The disease is driven by somatic mutations in exon 9 of CALR or exon 10 of MPL or JAK2-V617F in >90% of the cases, whereas the remaining cases are termed “triple negative.” We aimed to identify the disease-causing mutations in the triple-negative cases of ET and PMF by applying whole-exome sequencing (WES) on paired tumor and control samples from 8 patients. We found evidence of clonal hematopoiesis in 5 of 8 studied cases based on clonality analysis and presence of somatic genetic aberrations. WES identified somatic mutations in 3 of 8 cases. We did not detect any novel recurrent somatic mutations. In 3 patients with clonal hematopoiesis analyzed by WES, we identified a somatic MPL-S204P, a germline MPL-V285E mutation, and a germline JAK2-G571S variant. We performed Sanger sequencing of the entire coding region of MPL in 62, and of JAK2 in 49 additional triple-negative cases of ET or PMF. New somatic (T119I, S204F, E230G, Y591D) and 1 germline (R321W) MPL mutation were detected. All of the identified MPL mutations were gain-of-function when analyzed in functional assays. JAK2 variants were identified in 5 of 57 triple-negative cases analyzed by WES and Sanger sequencing combined. We could demonstrate that JAK2-V625F and JAK2-F556V are gain-of-function mutations. Our results suggest that triple-negative cases of ET and PMF do not represent a homogenous disease entity. Cases with polyclonal hematopoiesis might represent hereditary disorders. PMID:26423830

  20. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  1. High-Throughput Transcriptome Sequencing Identifies Candidate Genetic Modifiers of Vulnerability to Fetal Alcohol Spectrum Disorders

    PubMed Central

    Garic, Ana; Berres, Mark E.; Smith, Susan M.

    2014-01-01

    Introduction FASD is a leading cause of neurodevelopmental disability. Genetic factors can modify vulnerability to FASD, but these elements are poorly characterized. Methods We performed high-throughput transcriptional profiling to identify gene candidates that could potentially modify vulnerability to ethanol’s neurotoxicity. We interrogated a unique genetic resource, neuroprogenitor cells from two closely-related Gallus gallus lines having well-characterized robust or attenuated ethanol responses with respect to intracellular calcium mobilization and CaMKII / β-catenin-dependent apoptosis. Samples were not exposed to ethanol prior to analysis. Results We identified 363 differentially expressed genes in neuroprogenitors from these two lines. KEGG analysis revealed several gene clusters having significantly differential enrichment in gene expression. The largest and most significant cluster comprised ribosomal proteins (38 genes, p = 1.85 × 10−47). Other significantly enriched gene clusters included metabolism (25 genes, p = 0.0098), oxidative phosphorylation (18 genes, p = 1.10 × 10−11), spliceosome (13 genes, p = 7.02 × 10−8) and protein processing in the endoplasmic reticulum (9 genes, p = 0.0011). Inspection of GO-terms identified 24 genes involved in the calcium/β-catenin signals that mediate ethanol's neurotoxicity in this model, including β-catenin itself and both calmodulin isoforms. Conclusions Four of the identified pathways with altered transcript abundance mediate the flow of cellular information from RNA to protein. Importantly, ribosome biogenesis also senses nucleolar stress and regulates p53-mediated apoptosis in neural crest. Human ribosomopathies produce craniofacial malformations and eleven known ribosomopathy genes were differentially expressed in this model of neural crest apoptosis. Rapid changes in ribosome expression are consistently observed in ethanol-treated mouse embryo neural folds, a model that is developmentally similar

  2. Analytical Framework for Identifying and Differentiating Recent Hitchhiking and Severe Bottleneck Effects from Multi-Locus DNA Sequence Data

    SciTech Connect

    Sargsyan, Ori

    2012-05-25

    Hitchhiking and severe bottleneck effects affect the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This study develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50000 or greater in contrast to 10000, and the estimates of the recent homogenization events agree with the “Out of Africa” hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. Finally, the results show that significant discrepancies can exist between the estimates.
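
    The summary statistic at the center of this framework, the number of polymorphic sites in a sample of aligned DNA sequences, can be computed with a short sketch. This shows only how the observed count is obtained from data; it does not implement the distribution, moments, or likelihood-ratio tests derived in the study, and the function name and toy alignment are hypothetical.

```python
def count_polymorphic_sites(sequences: list[str]) -> int:
    """Count alignment columns in which at least two different bases occur."""
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks the alignment column by column.
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


# Made-up sample of three aligned sequences from one locus
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
print(count_polymorphic_sites(sample))  # → 2
```

    Under the infinite-sites model each mutation hits a new site, so this count equals the number of mutations in the sample genealogy, which is why it is a natural statistic for such analyses.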

  3. Analytical Framework for Identifying and Differentiating Recent Hitchhiking and Severe Bottleneck Effects from Multi-Locus DNA Sequence Data

    DOE PAGES

    Sargsyan, Ori

    2012-05-25

    Hitchhiking and severe bottleneck effects have an impact on the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This study develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50000 or greater in contrast to 10000, and the estimates of the recent homogenization events agree with the “Out of Africa” hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. Finally, the results show that significant discrepancies can exist between the estimates.

  4. Amino acid sequences of two nonspecific lipid-transfer proteins from germinated castor bean.

    PubMed

    Takishima, K; Watanabe, S; Yamada, M; Suga, T; Mamiya, G

    1988-11-01

    The amino acid sequences of two nonspecific lipid-transfer proteins (nsLTP), B and C, from germinated castor bean seeds have been determined. Both proteins consist of 92 residues, as for the nsLTP previously reported, and their calculated Mr values are 9847 and 9593 for nsLTP-B and nsLTP-C, respectively. The sequences of nsLTP-B and nsLTP-C, compared to the known sequence of nsLTP-A from the same source, are 68% and 35% similar, respectively. No variation was found at the positions of the cysteine residues, indicating that they might be involved in disulfide bridges.

  5. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104
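
    The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Cross-check of the counts stated in the abstract (variable names are illustrative).
total_sequences = 301
classified = 291
monospecific_families = 18
polyspecific_families = 17

unclassified = total_sequences - classified
families = monospecific_families + polyspecific_families
unclassified_pct = 100.0 * unclassified / total_sequences

print(f"unclassified sequences: {unclassified} ({unclassified_pct:.1f}% of the sample)")
print(f"families: {families}")
```

    The ten unassigned sequences come to roughly 3% of the sample, consistent with the stated "less than 5%", and the monospecific and polyspecific families sum to the 35 families reported.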

  6. Small RNA deep sequencing identifies novel and salt-stress-regulated microRNAs from roots of Medicago sativa and Medicago truncatula.

    PubMed

    Long, Rui-Cai; Li, Ming-Na; Kang, Jun-Mei; Zhang, Tie-Jun; Sun, Yan; Yang, Qing-Chuan

    2015-05-01

    Small 21- to 24-nucleotide (nt) ribonucleic acids (RNAs), notably the microRNAs (miRNAs), are emerging as a posttranscriptional regulation mechanism. Salt stress is one of the primary abiotic stresses that cause crop losses worldwide. In saline lands, root growth and function of plants are determined by the action of environmental salt stress through specific genes that adapt root development to the restrictive condition. To elucidate the role of miRNAs in salt stress regulation in Medicago, we used a high-throughput sequencing approach to analyze four small RNA libraries from roots of Zhongmu-1 (Medicago sativa) and Jemalong A17 (Medicago truncatula), which were treated with 300 mM NaCl for 0 and 8 h. Each library generated about 20 million short sequences and contained predominantly small RNAs of 24-nt length, followed by 21-nt and 22-nt small RNAs. Using sequence analysis, we identified 385 conserved miRNAs from 96 families, along with 68 novel candidate miRNAs. Of the 68 predicted novel miRNAs, 15 were identified to have miRNA*. Statistical analysis of the abundance of sequencing reads revealed specific miRNAs showing contrasting expression patterns between M. sativa and M. truncatula roots, as well as between roots treated for 0 and 8 h. The expression of 10 conserved and novel miRNAs was also quantified by quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). The miRNA precursor and target genes were predicted by bioinformatics analysis. We concluded that the salt stress related conserved and novel miRNAs may have a large variety of target mRNAs, some of which might play key roles in salt stress regulation of Medicago.

  7. Whole-exome sequencing identifies OR2W3 mutation as a cause of autosomal dominant retinitis pigmentosa

    PubMed Central

    Ma, Xiangyu; Guan, Liping; Wu, Wei; Zhang, Yao; Zheng, Wei; Gao, Yu-Tang; Long, Jirong; Wu, Na; Wu, Long; Xiang, Ying; Xu, Bin; Shen, Miaozhong; Chen, Yanhua; Wang, Yuewen; Yin, Ye; Li, Yingrui; Xu, Haiwei; Xu, Xun; Li, Yafei

    2015-01-01

    Retinitis pigmentosa (RP), a heterogeneous group of inherited ocular diseases, is a genetic condition that causes retinal degeneration and eventual vision loss. Though some genes have been identified to be associated with RP, a large part of the clinical cases still could not be explained. Here we reported a four-generation Chinese family with RP, in which 6 of 9 members of the second generation were affected by the disease. To identify the genetic defect in this family, whole-exome sequencing together with validation analysis by Sanger sequencing was performed to find possible pathogenic mutations. After a pipeline of database filtering, including public databases and in-house databases, a novel missense mutation, c. 424 C > T transition (p.R142W) in the OR2W3 gene, was identified as a potentially causative mutation for autosomal dominant RP. The mutation co-segregated with the disease phenotype over four generations. This mutation was validated in another independent three-generation family. RT-PCR analysis also identified that the OR2W3 gene was expressed in the HESC-RPE cell line. The results will not only enhance our current understanding of the genetic basis of RP, but also provide helpful clues for designing future studies to further investigate genetic factors for familial RP. PMID:25783483

  8. Sequencing EVC and EVC2 identifies mutations in two-thirds of Ellis-van Creveld syndrome patients.

    PubMed

    Tompson, Stuart W J; Ruiz-Perez, Victor L; Blair, Helen J; Barton, Stephanie; Navarro, Victoria; Robson, Joanne L; Wright, Michael J; Goodship, Judith A

    2007-01-01

    Ellis-van Creveld syndrome (EvC) is caused by mutations in EVC and EVC2, genes in a divergent orientation separated by only 2.6 kb. We systematically sought mutations in both genes in a panel of 65 affected individuals to assess the proportion of cases resulting from mutations in each gene. We PCR amplified and sequenced the coding exons of both genes. We investigated mutations that could affect splicing by in vitro splicing assays and cDNA analysis. We have identified EVC mutations in 20 cases (31%); in all of these we have detected the mutation on each allele. We have identified EVC2 mutations in 25 cases (38%); in 22 of these we have isolated a mutation on each allele. The majority of the mutations introduce a premature termination codon. We sequenced the region between the two genes in 10 of the 20 cases in which we had not identified a mutation in either gene, revealing only one SNP that was not a common polymorphism. As we have not identified mutations in either gene in 20 cases (31%) it is possible that there is further genetic heterogeneity.
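
    The percentages in this abstract, and the "two-thirds" in the title, follow from the case counts. A minimal sketch of the arithmetic, using only the numbers stated above (variable and function names are illustrative):

```python
# Case counts as stated in the abstract; names are illustrative.
panel = 65          # affected individuals screened
evc_cases = 20      # cases with EVC mutations
evc2_cases = 25     # cases with EVC2 mutations

no_mutation = panel - evc_cases - evc2_cases
with_mutation = evc_cases + evc2_cases

def pct(count, total=panel):
    """Percentage of the panel, rounded to the nearest whole number."""
    return round(100.0 * count / total)

print(f"EVC: {pct(evc_cases)}%  EVC2: {pct(evc2_cases)}%  "
      f"none: {pct(no_mutation)}%  either gene: {pct(with_mutation)}%")
```

    The 45 cases with a mutation in either gene make up about 69% of the panel, which is the "two-thirds" of the title.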

  9. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing.

    PubMed

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-09-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes have never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing for conducting the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasted resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations.

  10. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    PubMed Central

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes have never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing for conducting the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasted resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  11. Complete amino acid sequence of the N-terminal extension of calf skin type III procollagen.

    PubMed Central

    Brandt, A; Glanville, R W; Hörlein, D; Bruckner, P; Timpl, R; Fietzek, P P; Kühn, K

    1984-01-01

    The N-terminal extension peptide of type III procollagen, isolated from foetal-calf skin, contains 130 amino acid residues. To determine its amino acid sequence, the peptide was reduced and carboxymethylated or aminoethylated and fragmented with trypsin, Staphylococcus aureus V8 proteinase and bacterial collagenase. Pyroglutamate aminopeptidase was used to deblock the N-terminal collagenase fragment to enable amino acid sequencing. The type III collagen extension peptide is homologous to that of the alpha 1 chain of type I procollagen with respect to a three-domain structure. The N-terminal 79 amino acids, which contain ten of the 12 cysteine residues, form a compact globular domain. The next 39 amino acids are in a collagenase triplet sequence (Gly-Xaa-Yaa)n with a high hydroxyproline content. Finally, another short non-collagenous domain of 12 amino acids ends at the cleavage site for procollagen aminopeptidase, which cleaves a proline-glutamine bond. In contrast with type I procollagen, the type III procollagen extension peptides contain interchain disulphide bridges located at the C-terminus of the triple-helical domain. PMID:6331392

  12. Bioinformatic analysis of neurotropic HIV envelope sequences identifies polymorphisms in the gp120 bridging sheet that increase macrophage-tropism through enhanced interactions with CCR5

    SciTech Connect

    Mefford, Megan E.; Kunstman, Kevin; Wolinsky, Steven M.; Gabuzda, Dana

    2015-07-15

    Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 and T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120–CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues. - Highlights: • We analyze HIV Env sequences and identify amino acids in beta 3 of the gp120 bridging sheet that enhance macrophage tropism. • These amino acids at positions 197 and 200 are present in brain of some patients with HIV-associated dementia. • D197 results in loss of a glycan near the HIV Env trimer apex, which may increase exposure of V3. • These variants may promote infection of macrophages in the brain by enhancing gp120–CCR5 interactions.

  13. Phenotypic chemical screening using a zebrafish neural crest EMT reporter identifies retinoic acid as an inhibitor of epithelial morphogenesis

    PubMed Central

    Jimenez, Laura; Wang, Jindong; Morrison, Monique A.; Whatcott, Clifford; Soh, Katherine K.; Warner, Steven; Bearss, David; Jette, Cicely A.; Stewart, Rodney A.

    2016-01-01

    The epithelial-to-mesenchymal transition (EMT) is a highly conserved morphogenetic program essential for embryogenesis, regeneration and cancer metastasis. In cancer cells, EMT also triggers cellular reprogramming and chemoresistance, which underlie disease relapse and decreased survival. Hence, identifying compounds that block EMT is essential to prevent or eradicate disseminated tumor cells. Here, we establish a whole-animal-based EMT reporter in zebrafish for rapid drug screening, called Tg(snai1b:GFP), which labels epithelial cells undergoing EMT to produce sox10-positive neural crest (NC) cells. Time-lapse and lineage analysis of Tg(snai1b:GFP) embryos reveal that cranial NC cells delaminate from two regions: an early population delaminates adjacent to the neural plate, whereas a later population delaminates from within the dorsal neural tube. Treating Tg(snai1b:GFP) embryos with candidate small-molecule EMT-inhibiting compounds identified TP-0903, a multi-kinase inhibitor that blocked cranial NC cell delamination in both the lateral and medial populations. RNA sequencing (RNA-Seq) analysis and chemical rescue experiments show that TP-0903 acts through stimulating retinoic acid (RA) biosynthesis and RA-dependent transcription. These studies identify TP-0903 as a new therapeutic for activating RA in vivo and raise the possibility that RA-dependent inhibition of EMT contributes to its prior success in eliminating disseminated cancer cells. PMID:26794130

  14. Metagenomic sequencing of bile from gallstone patients to identify different microbial community patterns and novel biliary bacteria.

    PubMed

    Shen, Hongzhang; Ye, Fuqiang; Xie, Lu; Yang, Jianfeng; Li, Zhen; Xu, Peisong; Meng, Fei; Li, Lei; Chen, Ying; Bo, Xiaochen; Ni, Ming; Zhang, Xiaofeng

    2015-12-02

    Despite the high worldwide prevalence of gallstone disease, the role of the biliary microbiota in gallstone pathogenesis remains obscure. Next-generation sequencing offers advantages for systematically understanding the human microbiota; however, there have been few such investigations of the biliary microbiome. Here, we performed whole-metagenome shotgun (WMS) sequencing and 16S rRNA sequencing on bile samples from 15 Chinese patients with gallstone disease. Microbial communities of most individuals were clustered into two types, according to the relative enrichment of different intestinal bacterial species. In the bile samples, oral cavity/respiratory tract inhabitants were more prevalent than intestinal inhabitants and existed in both community types. Unexpectedly, the two types were not associated with fever status or surgical history, and many bacteria were patient-specific. We identified 13 novel biliary bacteria based on WMS sequencing, as well as genes encoding putative proteins related to gallstone formation and bile resistance (e.g., β-glucuronidase and multidrug efflux pumps). Bile samples from gallstone patients had reduced microbial diversity compared to healthy faecal samples. Patient samples were enriched in pathways related to oxidative stress and flagellar assembly, whereas carbohydrate metabolic pathways showed varying behaviours. As the first biliary WMS survey, our study reveals the complexity and specificity of biliary microecology.

  15. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  16. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  17. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  18. Hybridization-based antibody cDNA recovery for the production of recombinant antibodies identified by repertoire sequencing.

    PubMed

    Valdés-Alemán, Javier; Téllez-Sosa, Juan; Ovilla-Muñoz, Marbella; Godoy-Lozano, Elizabeth; Velázquez-Ramírez, Daniel; Valdovinos-Torres, Humberto; Gómez-Barreto, Rosa E; Martinez-Barnetche, Jesús

    2014-01-01

    High-throughput sequencing of the antibody repertoire is enabling a thorough analysis of B cell diversity and clonal selection, which may improve the novel antibody discovery process. Theoretically, an adequate bioinformatic analysis could allow identification of candidate antigen-specific antibodies, requiring their recombinant production for experimental validation of their specificity. Gene synthesis is commonly used for the generation of recombinant antibodies identified in silico. Novel strategies that bypass gene synthesis could offer more accessible antibody identification and validation alternatives. We developed a hybridization-based recovery strategy that targets the complementarity-determining region 3 (CDRH3) for the enrichment of cDNA of candidate antigen-specific antibody sequences. Ten clonal groups of interest were identified through bioinformatic analysis of the heavy chain antibody repertoire of mice immunized with hen egg white lysozyme (HEL). cDNA from eight of the targeted clonal groups was recovered efficiently, leading to the generation of recombinant antibodies. One representative heavy chain sequence from each clonal group recovered was paired with previously reported anti-HEL light chains to generate full antibodies, later tested for HEL-binding capacity. The recovery process proposed represents a simple and scalable molecular strategy that could enhance antibody identification and specificity assessment, enabling a more cost-efficient generation of recombinant antibodies.

  19. Comparative sequence analysis of enteroaggregative Escherichia coli heat-stable enterotoxin 1 identified in Korean and Japanese Escherichia coli strains.

    PubMed

    Seo, Dong Joo; Choi, SunKeum; Jeon, Su Been; Jeong, Suntak; Park, Hyunkyung; Lee, Bog-Hieu; Kim, Geun-Bae; Yang, Soo-Jin; Nishikawa, Yoshikazu; Choi, Changsun

    2017-02-21

    The aim of this study was to compare the sequence of the astA gene found in 8 Korean and 11 Japanese Escherichia coli isolates. Conventional PCR was used to amplify the astA gene from the chromosomal and plasmid DNA preparation samples of each isolate using commercial DNA extraction kits. Cloning of the PCR products, sequence analysis, and pulsed-field gel electrophoresis (PFGE) were sequentially performed. An identical copy of astA in each isolate was found for 8 Korean and 8 Japanese E. coli strains isolated from bovine, porcine, and healthy human carriers. Among these, 1 Korean and 4 Japanese isolates carried a stop mutation at residue 16. Three Japanese outbreak strains (V199, V638, and 96-127-23) carried multiple clones of the astA gene with multiple amino acid changes at residues 11, 16, 20, 23, 30, 33, and 34. Compared with the non-diarrheal isolates, clonal diversity and sequence variations of the astA gene in outbreak isolates may be associated with virulence potential of EAST1.

  20. MINT: software to identify motifs and short-range interactions in trajectories of nucleic acids

    PubMed Central

    Górska, Anna; Jasiński, Maciej; Trylska, Joanna

    2015-01-01

    Structural biology experiments and structure prediction tools have provided many high-resolution three-dimensional structures of nucleic acids. Also, molecular dynamics force field parameters have been adapted to simulating charged and flexible nucleic acid structures on microsecond time scales. Therefore, we can generate the dynamics of DNA or RNA molecules, but we still lack adequate tools for the analysis of the resulting huge amounts of data. We present MINT (Motif Identifier for Nucleic acids Trajectory) — an automatic tool for analyzing three-dimensional structures of RNA and DNA, and their full-atom molecular dynamics trajectories or other conformation sets (e.g. X-ray or nuclear magnetic resonance-derived structures). For each RNA or DNA conformation MINT determines the hydrogen bonding network resolving the base pairing patterns, identifies secondary structure motifs (helices, junctions, loops, etc.) and pseudoknots. MINT also estimates the energy of stacking and phosphate anion-base interactions. For many conformations, as in a molecular dynamics trajectory, MINT provides averages of the above structural and energetic features and their evolution. We show MINT functionality based on all-atom explicit solvent molecular dynamics trajectory of the 30S ribosomal subunit. PMID:26024667

  1. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and peptide map comparisons with an apoenzyme tryptic digest, allowed identification of the pyridoxylated-peptide which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.
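
    A quick plausibility check on the reported subunit size: dividing the molecular weight by the residue count gives the mean residue mass, which for typical proteins is close to the common rule-of-thumb value of about 110 Da per residue. The sketch below uses only the two numbers stated in the abstract; the variable names and the 110 Da reference value are assumptions made for illustration.

```python
# Values stated in the abstract; names are illustrative.
residues = 308
subunit_mass_da = 33920

# Rule-of-thumb average mass of an amino acid residue in a protein (assumption).
TYPICAL_RESIDUE_MASS_DA = 110.0

mean_residue_mass = subunit_mass_da / residues
estimated_mass = residues * TYPICAL_RESIDUE_MASS_DA

print(f"mean residue mass: {mean_residue_mass:.2f} Da")
print(f"rule-of-thumb mass for {residues} residues: {estimated_mass:.0f} Da")
```

    The mean residue mass works out to about 110.1 Da, so the stated residue count and molecular weight are mutually consistent.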

  2. Clinical next generation sequencing to identify actionable aberrations in a phase I program

    PubMed Central

    Boland, Genevieve M.; Piha-Paul, Sarina A.; Subbiah, Vivek; Routbort, Mark; Herbrich, Shelley M.; Baggerly, Keith; Patel, Keyur P.; Brusco, Lauren; Horombe, Chacha; Naing, Aung; Fu, Siqing; Hong, David S.; Janku, Filip; Johnson, Amber; Broaddus, Russell; Luthra, Raja; Shaw, Kenna; Mendelsohn, John; Mills, Gordon B.; Meric-Bernstam, Funda

    2015-01-01

    Purpose: We determined the frequency of recurrent hotspot mutations in 46 cancer-related genes across tumor histologies in patients with advanced cancer. Methods: We reviewed data from 500 consecutive patients who underwent genomic profiling on an IRB-approved prospective clinical protocol in the Phase I program at the MD Anderson Cancer Center. Archival tumor DNA was tested for 740 hotspot mutations in 46 genes (Ampli-Seq Cancer Panel; Life Technologies, CA). Results: Of the 500 patients, 362 had at least one reported mutation/variant. The most common likely somatic mutations were within TP53 (36%), KRAS (11%), and PIK3CA (9%) genes. Sarcoma (20%) and kidney (30%) had the lowest proportion of likely somatic mutations detected, while pancreas (100%), colorectal (89%), melanoma (86%), and endometrial (75%) had the highest. There was high concordance in 62 patients with paired primary tumors and metastases analyzed. 151 (30%) patients had alterations in potentially actionable genes. 37 tumor types were enrolled; both rare actionable mutations in common tumor types and actionable mutations in rare tumor types were identified. Conclusion: Multiplex testing in the CLIA environment facilitates genomic characterization across multiple tumor lineages and identification of novel opportunities for genotype-driven trials. PMID:26015395

  3. High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance.

    PubMed

    Bart, Rebecca; Cohn, Megan; Kassen, Andrew; McCallum, Emily J; Shybut, Mikel; Petriello, Annalise; Krasileva, Ksenia; Dahlbeck, Douglas; Medina, Cesar; Alicai, Titus; Kumar, Lava; Moreira, Leandro M; Rodrigues Neto, Júlio; Verdier, Valerie; Santana, María Angélica; Kositcharoenkul, Nuttima; Vanderschuren, Hervé; Gruissem, Wilhelm; Bernal, Adriana; Staskawicz, Brian J

    2012-07-10

    Cassava bacterial blight (CBB), incited by Xanthomonas axonopodis pv. manihotis (Xam), is the most important bacterial disease of cassava, a staple food source for millions of people in developing countries. Here we present a widely applicable strategy for elucidating the virulence components of a pathogen population. We report Illumina-based draft genomes for 65 Xam strains and deduce the phylogenetic relatedness of Xam across the areas where cassava is grown. Using an extensive database of effector proteins from animal and plant pathogens, we identify the effector repertoire for each sequenced strain and use a comparative sequence analysis to deduce the least polymorphic of the conserved effectors. These highly conserved effectors have been maintained over 11 countries, three continents, and 70 y of evolution and as such represent ideal targets for developing resistance strategies.

  4. Long Interspersed Element Sequencing (L1-Seq): A Method to Identify Somatic LINE-1 Insertions in the Human Genome

    PubMed Central

    Doucet, Tara T.; Kazazian, Haig H.

    2017-01-01

    L1-seq is a high-throughput sequencing technique which is utilized to identify novel L1 insertions in genomic DNA samples of interest. Using special diagnostic nucleotides unique to the youngest and most active L1 sequence, we can amplify new somatic insertions. This technique has helped to establish the number of L1 insertions present in the general population as well as the variation among individuals with regard to their complement of active L1 elements. More recently, this technique has been employed to assess the level of retrotransposition occurring in various diseases such as cancer. These efforts try to establish a connection between the process of retrotransposition and disease development and/or progression. PMID:26895047

  5. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    PubMed Central

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095

  6. The amino acid sequence of cytochromes c-551 from three species of Pseudomonas

    PubMed Central

    Ambler, R. P.; Wynn, Margaret

    1973-01-01

    The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5. PMID:4352718

  7. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid

    PubMed Central

    Oliveira, Rodrigo C.; Davenport, Karen W.; Hovde, Blake; Silva, Danielle; Chain, Patrick S. G.; Correa, Benedito

    2017-01-01

    The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E. sorghinum (USPMTOX48). PMID:28126937

  8. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein.

  9. Draft Genome Sequence of Bacillus coagulans NL01, a Wonderful l-Lactic Acid Producer

    PubMed Central

    Zheng, Zhaojuan; Jiang, Ting; Lin, Xi; Zhou, Jie

    2015-01-01

    Here, we report the draft genome sequence of Bacillus coagulans NL01, which can produce l-lactic acid of high optical purity using xylose as a sole carbon source. The draft genome is 3,505,081 bp, with 144 contigs. About 3,903 protein-coding genes and 92 rRNAs are predicted from this assembly. PMID:26089419

  10. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets.

    PubMed

    Sahm, Felix; Schrimpf, Daniel; Jones, David T W; Meyer, Jochen; Kratz, Annekathrin; Reuss, David; Capper, David; Koelsche, Christian; Korshunov, Andrey; Wiestler, Benedikt; Buchhalter, Ivo; Milde, Till; Selt, Florian; Sturm, Dominik; Kool, Marcel; Hummel, Manuela; Bewerunge-Hudler, Melanie; Mawrin, Christian; Schüller, Ulrich; Jungk, Christine; Wick, Antje; Witt, Olaf; Platten, Michael; Herold-Mende, Christel; Unterberg, Andreas; Pfister, Stefan M; Wick, Wolfgang; von Deimling, Andreas

    2016-06-01

    With the number of prognostic and predictive genetic markers in neuro-oncology steadily growing, the need for comprehensive molecular analysis of neuropathology samples has vastly increased. We therefore developed a customized enrichment/hybrid-capture-based next-generation sequencing (NGS) gene panel comprising the entire coding and selected intronic and promoter regions of 130 genes recurrently altered in brain tumors, allowing for the detection of single nucleotide variations, fusions, and copy number aberrations. Optimization of probe design, library generation and sequencing conditions on 150 samples resulted in a 5-workday routine workflow from the formalin-fixed paraffin-embedded sample to neuropathological report. This protocol was applied to 79 retrospective cases with established molecular aberrations for validation and 71 prospective cases for discovery of potential therapeutic targets. Concordance of NGS compared to established, single biomarker methods was 98.0 %, with discrepancies resulting from one case where a TERT promoter mutation was not called by NGS and three ATRX mutations not being detected by Sanger sequencing. Importantly, in samples with low tumor cell content, NGS was able to identify mutant alleles that were not detectable by traditional methods. Information derived from NGS data identified potential targets for experimental therapy in 37/47 (79 %) glioblastomas, 9/10 (90 %) pilocytic astrocytomas, and 5/14 (36 %) medulloblastomas in the prospective target discovery cohort. In conclusion, we present the settings for high-throughput, adaptive next-generation sequencing in routine neuropathology diagnostics. Such an approach will likely become highly valuable in the near future for treatment decision making, as more therapeutic targets emerge and genetic information enters the classification of brain tumors.

  11. Characterization of the dead ringer gene identifies a novel, highly conserved family of sequence-specific DNA-binding proteins.

    PubMed Central

    Gregory, S L; Kortschak, R D; Kalionis, B; Saint, R

    1996-01-01

    We report the identification of a new family of DNA-binding proteins from our characterization of the dead ringer (dri) gene of Drosophila melanogaster. We show that dri encodes a nuclear protein that contains a sequence-specific DNA-binding domain that bears no similarity to known DNA-binding domains. A number of proteins were found to contain sequences homologous to this domain. Other proteins containing the conserved motif include yeast SWI1, two human retinoblastoma binding proteins, and other mammalian regulatory proteins. A mouse B-cell-specific regulator exhibits 75% identity with DRI over the 137-amino-acid DNA-binding domains of these proteins, indicating a high degree of conservation of this domain. Gel retardation and optimal binding site screens revealed that the in vitro sequence specificity of DRI is strikingly similar to that of many homeodomain proteins, although the sequence and predicted secondary structure do not resemble a homeodomain. The early general expression of dri and the similarity of DRI and homeodomain in vitro DNA-binding specificity compound the problem of understanding the in vivo specificity of action of these proteins. Maternally derived dri product is found throughout the embryo until germ band extension, when dri is expressed in a developmentally regulated set of tissues, including salivary gland ducts, parts of the gut, and a subset of neural cells. The discovery of this new, conserved DNA-binding domain offers an explanation for the regulatory activity of several important members of th