Sample records for acid sequence identified

  1. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  2. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  3. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  4. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  5. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674

  7. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  8. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  9. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  10. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probe comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  11. Identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2005-02-08

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  12. Development of chemiluminescent probe hybridization, RT-PCR and nucleic acid cycle sequencing assays of Sabin type 3 isolates to identify base pair 472 Sabin type 3 mutants associated with vaccine associated paralytic poliomyelitis.

    PubMed

    Old, M O; Logan, L H; Maldonado, Y A

    1997-11-01

    Sabin type 3 polio vaccine virus is the most common cause of poliovaccine associated paralytic poliomyelitis. Vaccine associated paralytic poliomyelitis cases have been associated with Sabin type 3 revertants containing a single U to C substitution at bp 472 of Sabin type 3. A rapid method of identification of Sabin type 3 bp 472 mutants is described. An enterovirus group-specific probe for use in a chemiluminescent dot blot hybridization assay was developed to identify enterovirus positive viral lysates. A reverse transcription-polymerase chain reaction (RT-PCR) assay producing a 319 bp PCR product containing the Sabin type 3 bp 472 mutation site was then employed to identify Sabin type 3 isolates. Chemiluminescent nucleic acid cycle sequencing of the purified 319 bp PCR product was then employed to identify nucleic acid sequences at bp 472. The enterovirus group probe hybridization procedure and isolation of the Sabin type 3 PCR product were highly sensitive and specific; nucleic acid cycle sequencing corresponded to the known sequence of stock Sabin type 3 isolates. These methods will be used to identify the Sabin type 3 reversion rate from sequential stool samples of infants obtained after the first and second doses of oral poliovirus vaccine.

  13. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583

  14. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly

  15. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  16. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based by charge.

  17. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  18. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  19. Identifying transposon insertions and their effects from RNA-sequencing data.

    PubMed

    de Ruiter, Julian R; Kas, Sjors M; Schut, Eva; Adams, David J; Koudijs, Marco J; Wessels, Lodewyk F A; Jonkers, Jos

    2017-07-07

    Insertional mutagenesis using engineered transposons is a potent forward genetic screening technique used to identify cancer genes in mouse model systems. In the analysis of these screens, transposon insertion sites are typically identified by targeted DNA-sequencing and subsequently assigned to predicted target genes using heuristics. As such, these approaches provide no direct evidence that insertions actually affect their predicted targets or how transcripts of these genes are affected. To address this, we developed IM-Fusion, an approach that identifies insertion sites from gene-transposon fusions in standard single- and paired-end RNA-sequencing data. We demonstrate IM-Fusion on two separate transposon screens of 123 mammary tumors and 20 B-cell acute lymphoblastic leukemias, respectively. We show that IM-Fusion accurately identifies transposon insertions and their true target genes. Furthermore, by combining the identified insertion sites with expression quantification, we show that we can determine the effect of a transposon insertion on its target gene(s) and prioritize insertions that have a significant effect on expression. We expect that IM-Fusion will significantly enhance the accuracy of cancer gene discovery in forward genetic screens and provide initial insight into the biological effects of insertions on candidate cancer genes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  1. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  2. Method of Identifying a Base in a Nucleic Acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    1999-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  3. Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

    PubMed

    Pietrowski, D; Förster, M

    2000-01-01

    The complete cDNA sequence of a ribonuclease k6 gene of Bos Taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acids exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).

  4. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  5. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  6. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  7. Probe kit for identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  8. Newly identified essential amino acid residues affecting ^8-sphingolipid desaturase activity revealed by site-directed mutagenesis

    USDA-ARS?s Scientific Manuscript database

    In order to identify amino acid residues crucial for the enzymatic activity of ^8-sphingolipid desaturases, a sequence comparison was performed among ^8-sphingolipid desaturases and ^6-fatty acid desaturase from various plants. In addition to the known conserved cytb5 (cytochrome b5) HPGG motif and...

  9. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    PubMed Central

    2012-01-01

    Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  10. The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

    PubMed Central

    Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

    2009-01-01

    PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964

  11. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  12. Antibiotic Resistance Markers in Burkholderia pseudomallei Strain Bp1651 Identified by Genome Sequence Analysis

    PubMed Central

    Sue, David; Gee, Jay E.; Elrod, Mindy G.; Hoffmaster, Alex R.; Randall, Linnell B.; Chirakul, Sunisa; Tuanyok, Apichai; Schweizer, Herbert P.; Weigel, Linda M.

    2017-01-01

    ABSTRACT Burkholderia pseudomallei Bp1651 is resistant to several classes of antibiotics that are usually effective for treatment of melioidosis, including tetracyclines, sulfonamides, and β-lactams such as penicillins (amoxicillin-clavulanic acid), cephalosporins (ceftazidime), and carbapenems (imipenem and meropenem). We sequenced, assembled, and annotated the Bp1651 genome and analyzed the sequence using comparative genomic analyses with susceptible strains, keyword searches of the annotation, publicly available antimicrobial resistance prediction tools, and published reports. More than 100 genes in the Bp1651 sequence were identified as potentially contributing to antimicrobial resistance. Most notably, we identified three previously uncharacterized point mutations in penA, which codes for a class A β-lactamase and was previously implicated in resistance to β-lactam antibiotics. The mutations result in amino acid changes T147A, D240G, and V261I. When individually introduced into select agent-excluded B. pseudomallei strain Bp82, D240G was found to contribute to ceftazidime resistance and T147A contributed to amoxicillin-clavulanic acid and imipenem resistance. This study provides the first evidence that mutations in penA may alter susceptibility to carbapenems in B. pseudomallei. Another mutation of interest was a point mutation affecting the dihydrofolate reductase gene folA, which likely explains the trimethoprim resistance of this strain. Bp1651 was susceptible to aminoglycosides likely because of a frameshift in the amrB gene, the transporter subunit of the AmrAB-OprA efflux pump. These findings expand the role of penA to include resistance to carbapenems and may assist in the development of molecular diagnostics that predict antimicrobial resistance and provide guidance for treatment of melioidosis. PMID:28396541

  13. Genome sequence of a distinct watermelon mosaic virus identified from ginseng (Panax ginseng) transcriptome.

    PubMed

    Park, D; Kim, H; Hahn, Y

    Watermelon mosaic virus (WMV) is a member of the genus Potyvirus, which is the largest genus of plant viruses. WMV is a significant pathogen of crop plants, including Cucurbitaceae species. A WMV strain, designated as WMV-Pg, was identified in transcriptome data collected from ginseng (Panax ginseng) root. WMV-Pg showed 84% nucleotide sequence identity and 91% amino acid sequence identity with its closest related virus, WMV-Fr. A phylogenetic analysis of WMV-Pg with other WMVs and soybean mosaic viruses (SMVs) indicated that WMV-Pg is a distinct subtype of the WMV/SMV group of the genus Potyvirus in the family Potyviridae.

  14. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  15. Sequence-based predictive modeling to identify cancerlectins

    PubMed Central

    Lai, Hong-Yan; Chen, Xin-Xin; Chen, Wei; Tang, Hua; Lin, Hao

    2017-01-01

    Lectins are a diverse type of glycoproteins or carbohydrate-binding proteins that have a wide distribution to various species. They can specially identify and exclusively bind to a certain kind of saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumor. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promote the study on pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, by constructing a benchmark dataset based on the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and then the binomial distribution was applied to screen the optimal feature set. Ultimately, an SVM-based predictor was performed to distinguish cancerlectins from non-cancerlectins, and achieved an accuracy of 77.48% with AUC of 85.52% in jackknife cross-validation. The results revealed that our prediction model could perform better comparing with published predictive tools. PMID:28423655

  16. RNA Sequencing Identifies Novel Translational Biomarkers of Kidney Fibrosis

    PubMed Central

    Craciun, Florin L.; Bijol, Vanesa; Ajay, Amrendra K.; Rao, Poornima; Kumar, Ramya K.; Hutchinson, John; Hofmann, Oliver; Joshi, Nikita; Luyendyk, James P.; Kusebauch, Ulrike; Moss, Christopher L.; Srivastava, Anand; Himmelfarb, Jonathan; Waikar, Sushrut S.; Moritz, Robert L.

    2016-01-01

    CKD is the gradual, asymptomatic loss of kidney function, but current tests only identify CKD when significant loss has already happened. Several potential biomarkers of CKD have been reported, but none have been approved for preclinical or clinical use. Using RNA sequencing in a mouse model of folic acid-induced nephropathy, we identified ten genes that track kidney fibrosis development, the common pathologic finding in patients with CKD. The gene expression of all ten candidates was confirmed to be significantly higher (approximately ten- to 150-fold) in three well established, mechanistically distinct mouse models of kidney fibrosis than in models of nonfibrotic AKI. Protein expression of these genes was also high in the folic acid model and in patients with biopsy-proven kidney fibrosis. mRNA expression of the ten genes increased with increasing severity of kidney fibrosis, decreased in response to therapeutic intervention, and increased only modestly (approximately two- to five-fold) with liver fibrosis in mice and humans, demonstrating specificity for kidney fibrosis. Using targeted selected reaction monitoring mass spectrometry, we detected three of the ten candidates in human urine: cadherin 11 (CDH11), macrophage mannose receptor C1 (MRC1), and phospholipid transfer protein (PLTP). Furthermore, urinary levels of each of these three proteins distinguished patients with CKD (n=53) from healthy individuals (n=53; P<0.05). In summary, we report the identification of urinary CDH11, MRC1, and PLTP as novel noninvasive biomarkers of CKD. PMID:26449608

  17. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  18. GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.

    PubMed

    Xu, Nuo; Fu, Dongfang; Li, Shiang; Wang, Yuxuan; Wong, Aloysius

    2018-06-15

    GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3', 5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred is created to automate and facilitate the identification of similar GCCs. The server features GCC values that consider in its calculation, the physicochemical properties of amino acids constituting the GCC and the conserved amino acids within the centre. From user input amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as example. The GCPred server is available at http://gcpred.com. Supplementary data are available at Bioinformatics online.

  19. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  20. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  1. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  2. Amino acid sequence of the human fibronectin receptor

    PubMed Central

    1987-01-01

    The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+- binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481

  3. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  4. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  5. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  6. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    PubMed Central

    Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

    2007-01-01

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688

  7. Amino acid sequence analysis of the annexin super-gene family of proteins.

    PubMed

    Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

    1991-06-15

    The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of

  8. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  9. Evidence of Divergent Amino Acid Usage in Comparative Analyses of R5- and X4-Associated HIV-1 Vpr Sequences

    PubMed Central

    Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia

    2017-01-01

    Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, and some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physiochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613

  10. Sequence, distribution and chromosomal context of class I and class II pilin genes of Neisseria meningitidis identified in whole genome sequences

    PubMed Central

    2014-01-01

    Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these

  11. Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures

    PubMed Central

    Wang, Ying; Fu, Lei; Ren, Jie; Yu, Zhaoxia; Chen, Ting; Sun, Fengzhu

    2018-01-01

    Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered “group-specific” in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. All the assembled 11 LC-specific sequences can be mapped to two

  12. Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

    PubMed

    Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

    1985-07-01

    The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.

  13. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P [Santa Fe, NM; White, P Scott [Los Alamos, NM

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  14. Contig Maps and Genomic Sequencing Identify Candidate Genes in the Usher 1C Locus

    PubMed Central

    Higgins, Michael J.; Day, Colleen D.; Smilinich, Nancy J.; Ni, L.; Cooper, Paul R.; Nowak, Norma J.; Davies, Chris; de Jong, Pieter J.; Hejtmancik, Fielding; Evans, Glen A.; Smith, Richard J.H.; Shows, Thomas B.

    1998-01-01

    Usher syndrome 1C (USH1C) is a congenital condition manifesting profound hearing loss, the absence of vestibular function, and eventual retinal degeneration. The USH1C locus has been mapped genetically to a 2- to 3-cM interval in 11p14–15.1 between D11S899 and D11S861. In an effort to identify the USH1C disease gene we have isolated the region between these markers in yeast artificial chromosomes (YACs) using a combination of STS content mapping and Alu–PCR hybridization. The YAC contig is ∼3.5 Mb and has located several other loci within this interval, resulting in the order CEN-LDHA-SAA1-TPH-D11S1310-(D11S1888/KCNC1)-MYOD1-D11S902D11S921-D11S1890-TEL. Subsequent haplotyping and homozygosity analysis refined the location of the disease gene to a 400-kb interval between D11S902 and D11S1890 with all affected individuals being homozygous for the internal marker D11S921. To facilitate gene identification, the critical region has been converted into P1 artificial chromosome (PAC) clones using sequence-tagged sites (STSs) mapped to the YAC contig, Alu–PCR products generated from the YACs, and PAC end probes. A contig of >50 PAC clones has been assembled between D11S1310 and D11S1890, confirming the order of markers used in haplotyping. Three PAC clones representing nearly two-thirds of the USH1C critical region have been sequenced. PowerBLAST analysis identified six clusters of expressed sequence tags (ESTs), two known genes (BIR,SUR1) mapped previously to this region, and a previously characterized but unmapped gene NEFA (DNA binding/EF hand/acidic amino-acid-rich). GRAIL analysis identified 11 CpG islands and 73 exons of excellent quality. These data allowed the construction of a transcription map for the USH1C critical region, consisting of three known genes and six or more novel transcripts. Based on their map location, these loci represent candidate disease loci for USH1C. The NEFA gene was assessed as the USH1C locus by the sequencing of an amplified NEFA

  15. RNA sequencing identifies upregulated kyphoscoliosis peptidase and phosphatidic acid signaling pathways in muscle hypertrophy generated by transgenic expression of myostatin propeptide.

    PubMed

    Miao, Yuanxin; Yang, Jinzeng; Xu, Zhong; Jing, Lu; Zhao, Shuhong; Li, Xinyun

    2015-04-09

    Myostatin (MSTN), a member of the transforming growth factor-β superfamily, plays a crucial negative role in muscle growth. MSTN mutations or inhibitions can dramatically increase muscle mass in most mammal species. Previously, we generated a transgenic mouse model of muscle hypertrophy via the transgenic expression of the MSTN N-terminal propeptide cDNA under the control of the skeletal muscle-specific MLC1 promoter. Here, we compare the mRNA profiles between transgenic mice and wild-type littermate controls with a high-throughput RNA sequencing method. The results show that 132 genes were significantly differentially expressed between transgenic mice and wild-type control mice; 97 of these genes were up-regulated, and 35 genes were down-regulated in the skeletal muscle. Several genes that had not been reported to be involved in muscle hypertrophy were identified, including up-regulated myosin binding protein H (mybph), and zinc metallopeptidase STE24 (Zmpste24). In addition, kyphoscoliosis peptidase (Ky), which plays a vital role in muscle growth, was also up-regulated in the transgenic mice. Interestingly, a pathway analysis based on grouping the differentially expressed genes uncovered that cardiomyopathy-related pathways and phosphatidic acid (PA) pathways (Dgki, Dgkz, Plcd4) were up-regulated. Increased PA signaling may increase mTOR signaling, resulting in skeletal muscle growth. The findings of the RNA sequencing analysis help to understand the molecular mechanisms of muscle hypertrophy caused by MSTN inhibition.

  16. Identification of a novel bovine enterovirus possessing highly divergent amino acid sequences in capsid protein.

    PubMed

    Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya

    2017-01-17

    Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of novel kobu-like virus in calf feces, by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain as BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, open reading frame encoding a single polyprotein, and 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. We identified novel BEV possessing highly divergent aa sequences in the VP1 coding

  17. Acid lipase inhibitor in chicken plasma identified as apolipoprotein A-I.

    PubMed

    Fujii, M; Higuchi, T; Mukai, S; Yonekura, M; Yano, T; Kawaguchi, H; Nonaka, K; Fukunaga, T; Sugimoto, Y; Yamada, S

    1996-10-01

    We have reported a inhibitor of acid lipases in liver lysosomes and erythrocytes from chickens [M. Fujii et al., Int. J. Biochem., 22, 895-898 (1990)]. In this paper, the properties of the inhibitor were described in comparison with those of apo A-I of chicken. The purified inhibitor migrated with the same mobility on SDS-PAGE as apo A-I, and had a molecular weight of 27,000. The peptide map from the lipase inhibitor was similar to that of apo A-I. Antibodies to the acid lipase inhibitor also reacted with apo A-I. Apo A-I inhibited the acid lipase activities of liver lysosomes and erythrocytes from chickens as strongly as the lipase inhibitor. The N-terminal amino acid sequence of lipase inhibitor was identical to that of apo A-I as far as residue 20. The amino acid sequence of peptides obtained from the inhibitor by cleavage with CNBr corresponded to internal sequence of apo A-I, and so the CNBr-peptides were derived by cleavage after the methionine residues in apo A-I. The findings showed that the inhibitor of the acid lipases in liver lysosomes and erythrocytes from chickens was identical to apo A-I.

  18. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Soo-Ik; Hammes, G.G.

    1989-11-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homologies exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chickenmore » and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The DADPH-binding dinucleotide folds of the {beta}-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the DADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.« less

  19. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation.  Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases.  We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes.  Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.

  20. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  1. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  2. MACARON: A python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data.

    PubMed

    Khan, Waqasuddin; Saripella, Ganapathi Varma-; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre

    2018-05-03

    Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in presence of several variants in the same genetic codon. We here proposed the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single SNV annotation tool. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in python with codes available on the GENMED website (www.genmed.fr). david-alexandre.tregouet@inserm.fr. Supplementary data are available at Bioinformatics online.

  3. Fatty Acid Profile and Unigene-Derived Simple Sequence Repeat Markers in Tung Tree (Vernicia fordii)

    PubMed Central

    Zhang, Lin; Jia, Baoguang; Tan, Xiaofeng; Thammina, Chandra S.; Long, Hongxu; Liu, Min; Wen, Shanna; Song, Xianliang; Cao, Heping

    2014-01-01

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR) markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3) consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies. PMID:25167054

  4. Identifying micro-inversions using high-throughput sequencing reads.

    PubMed

    He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu

    2016-01-11

    The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are important genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to detect MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .

  5. Genome Sequence of Lactobacillus sakei LK-145 Isolated from a Japanese Sake Cellar as a High Producer of d-Amino Acids

    PubMed Central

    Kato, Shiro

    2017-01-01

    ABSTRACT This announcement reports the complete genome sequence of strain LK-145 of Lactobacillus sakei isolated from a Japanese sake cellar as a potent strain for the production of large amounts of d-amino acids. Three putative genes encoding an amino acid racemase were identified. PMID:28818888

  6. Soil amino acid composition across a boreal forest successional sequence

    Treesearch

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  7. Preferential amino acid sequences in alumina-catalyzed peptide bond formation.

    PubMed

    Bujdák, J; Rode, B M

    2002-05-21

    The catalytic effect of activated alumina on amino acid condensation was investigated. The readiness of amino acids to form peptide sequences was estimated on the basis of the yield of dipeptides and was found to decrease in the order glycine (Gly), alanine (Ala), leucine (Leu), valine (Val), proline (Pro). For example, approximately 15% Gly was converted to the dipeptide (Gly(2)), 5% to cyclic anhydride (cyc(Gly(2))) and small amounts of tri- (Gly(3)) and tetrapeptide (Gly(4)) were formed after 28 days. On the other hand, only trace amounts of Pro(2) were formed from proline under the same conditions. Preferential formation of certain sequences was observed in the mixed reaction systems containing two amino acids. For example, almost ten times more Gly-Val than Val-Gly was formed in the Gly+Val reaction system. The preferred sequences can be explained on the basis of an inductive effect that side groups have on the nucleophilicity and electrophilicity, respectively, of the amino and carboxyl groups. A comparison with published data of amino acid reactions in other reaction systems revealed that the main trends of preferential sequence formation were the same as those described for the salt-induced peptide formation (SIPF) reaction. The results of this work and other previously published papers show that alumina and related mineral surfaces might have played a crucial role in the prebiotic formation of the first peptides on the primitive earth.

  8. Unconventional P-35S sequence identified in genetically modified maize

    PubMed Central

    Al-Hmoud, Nisreen; Al-Husseini, Nawar; Ibrahim-Alobaide, Mohammed A; Kübler, Eric; Farfoura, Mahmoud; Alobydi, Hytham; Al-Rousan, Hiyam

    2014-01-01

    The Cauliflower Mosaic Virus 35S promoter sequence, CaMV P-35S, is one of several commonly used genetic targets to detect genetically modified maize and is found in most GMOs. In this research we report the finding of an alternative P-35S sequence and its incidence in GM maize marketed in Jordan. The primer pair normally used to amplify a 123 bp DNA fragment of the CaMV P-35S promoter in GMOs also amplified a previously undetected alternative sequence of CaMV P-35S in GM maize samples which we term V3. The amplified V3 sequence comprises 386 base pairs and was not found in the standard wild-type maize, MON810 and MON 863 GM maize. The identified GM maize samples carrying the V3 sequence were found free of CaMV when compared with CaMV infected brown mustard sample. The data of sequence alignment analysis of the V3 genetic element showed 90% similarity with the matching P-35S sequence of the cauliflower mosaic virus isolate CabbB-JI and 99% similarity with matching P-35S sequences found in several binary plant vectors, of which the binary vector locus JQ693018 is one example. The current study showed an increase of 44% in the incidence of the identified 386 bp sequence in GM maize sold in Jordan’s markets during the period 2009 and 2012. PMID:24495911

  9. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  11. Application of small RNA sequencing to identify microRNAs in acute kidney injury and fibrosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pellegrini, Kathryn L.

    Establishing a microRNA (miRNA) expression profile in affected tissues provides an important foundation for the discovery of miRNAs involved in the development or progression of pathologic conditions. We conducted small RNA sequencing to generate a temporal profile of miRNA expression in the kidneys using a mouse model of folic acid-induced (250 mg/kg i.p.) kidney injury and fibrosis. From the 103 miRNAs that were differentially expressed over the time course (> 2-fold, p < 0.05), we chose to further investigate miR-18a-5p, which is expressed during the acute stage of the injury; miR-132-3p, which is upregulated during transition between acute and fibroticmore » injury; and miR-146b-5p, which is highly expressed at the peak of fibrosis. Using qRT-PCR, we confirmed the increased expression of these candidate miRNAs in the folic acid model as well as in other established mouse models of acute injury (ischemia/reperfusion injury) and fibrosis (unilateral ureteral obstruction). In situ hybridization confirmed high expression of miR-18a-5p, miR-132-3p and miR-146b-5p throughout the kidney cortex in mice and humans with severe kidney injury or fibrosis. When primary human proximal tubular epithelial cells were treated with model nephrotoxicants such as cadmium chloride (CdCl{sub 2}), arsenic trioxide, aristolochic acid (AA), potassium dichromate (K{sub 2}Cr{sub 2}O{sub 7}) and cisplatin, miRNA-132-3p was upregulated 4.3-fold after AA treatment and 1.5-fold after K{sub 2}Cr{sub 2}O{sub 7} and CdCl{sub 2} treatment. These results demonstrate the application of temporal small RNA sequencing to identify miR-18a, miR-132 and miR-146b as differentially expressed miRNAs during distinct phases of kidney injury and fibrosis progression. - Highlights: • We used small RNA sequencing to identify differentially expressed miRNAs in kidney. • Distinct patterns were found for acute injury and fibrotic stages in the kidney. • Upregulation of miR-18a, -132 and -146b was confirmed

  12. The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

    PubMed Central

    Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

    1988-01-01

    The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497

  13. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  14. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV=3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.

  15. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  16. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  17. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits a-chymotrypsin, but less inhibits elastase or subtilisin. SETI was expressed by E. coli containing a PelB vector carrying the SETI encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using most likelihood inference methods.

  18. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  19. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  20. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  1. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  2. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  3. An integrated system for identifying the hidden assassins in traditional medicines containing aristolochic acids

    NASA Astrophysics Data System (ADS)

    Wu, Lan; Sun, Wei; Wang, Bo; Zhao, Haiyu; Li, Yaoli; Cai, Shaoqing; Xiang, Li; Zhu, Yingjie; Yao, Hui; Song, Jingyuan; Cheng, Yung-Chi; Chen, Shilin

    2015-08-01

    Traditional herbal medicines adulterated and contaminated with plant materials from the Aristolochiaceae family, which contain aristolochic acids (AAs), cause aristolochic acid nephropathy. Approximately 256 traditional Chinese patent medicines, containing Aristolochiaceous materials, are still being sold in Chinese markets today. In order to protect consumers from health risks due to AAs, the hidden assassins, efficient methods to differentiate Aristolochiaceous herbs from their putative substitutes need to be established. In this study, 158 Aristolochiaceous samples representing 46 species and four genera as well as 131 non-Aristolochiaceous samples representing 33 species, 20 genera and 12 families were analyzed using DNA barcodes based on the ITS2 and psbA-trnH sequences. Aristolochiaceous materials and their non-Aristolochiaceous substitutes were successfully identified using BLAST1, the nearest distance method and the neighbor-joining (NJ) tree. In addition, based on sequence information of ITS2, we developed a Real-Time PCR assay which successfully identified herbal material from the Aristolochiaceae family. Using Ultra High Performance Liquid Chromatography-Mass Spectrometer (UHPLC-HR-MS), we demonstrated that most representatives from the Aristolochiaceae family contain toxic AAs. Therefore, integrated DNA barcodes, Real-Time PCR assays using TaqMan probes and UHPLC-HR-MS system provides an efficient and reliable authentication system to protect consumers from health risks due to the hidden assassins (AAs).

  4. Methods for Identifying Ligands that Target Nucleic Acid Molecules and Nucleic Acid Structural Motifs

    NASA Technical Reports Server (NTRS)

    Childs-Disney, Jessica L. (Inventor); Disney, Matthew D. (Inventor)

    2017-01-01

    Disclosed are methods for identifying a nucleic acid (e.g., RNA, DNA, etc.) motif which interacts with a ligand. The method includes providing a plurality of ligands immobilized on a support, wherein each particular ligand is immobilized at a discrete location on the support; contacting the plurality of immobilized ligands with a nucleic acid motif library under conditions effective for one or more members of the nucleic acid motif library to bind with the immobilized ligands; and identifying members of the nucleic acid motif library that are bound to a particular immobilized ligand. Also disclosed are methods for selecting, from a plurality of candidate ligands, one or more ligands that have increased likelihood of binding to a nucleic acid molecule comprising a particular nucleic acid motif, as well as methods for identifying a nucleic acid which interacts with a ligand.

  5. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M/sub r/ estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%)more » of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.« less

  6. Biodegradation of 5-chloro-2-picolinic acid by novel identified co-metabolizing degrader Achromobacter sp. f1.

    PubMed

    Wu, Zhi-Guo; Wang, Fang; Ning, Li-Qun; Stedtfeld, Robert D; Yang, Zong-Zheng; Cao, Jing-Guo; Sheng, Hong-Jie; Jiang, Xin

    2017-06-01

    Several bacteria have been isolated to degrade 4-chloronitrobenzene. Degradation of 4-chloronitrobenzene by Cupriavidus sp. D4 produces 5-chloro-2-picolinic acid as a dead-end by-product, a potential pollutant. To date, no bacterium that degrades 5-chloro-2-picolinic acid has been reported. Strain f1, isolated from a soil polluted by 4-chloronitrobenzene, was able to co-metabolize 5-chloro-2-picolinic acid in the presence of ethanol or other appropriate carbon sources. The strain was identified as Achromobacter sp. based on its physiological, biochemical characteristics, and 16S rRNA gene sequence analysis. The organism completely degraded 50, 100 and 200 mg L -1 of 5-chloro-2-picolinic acid within 48, 60, and 72 h, respectively. During the degradation of 5-chloro-2-picolinic acid, Cl - was released. The initial metabolic product of 5-chloro-2-picolinic acid was identified as 6-hydroxy-5-chloro-2-picolinic acid by LC-MS and NMR. Using a mixed culture of Achromobacter sp. f1 and Cupriavidus sp. D4 for degradation of 4-chloronitrobenzen, 5-chloro-2-picolinic acid did not accumulate. Results infer that Achromobacter sp. f1 can be used for complete biodegradation of 4-chloronitrobenzene in remedial applications.

  7. Using whole-exome sequencing to identify variants inherited from mosaic parents

    PubMed Central

    Rios, Jonathan J; Delgado, Mauricio R

    2015-01-01

    Whole-exome sequencing (WES) has allowed the discovery of genes and variants causing rare human disease. This is often achieved by comparing nonsynonymous variants between unrelated patients, and particularly for sporadic or recessive disease, often identifies a single or few candidate genes for further consideration. However, despite the potential for this approach to elucidate the genetic cause of rare human disease, a majority of patients fail to realize a genetic diagnosis using standard exome analysis methods. Although genetic heterogeneity contributes to the difficulty of exome sequence analysis between patients, it remains plausible that rare human disease is not caused by de novo or recessive variants. Multiple human disorders have been described for which the variant was inherited from a phenotypically normal mosaic parent. Here we highlight the potential for exome sequencing to identify a reasonable number of candidate genes when dominant disease variants are inherited from a mosaic parent. We show the power of WES to identify a limited number of candidate genes using this disease model and how sequence coverage affects identification of mosaic variants by WES. We propose this analysis as an alternative to discover genetic causes of rare human disorders for which typical WES approaches fail to identify likely pathogenic variants. PMID:24986828

  8. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  9. Purification, characterization, gene cloning and nucleotide sequencing of D: -stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

    PubMed

    Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

    2005-12-01

    The D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa, on sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 degrees C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 degrees C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+ . The NH2 terminal and internal amino acid sequences of the enzyme were determined; and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da); and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.

  10. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    PubMed

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/ . pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  11. A FRET Biosensor for ROCK Based on a Consensus Substrate Sequence Identified by KISS Technology.

    PubMed

    Li, Chunjie; Imanishi, Ayako; Komatsu, Naoki; Terai, Kenta; Amano, Mutsuki; Kaibuchi, Kozo; Matsuda, Michiyuki

    2017-01-11

    Genetically-encoded biosensors based on Förster/fluorescence resonance energy transfer (FRET) are versatile tools for studying the spatio-temporal regulation of signaling molecules within not only the cells but also tissues. Perhaps the hardest task in the development of a FRET biosensor for protein kinases is to identify the kinase-specific substrate peptide to be used in the FRET biosensor. To solve this problem, we took advantage of kinase-interacting substrate screening (KISS) technology, which deduces a consensus substrate sequence for the protein kinase of interest. Here, we show that a consensus substrate sequence for ROCK identified by KISS yielded a FRET biosensor for ROCK, named Eevee-ROCK, with high sensitivity and specificity. By treating HeLa cells with inhibitors or siRNAs against ROCK, we show that a substantial part of the basal FRET signal of Eevee-ROCK was derived from the activities of ROCK1 and ROCK2. Eevee-ROCK readily detected ROCK activation by epidermal growth factor, lysophosphatidic acid, and serum. When cells stably-expressing Eevee-ROCK were time-lapse imaged for three days, ROCK activity was found to increase after the completion of cytokinesis, concomitant with the spreading of cells. Eevee-ROCK also revealed a gradual increase in ROCK activity during apoptosis. Thus, Eevee-ROCK, which was developed from a substrate sequence predicted by the KISS technology, will pave the way to a better understanding of the function of ROCK in a physiological context.

  12. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  13. Intravenous phage display identifies peptide sequences that target the burn-injured intestine.

    PubMed

    Costantini, Todd W; Eliceiri, Brian P; Putnam, James G; Bansal, Vishal; Baird, Andrew; Coimbra, Raul

    2012-11-01

    The injured intestine is responsible for significant morbidity and mortality after severe trauma and burn; however, targeting the intestine with therapeutics aimed at decreasing injury has proven difficult. We hypothesized that we could use intravenous phage display technology to identify peptide sequences that target the injured intestinal mucosa in a murine model, and then confirm the cross-reactivity of this peptide sequence with ex vivo human gut. Four hours following 30% TBSA burn we performed an in vivo, intravenous systemic administration of phage library containing 10(12) phage in balb/c mice to biopan for gut-targeting peptides. In vivo assessment of the candidate peptide sequences identified after 4 rounds of internalization was performed by injecting 1×10(12) copies of each selected phage clone into sham or burned animals. Internalization into the gut was assessed using quantitative polymerase chain reaction. We then incubated this gut-targeting peptide sequence with human intestine and visualized fluorescence using confocal microscopy. We identified 3 gut-targeting peptide sequences which caused collapse of the phage library (4-1: SGHQLLLNKMP, 4-5: ILANDLTAPGPR, 4-11: SFKPSGLPAQSL). Sequence 4-5 was internalized into the intestinal mucosa of burned animals 9.3-fold higher than sham animals injected with the same sequence (2.9×10(5)vs. 3.1×10(4) particles per mg tissue). Sequences 4-1 and 4-11 were both internalized into the gut, but did not demonstrate specificity for the injured mucosa. Phage sequence 4-11 demonstrated cross-reactivity with human intestine. In the future, this gut-targeting peptide sequence could serve as a platform for the delivery of biotherapeutics. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Transcriptomic analysis of rice aleurone cells identified a novel abscisic acid response element.

    PubMed

    Watanabe, Kenneth A; Homayouni, Arielle; Gu, Lingkun; Huang, Kuan-Ying; Ho, Tuan-Hua David; Shen, Qingxi J

    2017-09-01

    Seeds serve as a great model to study plant responses to drought stress, which is largely mediated by abscisic acid (ABA). The ABA responsive element (ABRE) is a key cis-regulatory element in ABA signalling. However, its consensus sequence (ACGTG(G/T)C) is present in the promoters of only about 40% of ABA-induced genes in rice aleurone cells, suggesting other ABREs may exist. To identify novel ABREs, RNA sequencing was performed on aleurone cells of rice seeds treated with 20 μM ABA. Gibbs sampling was used to identify enriched elements, and particle bombardment-mediated transient expression studies were performed to verify the function. Gene ontology analysis was performed to predict the roles of genes containing the novel ABREs. This study revealed 2443 ABA-inducible genes and a novel ABRE, designated as ABREN, which was experimentally verified to mediate ABA signalling in rice aleurone cells. Many of the ABREN-containing genes are predicted to be involved in stress responses and transcription. Analysis of other species suggests that the ABREN may be monocot specific. This study also revealed interesting expression patterns of genes involved in ABA metabolism and signalling. Collectively, this study advanced our understanding of diverse cis-regulatory sequences and the transcriptomes underlying ABA responses in rice aleurone cells. © 2017 John Wiley & Sons Ltd.

  15. PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

    PubMed

    García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

    2010-11-01

    PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

  16. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

    PubMed

    Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

    2007-02-01

    Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.

  17. Exome Sequencing Fails to Identify the Genetic Cause of Aicardi Syndrome.

    PubMed

    Lund, Caroline; Striano, Pasquale; Sorte, Hanne Sørmo; Parisi, Pasquale; Iacomino, Michele; Sheng, Ying; Vigeland, Magnus D; Øye, Anne-Marte; Møller, Rikke Steensbjerre; Selmer, Kaja K; Zara, Federico

    2016-09-01

    Aicardi syndrome (AS) is a well-characterized neurodevelopmental disorder with an unknown etiology. In this study, we performed whole-exome sequencing in 11 female patients with the diagnosis of AS, in order to identify the disease-causing gene. In particular, we focused on detecting variants in the X chromosome, including the analysis of variants with a low number of sequencing reads, in case of somatic mosaicism. For 2 of the patients, we also sequenced the exome of the parents to search for de novo mutations. We did not identify any genetic variants likely to be damaging. Only one single missense variant was identified by the de novo analyses of the 2 trios, and this was considered benign. The failure to identify a disease gene in this study may be due to technical limitations of our study design, including the possibility that the genetic aberration leading to AS is situated in a non-exonic region or that the mutation is somatic and not detectable by our approach. Alternatively, it is possible that AS is genetically heterogeneous and that 11 patients are not sufficient to reveal the causative genes. Future studies of AS should consider designs where also non-exonic regions are explored and apply a sequencing depth so that also low-grade somatic mosaicism can be detected.

  18. Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.

    PubMed Central

    Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M

    1987-01-01

    Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred mol wt 6,080 that is in agreement with its apparent monomeric mol wt 6,000, estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites whereas domain II is anchored into the sarcoplasmic reticulum membrane. PMID:3793929

  19. Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

    PubMed

    Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

    1991-05-01

    Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions we have isolated DNA clones encoding the desmosomal glycoprotein, desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino-acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular aminoterminal portion of 561 amino acid residues, compared to a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from cDNAs for Northern-blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa but not in two epithelial cell culture lines containing desmosomes and desmoplakins. The difference may indicate drastic differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.

  20. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    PubMed Central

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356

  1. Whole exome sequencing identifies a mutation for a novel form of corneal intraepithelial dyskeratosis

    PubMed Central

    Soler, Vincent José; Tran-Viet, Khanh-Nhat; Galiacy, Stéphane D; Limviphuvadh, Vachiranee; Klemm, Thomas Patrick; St Germain, Elizabeth; Fournié, Pierre R; Guillaud, Céline; Maurer-Stroh, Sebastian; Hawthorne, Felicia; Suarez, Cyrielle; Kantelip, Bernadette; Afshari, Natalie A; Creveaux, Isabelle; Luo, Xiaoyan; Meng, Weihua; Calvas, Patrick; Cassagne, Myriam; Arné, Jean-Louis; Rozen, Steven G; Malecaze, François; Young, Terri L

    2014-01-01

    Background Corneal intraepithelial dyskeratosis is an extremely rare condition. The classical form, affecting Native American Haliwa-Saponi tribe members, is called hereditary benign intraepithelial dyskeratosis (HBID). Herein, we present a new form of corneal intraepithelial dyskeratosis for which we identified the causative gene by using deep sequencing technology. Methods and results A seven member Caucasian French family with two corneal intraepithelial dyskeratosis affected individuals (6-year-old proband and his mother) was ascertained. The proband presented with bilateral complete corneal opacification and dyskeratosis. Palmoplantar hyperkeratosis and laryngeal dyskeratosis were associated with the phenotype. Histopathology studies of cornea and vocal cord biopsies showed dyskeratotic keratinisation. Quantitative PCR ruled out 4q35 duplication, classically described in HBID cases. Next generation sequencing with mean coverage of 50× using the Illumina Hi Seq and whole exome capture processing was performed. Sequence reads were aligned, and screened for single nucleotide variants and insertion/deletion calls. In-house pipeline filtering analyses and comparisons with available databases were performed. A novel missense mutation M77T was discovered for the gene NLRP1 which maps to chromosome 17p13.2. This was a de novo mutation in the proband’s mother, following segregation in the family, and not found in 738 control DNA samples. NLRP1 expression was determined in adult corneal epithelium. The amino acid change was found to destabilise significantly the protein structure. Conclusions We describe a new corneal intraepithelial dyskeratosis and how we identified its causative gene. The NLRP1 gene product is implicated in inflammation, autoimmune disorders, and caspase mediated apoptosis. NLRP1 polymorphisms are associated with various diseases. PMID:23349227

  2. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  3. Novel genomic findings in multiple myeloma identified through routine diagnostic sequencing.

    PubMed

    Ryland, Georgina L; Jones, Kate; Chin, Melody; Markham, John; Aydogan, Elle; Kankanige, Yamuna; Caruso, Marisa; Guinto, Jerick; Dickinson, Michael; Prince, H Miles; Yong, Kwee; Blombery, Piers

    2018-05-14

    Multiple myeloma is a genomically complex haematological malignancy with many genomic alterations recognised as important in diagnosis, prognosis and therapeutic decision making. Here, we provide a summary of genomic findings identified through routine diagnostic next-generation sequencing at our centre. A cohort of 86 patients with multiple myeloma underwent diagnostic sequencing using a custom hybridisation-based panel targeting 104 genes. Sequence variants, genome-wide copy number changes and structural rearrangements were detected using an inhouse-developed bioinformatics pipeline. At least one mutation was found in 69 (80%) patients. Frequently mutated genes included TP53 (36%), KRAS (22.1%), NRAS (15.1%), FAM46C/DIS3 (8.1%) and TET2/FGFR3 (5.8%), including multiple mutations not previously described in myeloma. Importantly we observed TP53 mutations in the absence of a 17 p deletion in 8% of the cohort, highlighting the need for sequencing-based assessment in addition to cytogenetics to identify these high-risk patients. Multiple novel copy number changes and immunoglobulin heavy chain translocations are also discussed. Our results demonstrate that many clinically relevant genomic findings remain in multiple myeloma which have not yet been identified through large-scale sequencing efforts, and provide important mechanistic insights into plasma cell pathobiology. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  4. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition.

    PubMed

    Feng, Peng-Mian; Chen, Wei; Lin, Hao; Chou, Kuo-Chen

    2013-11-01

    Heat shock proteins (HSPs) are a type of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein-protein interactions such as folding and assisting in the establishment of proper protein conformation and prevention of unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson's, Alzheimer's, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been achieved in discriminating HSPs from other proteins, it is still a big challenge to identify HSPs among their six different functional types according to their sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a high-throughput computational tool in this regard. To take up such a challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating the reduced amino acid alphabet information into the general form of pseudo amino acid composition. One of the remarkable advantages of introducing the reduced amino acid alphabet is being able to avoid the notorious dimension disaster or overfitting problem in statistical prediction. It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset in which none of HSPs included has ≥40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems. As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http

  5. Forensic Loci Allele Database (FLAD): Automatically generated, permanent identifiers for sequenced forensic alleles.

    PubMed

    Van Neste, Christophe; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2016-01-01

    It is difficult to predict if and when massively parallel sequencing of forensic STR loci will replace capillary electrophoresis as the new standard technology in forensic genetics. The main benefits of sequencing are increased multiplexing scales and SNP detection. There is not yet a consensus on how sequenced profiles should be reported. We present the Forensic Loci Allele Database (FLAD) service, made freely available on http://forensic.ugent.be/FLAD/. It offers permanent identifiers for sequenced forensic alleles (STR or SNP) and their microvariants for use in forensic allele nomenclature. Analogous to Genbank, its aim is to provide permanent identifiers for forensically relevant allele sequences. Researchers that are developing forensic sequencing kits or are performing population studies, can register on http://forensic.ugent.be/FLAD/ and add loci and allele sequences with a short and simple application interface (API). Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  6. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  7. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in human and animal associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threoine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus ) and sheep, but differs significantly from other two cat Prnp sequences that were previously deposited in GenBank.

  8. HIV-1 envelope sequence-based diversity measures for identifying recent infections

    PubMed Central

    Kafando, Alexis; Fournier, Eric; Serhir, Bouchra; Martineau, Christine; Doualla-Bell, Florence; Sangaré, Mohamed Ndongo; Sylla, Mohamed; Chamberland, Annie; El-Far, Mohamed; Charest, Hugues

    2017-01-01

    Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of 4 sequence-based diversity measures including percent diversity, percent complexity, Shannon entropy and number of haplotypes targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1 infected individuals were selected. From the selected samples, 249 (134 from recent versus 115 from chronic infections) env coding regions, including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next generation sequencing (NGS) using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using the frequency distribution curves, median and interquartile range and area under the curve (AUC) of the receiver operating characteristic (ROC). Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the 4 sequence-based diversity measures, we observed that the percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001). Using the AUC of ROC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. The env segments were gp120 C2_1 (AUC = 0.806), gp120 C2_3 (AUC = 0.805) and gp120 V3 (AUC = 0.812). Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency. PMID:29284009

  9. HIV-1 envelope sequence-based diversity measures for identifying recent infections.

    PubMed

    Kafando, Alexis; Fournier, Eric; Serhir, Bouchra; Martineau, Christine; Doualla-Bell, Florence; Sangaré, Mohamed Ndongo; Sylla, Mohamed; Chamberland, Annie; El-Far, Mohamed; Charest, Hugues; Tremblay, Cécile L

    2017-01-01

    Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of 4 sequence-based diversity measures including percent diversity, percent complexity, Shannon entropy and number of haplotypes targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1 infected individuals were selected. From the selected samples, 249 (134 from recent versus 115 from chronic infections) env coding regions, including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next generation sequencing (NGS) using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using the frequency distribution curves, median and interquartile range and area under the curve (AUC) of the receiver operating characteristic (ROC). Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the 4 sequence-based diversity measures, we observed that the percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001). Using the AUC of ROC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. The env segments were gp120 C2_1 (AUC = 0.806), gp120 C2_3 (AUC = 0.805) and gp120 V3 (AUC = 0.812). Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency.

  10. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  11. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  12. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  13. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understanding the amidation and recognizing the original reason of many kinds of diseases. But the traditional experimental methods for predicting amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector enabling to capture not only the physicochemical properties but also position related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features to remove redundancy and dependence among components of the feature vector by a supervised fashion. Finally the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, it shows that the proposed method performs better than all the previous ones with the prediction accuracy of 0.962 at the Matthew's correlation coefficient of 0.89 and area under curve of 0.964.

  14. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  15. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  16. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  17. Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...

  18. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.

  19. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less

  20. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Foley, B.; Korber, B.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nuclear Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived.more » Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.« less

  1. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deutscher, J.; Pevec, B.; Beyreuther, K.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolyptic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system,more » HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S faecalis has now been determined. (/sup 32/P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.« less

  2. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  3. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments.

    PubMed

    Dietrich, Susanne; Borst, Nadine; Schlee, Sandra; Schneider, Daniel; Janda, Jan-Oliver; Sterner, Reinhard; Merkl, Rainer

    2012-07-17

    The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k(cat)/K(M)) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T(M)(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT(M)(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k(cat)/K(M) values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T(M)(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. Our findings demonstrate that positions with high conn(k) values have an

  4. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G´ with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  5. HomSI: a homozygous stretch identifier from next-generation sequencing data.

    PubMed

    Görmez, Zeliha; Bakir-Gungor, Burcu; Sagiroglu, Mahmut Samil

    2014-02-01

    In consanguineous families, as a result of inheriting the same genomic segments through both parents, the individuals have stretches of their genomes that are homozygous. This situation leads to the prevalence of recessive diseases among the members of these families. Homozygosity mapping is based on this observation, and in consanguineous families, several recessive disease genes have been discovered with the help of this technique. The researchers typically use single nucleotide polymorphism arrays to determine the homozygous regions and then search for the disease gene by sequencing the genes within this candidate disease loci. Recently, the advent of next-generation sequencing enables the concurrent identification of homozygous regions and the detection of mutations relevant for diagnosis, using data from a single sequencing experiment. In this respect, we have developed a novel tool that identifies homozygous regions using deep sequence data. Using *.vcf (variant call format) files as an input file, our program identifies the majority of homozygous regions found by microarray single nucleotide polymorphism genotype data. HomSI software is freely available at www.igbam.bilgem.tubitak.gov.tr/softwares/HomSI, with an online manual.

  6. Exome capture sequencing identifies a novel mutation in BBS4

    PubMed Central

    Wang, Hui; Chen, Xianfeng; Dudinsky, Lynn; Patenia, Claire; Chen, Yiyun; Li, Yumei; Wei, Yue; Abboud, Emad B.; Al-Rajhi, Ali A.; Lewis, Richard Alan; Lupski, James R.; Mardon, Graeme; Gibbs, Richard A.; Perkins, Brian D.

    2011-01-01

    Purpose Leber congenital amaurosis (LCA) is one of the most severe eye dystrophies characterized by severe vision loss at an early stage and accounts for approximately 5% of all retinal dystrophies. The purpose of this study was to identify a novel LCA disease allele or gene and to develop an approach combining genetic mapping with whole exome sequencing. Methods Three patients from King Khaled Eye Specialist Hospital (KKESH205) underwent whole genome single nucleotide polymorphism genotyping, and a single candidate region was identified. Taking advantage of next-generation high-throughput DNA sequencing technologies, whole exome capture sequencing was performed on patient KKESH205#7. Sanger direct sequencing was used during the validation step. The zebrafish model was used to examine the function of the mutant allele. Results A novel missense mutation in Bardet-Biedl syndrome 4 protein (BBS4) was identified in a consanguineous family from Saudi Arabia. This missense mutation in the fifth exon (c.253G>C;p.E85Q) of BBS4 is likely a disease-causing mutation as it segregates with the disease. The mutation is not found in the single nucleotide polymorphism (SNP) database, the 1000 Genomes Project, or matching normal controls. Functional analysis of this mutation in zebrafish indicates that the G253C allele is pathogenic. Coinjection of the G253C allele cannot rescue the mislocalization of rhodopsin in the retina when BBS4 is knocked down by morpholino injection. Immunofluorescence analysis in cell culture shows that this missense mutation in BBS4 does not cause obvious defects in protein expression or pericentriolar localization. Conclusions This mutation likely mainly reduces or abolishes BBS4 function in the retina. Further studies of this allele will provide important insights concerning the pleiotropic nature of BBS4 function. PMID:22219648

  7. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet.

  8. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  9. Genome wide identification of microRNAs involved in fatty acid and lipid metabolism of Brassica napus by small RNA and degradome sequencing.

    PubMed

    Wang, Zhiwei; Qiao, Yan; Zhang, Jingjing; Shi, Wenhui; Zhang, Jinwen

    2017-07-01

    Rapeseed (Brassica napus) is an important cash crop considered as the third largest oil crop worldwide. Rapeseed oil contains various saturation or unsaturation fatty acids, these fatty acids, whose could incorporation with TAG form into lipids stored in seeds play various roles in the metabolic activity. The different fatty acids in B. napus seeds determine oil quality, define if the oil is edible or must be used as industrial material. miRNAs are kind of non-coding sRNAs that could regulate gene expressions through post-transcriptional modification to their target transcripts playing important roles in plant metabolic activities. We employed high-throughput sequencing to identify the miRNAs and their target transcripts involved in fatty acids and lipids metabolism in different development of B. napus seeds. As a result, we identified 826 miRNA sequences, including 523 conserved and 303 newly miRNAs. From the degradome sequencing, we found 589 mRNA could be targeted by 236 miRNAs, it includes 49 novel miRNAs and 187 conserved miRNAs. The miRNA-target couple suggests that bna-5p-163957_18, bna-5p-396192_7, miR9563a-p3, miR9563b-p5, miR838-p3, miR156e-p3, miR159c and miR1134 could target PDP, LACS9, MFPA, ADSL1, ACO32, C0401, GDL73, PlCD6, OLEO3 and WSD1. These target transcripts are involving in acetyl-CoA generate and carbon chain desaturase, regulating the levels of very long chain fatty acids, β-oxidation and lipids transport and metabolism process. At the same, we employed the q-PCR to valid the expression of miRNAs and their target transcripts that involve in fatty acid and lipid metabolism, the result suggested that the miRNA and their transcript expression are negative correlation, which in accord with the expression of miRNA and its target transcript. The study findings suggest that the identified miRNA may play important role in the fatty acids and lipids metabolism in seeds of B. napus. Copyright © 2017 The Author(s). Published by Elsevier B.V. All

  10. Nucleic acid analysis using terminal-phosphate-labeled nucleotides

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-04-22

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  11. [Identifying and sequence analysis of HLA-B*2736].

    PubMed

    Li, Zhen; Zou, Hong-Yan; Shao, Chao-Peng; Tang, Si; Wang, Da-Ming; Cheng, Liang-Hong

    2007-11-01

    An unknown HLA-B allele which was similar to HLA-B*270401 was detected by FLOW-SSOPCR-SSP and heterozygous sequence-based typing (SBT) in Chinese Han individual. Its anomalous patterns suggested the possible presence of new allele. Amplifying exon 2-5(include intron 2-4) of the HLA-B*27 allele separately by using allele-specific primers and sequencing in both directions. Identifying the difference between the novel B*27 allele and B*270401. The sequence of novel B*27 from exon 2 to partial exon 5 is 1 815 bp. There are 10 nt changes from B*270401 in exon 3-4, at nt634where A-->C(codon130 AGC-->CGC, 130 S-->R); nt670 where A-->T (codon142 ACC-->TCC, 142 T-->S); nt683 where G-->T (codon146 TGG-->TTG, 146 W-->L); nt698 where A-->T (codon151 GAG-->GTG, 151 E-->V); nt774 where G-->C (codon176 GAG-->GAC, 176 E-->D); nt776 where C-->A (codon177 ACG-->AAG, 177 T-->K); nt781 where C-->G (codon179 CAG-->GAG, 179Q-->E); nt789 where G-->T (codon181 GCG-->GCT) resulting no coding change; nt1438 where C-->T (codon206 GGC-->GGT) resulting no coding change; nt1449 where G-->C (codon210 GGG-->GCG, 210G-->A). In IMGT/HLA database, only three alleles (B*270502/2706/2732) have sequences of introns. The same sequence in intron 2 showed homology between the novel HLA-B*27 allele and B*2706, but their homology could not be supported in intron 3-4. Comparing the sequence of the novel B*27 allele in intron 3 and 4 with B*27 group, it showed there are three mutations at nt106 C-->G, nt179 G-->A, nt536 G-->A and one deletion at nt168 in intron 3 and one mutations at nt82 T-->C in intron 4, but the sequence of the novel B*27 allele in intron 3 and 4 was all the same to B*070201. The sequence was submitted to Gen-Bank and the accession number was DQ915176. The allele has been confirmed as an extension of B*2736 by the WHO Nomenclature committee in November 2006.

  12. Position-dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers

    PubMed Central

    Levin, Joshua D.; Fiala, Dean; Samala, Meinrado F.; Kahn, Jason D.; Peterson, Raymond J.

    2006-01-01

    Genomes are becoming heavily annotated with important features. Analysis of these features often employs oligonucleotides that hybridize at defined locations. When the defined location lies in a poor sequence context, traditional design strategies may fail. Locked Nucleic Acid (LNA) can enhance oligonucleotide affinity and specificity. Though LNA has been used in many applications, formal design rules are still being defined. To further this effort we have investigated the effect of LNA on the performance of sequencing and PCR primers in AT-rich regions, where short primers yield poor sequencing reads or PCR yields. LNA was used in three positional patterns: near the 5′ end (LNA-5′), near the 3′ end (LNA-3′) and distributed throughout (LNA-Even). Quantitative measures of sequencing read length (Phred Q30 count) and real-time PCR signal (cycle threshold, CT) were characterized using two-way ANOVA. LNA-5′ increased the average Phred Q30 score by 60% and it was never observed to decrease performance. LNA-5′ generated cycle thresholds in quantitative PCR that were comparable to high-yielding conventional primers. In contrast, LNA-3′ and LNA-Even did not improve read lengths or CT. ANOVA demonstrated the statistical significance of these results and identified significant interaction between the positional design rule and primer sequence. PMID:17071964

  13. Taste, umami-enhance effect and amino acid sequence of peptides separated from silkworm pupa hydrolysate.

    PubMed

    Yu, Zilin; Jiang, Hongrui; Guo, Rongcan; Yang, Bo; You, Gang; Zhao, Mouming; Liu, Xiaoling

    2018-06-01

    Four umami peptides were separated and purified by ultrafiltration, gel filtration chromatography and identified by ultra-performance liquid chromatography tandem mass-spectrometry (UPLC-MS/MS), the amino acid sequences of four peptides are Val-Pro-Tyr (VPY), Thr-Ala-Tyr (TAY), Ala-Ala-Pro-Tyr (AAPY) and Gly-Phe-Pro (GFP). The result illustrates that the umami amino acids are not the content of umami peptides, but bitter amino acids are included. The threshold of VPY, TAY, AAPY and GFP were 1.65 mmol/L, 1.76 mmol/L, 2.97 mmol/L and 6.26 mmol/L, respectively. The peptide TAY, VPY and AAPY had an umami-enhancement effect on the monosodium glutamate (MSG) + sodium chloride (NaCl) solution, their concentrations were 2.5 g/L, 5 g/L and 5 g/L, respectively, while GFP has no significant umami-enhancement effect in solution. In addition, the peptides have better taste than its composing amino acids, which indicates that the taste of peptide does not depend on its composing amino acids. Copyright © 2018. Published by Elsevier Ltd.

  14. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  15. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.

  16. TaALMT1 promoter sequence compositions, acid tolerance, and Al tolerance in wheat cultivars and landraces from Sichuan in China.

    PubMed

    Han, C; Dai, S F; Liu, D C; Pu, Z J; Wei, Y M; Zheng, Y L; Wen, D J; Zhao, L; Yan, Z H

    2013-11-18

    Previous genetic studies on wheat from various sources have indicated that aluminum (Al) tolerance may have originated independently in USA, Brazil, and China. Here, TaALMT1 promoter sequences of 92 landraces and cultivars from Sichuan, China, were sequenced. Five promoter types (I', II, III, IV, and V) were observed in 39 cultivars, and only three promoter types (I, II, and III) were observed in 53 landraces. Among the wheat collections worldwide, only the Chinese Spring (CS) landrace native to Sichuan, China, carried the TaALMT1 promoter type III. Besides CS, two other Sichuan-bred landraces and six cultivars with TaALMT1 promoter type III were identified in this study. In the phylogenetic tree constructed based on the TaALMT1 promoter sequences, type III formed a separate branch, which was supported by a high bootstrap value. It is likely that TaALMT1 promoter type III originated from Sichuan-bred wheat landraces of China. In addition, the landraces with promoter type I showed the lowest Al tolerance among all landraces and cultivars. Furthermore, the cultivars with promoter type IV showed better Al tolerance than landraces with promoter type II. A comparison of acid tolerance and Al tolerance between cultivars and landraces showed that the landraces had better acid tolerance than the cultivars, whereas the cultivars showed better Al tolerance than the landraces. Moreover, significant difference in Al tolerance was also observed between the cultivars raised by the National Ministry of Agriculture and by Sichuan Province. Among the landraces from different regions, those from the East showed better acid tolerance and Al tolerance than those from the South and West of Sichuan. Additional Al-tolerant and acid-tolerant wheat lines were also identified.

  17. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  18. Novel pathogenic variant (c.3178G>A) in the SMC1A gene in a family with Cornelia de Lange syndrome identified by exome sequencing.

    PubMed

    Jang, Mi Ae; Lee, Chang Woo; Kim, Jin Kyung; Ki, Chang Seok

    2015-11-01

    Cornelia de Lange syndrome (CdLS) is a clinically and genetically heterogeneous congenital anomaly. Mutations in the NIPBL gene account for a half of the affected individuals. We describe a family with CdLS carrying a novel pathogenic variant of the SMC1A gene identified by exome sequencing. The proband was a 3-yr-old boy presenting with a developmental delay. He had distinctive facial features without major structural anomalies and tested negative for the NIPBL gene. His younger sister, mother, and maternal grandmother presented with mild mental retardation. By exome sequencing of the proband, a novel SMC1A variant, c.3178G>A, was identified, which was expected to cause an amino acid substitution (p.Glu1060Lys) in the highly conserved coiled-coil domain of the SMC1A protein. Sanger sequencing confirmed that the three female relatives with mental retardation also carry this variant. Our results reveal that SMC1A gene defects are associated with milder phenotypes of CdLS. Furthermore, we showed that exome sequencing could be a useful tool to identify pathogenic variants in patients with CdLS.

  19. Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.

    PubMed

    O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M

    2010-10-01

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the

  20. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores.

    PubMed

    Parente, Daniel J; Ray, J Christian J; Swint-Kruse, Liskin

    2015-12-01

    As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints-detectable by divergent algorithms--that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. © 2015 Wiley Periodicals, Inc.

  1. RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

    PubMed

    Brule, C E; Dean, K M; Grayhack, E J

    2016-01-01

    The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.

  2. Exome Sequencing Identifies Potentially Druggable Mutations in Nasopharyngeal Carcinoma.

    PubMed

    Chow, Yock Ping; Tan, Lu Ping; Chai, San Jiun; Abdul Aziz, Norazlin; Choo, Siew Woh; Lim, Paul Vey Hong; Pathmanathan, Rajadurai; Mohd Kornain, Noor Kaslina; Lum, Chee Lun; Pua, Kin Choo; Yap, Yoke Yeow; Tan, Tee Yong; Teo, Soo Hwang; Khoo, Alan Soo-Beng; Patel, Vyomesh

    2017-03-03

    In this study, we first performed whole exome sequencing of DNA from 10 untreated and clinically annotated fresh frozen nasopharyngeal carcinoma (NPC) biopsies and matched bloods to identify somatically mutated genes that may be amenable to targeted therapeutic strategies. We identified a total of 323 mutations which were either non-synonymous (n = 238) or synonymous (n = 85). Furthermore, our analysis revealed genes in key cancer pathways (DNA repair, cell cycle regulation, apoptosis, immune response, lipid signaling) were mutated, of which those in the lipid-signaling pathway were the most enriched. We next extended our analysis on a prioritized sub-set of 37 mutated genes plus top 5 mutated cancer genes listed in COSMIC using a custom designed HaloPlex target enrichment panel with an additional 88 NPC samples. Our analysis identified 160 additional non-synonymous mutations in 37/42 genes in 66/88 samples. Of these, 99/160 mutations within potentially druggable pathways were further selected for validation. Sanger sequencing revealed that 77/99 variants were true positives, giving an accuracy of 78%. Taken together, our study indicated that ~72% (n = 71/98) of NPC samples harbored mutations in one of the four cancer pathways (EGFR-PI3K-Akt-mTOR, NOTCH, NF-κB, DNA repair) which may be potentially useful as predictive biomarkers of response to matched targeted therapies.

  3. Exome Sequencing Identifies Potentially Druggable Mutations in Nasopharyngeal Carcinoma

    PubMed Central

    Chow, Yock Ping; Tan, Lu Ping; Chai, San Jiun; Abdul Aziz, Norazlin; Choo, Siew Woh; Lim, Paul Vey Hong; Pathmanathan, Rajadurai; Mohd Kornain, Noor Kaslina; Lum, Chee Lun; Pua, Kin Choo; Yap, Yoke Yeow; Tan, Tee Yong; Teo, Soo Hwang; Khoo, Alan Soo-Beng; Patel, Vyomesh

    2017-01-01

    In this study, we first performed whole exome sequencing of DNA from 10 untreated and clinically annotated fresh frozen nasopharyngeal carcinoma (NPC) biopsies and matched bloods to identify somatically mutated genes that may be amenable to targeted therapeutic strategies. We identified a total of 323 mutations which were either non-synonymous (n = 238) or synonymous (n = 85). Furthermore, our analysis revealed genes in key cancer pathways (DNA repair, cell cycle regulation, apoptosis, immune response, lipid signaling) were mutated, of which those in the lipid-signaling pathway were the most enriched. We next extended our analysis on a prioritized sub-set of 37 mutated genes plus top 5 mutated cancer genes listed in COSMIC using a custom designed HaloPlex target enrichment panel with an additional 88 NPC samples. Our analysis identified 160 additional non-synonymous mutations in 37/42 genes in 66/88 samples. Of these, 99/160 mutations within potentially druggable pathways were further selected for validation. Sanger sequencing revealed that 77/99 variants were true positives, giving an accuracy of 78%. Taken together, our study indicated that ~72% (n = 71/98) of NPC samples harbored mutations in one of the four cancer pathways (EGFR-PI3K-Akt-mTOR, NOTCH, NF-κB, DNA repair) which may be potentially useful as predictive biomarkers of response to matched targeted therapies. PMID:28256603

  4. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    NASA Astrophysics Data System (ADS)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  5. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    PubMed

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  6. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    PubMed

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  7. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes

    PubMed Central

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-01-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404

  8. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  9. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  11. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  12. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  13. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. Complete genome of A. pullulans CCTCC M2012223 was sequenced by Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG cluster were analyzed in comparison with those of other five A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the biggest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathway were highly conservative in all of six A. pullulans varieties. Although both A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum have a close affinity, some point mutation and inserts were occurred in protein sequences involved in melanin biosynthesis. Genome information of A. pullulans CCTCC M2012223 was annotated and genes involved in melanin, pullulan and polymalic acid pathway were compared, which would provide a theoretical basis for genetic modification of metabolic pathway in A. pullulans.

  14. Whole Exome Sequencing Identifies RAI1 Mutation in a Morbidly Obese Child Diagnosed With ROHHAD Syndrome

    PubMed Central

    Esteves, Kristyn M.; Towne, Meghan C.; Brownstein, Catherine A.; James, Philip M.; Crowley, Laura; Hirschhorn, Joel N.; Elsea, Sarah H.; Beggs, Alan H.; Picker, Jonathan

    2015-01-01

    Context: The current obesity epidemic is attributed to complex interactions between genetic and environmental factors. However, a limited number of cases, especially those with early-onset severe obesity, are linked to single gene defects. Rapid-onset obesity with hypothalamic dysfunction, hypoventilation and autonomic dysregulation (ROHHAD) is one of the syndromes that presents with abrupt-onset extreme weight gain with an unknown genetic basis. Objective: To identify the underlying genetic etiology in a child with morbid early-onset obesity, hypoventilation, and autonomic and behavioral disturbances who was clinically diagnosed with ROHHAD syndrome. Design/Setting/Intervention: The index patient was evaluated at an academic medical center. Whole-exome sequencing was performed on the proband and his parents. Genetic variants were validated by Sanger sequencing. Results: We identified a novel de novo nonsense mutation, c.3265 C>T (p.R1089X), in the retinoic acid-induced 1 (RAI1) gene in the proband. Mutations in the RAI1 gene are known to cause Smith-Magenis syndrome (SMS). On further evaluation, his clinical features were not typical of either SMS or ROHHAD syndrome. Conclusions: This study identifies a de novo RAI1 mutation in a child with morbid obesity and a clinical diagnosis of ROHHAD syndrome. Although extreme early-onset obesity, autonomic disturbances, and hypoventilation are present in ROHHAD, several of the clinical findings are consistent with SMS. This case highlights the challenges in the diagnosis of ROHHAD syndrome and its potential overlap with SMS. We also propose RAI1 as a candidate gene for children with morbid obesity. PMID:25781356

  15. Whole exome sequencing identifies RAI1 mutation in a morbidly obese child diagnosed with ROHHAD syndrome.

    PubMed

    Thaker, Vidhu V; Esteves, Kristyn M; Towne, Meghan C; Brownstein, Catherine A; James, Philip M; Crowley, Laura; Hirschhorn, Joel N; Elsea, Sarah H; Beggs, Alan H; Picker, Jonathan; Agrawal, Pankaj B

    2015-05-01

    The current obesity epidemic is attributed to complex interactions between genetic and environmental factors. However, a limited number of cases, especially those with early-onset severe obesity, are linked to single gene defects. Rapid-onset obesity with hypothalamic dysfunction, hypoventilation and autonomic dysregulation (ROHHAD) is one of the syndromes that presents with abrupt-onset extreme weight gain with an unknown genetic basis. To identify the underlying genetic etiology in a child with morbid early-onset obesity, hypoventilation, and autonomic and behavioral disturbances who was clinically diagnosed with ROHHAD syndrome. Design/Setting/Intervention: The index patient was evaluated at an academic medical center. Whole-exome sequencing was performed on the proband and his parents. Genetic variants were validated by Sanger sequencing. We identified a novel de novo nonsense mutation, c.3265 C>T (p.R1089X), in the retinoic acid-induced 1 (RAI1) gene in the proband. Mutations in the RAI1 gene are known to cause Smith-Magenis syndrome (SMS). On further evaluation, his clinical features were not typical of either SMS or ROHHAD syndrome. This study identifies a de novo RAI1 mutation in a child with morbid obesity and a clinical diagnosis of ROHHAD syndrome. Although extreme early-onset obesity, autonomic disturbances, and hypoventilation are present in ROHHAD, several of the clinical findings are consistent with SMS. This case highlights the challenges in the diagnosis of ROHHAD syndrome and its potential overlap with SMS. We also propose RAI1 as a candidate gene for children with morbid obesity.

  16. Exome sequencing identifies SUCO mutations in mesial temporal lobe epilepsy.

    PubMed

    Sha, Zhiqiang; Sha, Longze; Li, Wenting; Dou, Wanchen; Shen, Yan; Wu, Liwen; Xu, Qi

    2015-03-30

    Mesial temporal lobe epilepsy (mTLE) is the main type and most common medically intractable form of epilepsy. Severity of disease-based stratified samples may help identify new disease-associated mutant genes. We analyzed mRNA expression profiles from patient hippocampal tissue. Three of the seven patients had severe mTLE with generalized-onset convulsions and consciousness loss that occurred over many years. We found that compared with other groups, patients with severe mTLE were classified into a distinct group. Whole-exome sequencing and Sanger sequencing validation in all seven patients identified three novel SUN domain-containing ossification factor (SUCO) mutations in severely affected patients. Furthermore, SUCO knock down significantly reduced dendritic length in vitro. Our results indicate that mTLE defects may affect neuronal development, and suggest that neurons have abnormal development due to lack of SUCO, which may be a generalized-onset epilepsy-related gene. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  17. Next-generation sequencing library preparation method for identification of RNA viruses on the Ion Torrent Sequencing Platform.

    PubMed

    Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng

    2018-05-09

    Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. There were multiple NGS library preparation methods published for strand-specific RNA-seq, but some methods are not suitable for identifying and characterizing RNA viruses. In this study, we report a NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform, able to identify multiple species of RNA viruses in clinical samples.

  18. Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

    PubMed Central

    Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

    1997-01-01

    The amino acid sequences of ananain (EC3.4.22.31) and stem bromelain (3.4.22.32), two cysteine proteases from pineapple stem, are similar yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya and actinidin from kiwifruit. Ananain is comprised of 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753

  19. Identification of amino acid sequences in the polyomavirus capsid proteins that serve as nuclear localization signals

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. Jr; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    The molecular mechanism participating in the transport of newly synthesized proteins from the cytoplasm to the nucleus in mammalian cells is poorly understood. Recently, the nuclear localization signal sequences (NLS) of many nuclear proteins have been identified, and most have been found to be composed of a highly basic amino acid stretch. A genetic "subtractive" and a biochemical "additive" approach were used in our studies to identify the NLS's of the polyomavirus structural capsid proteins. An NLS was identified at the N-terminus (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) of the major capsid protein VP1 and at the C-terminus (Glu307 -Glu-Asp-Gly-Pro-Glu-Lys-Lys-Lys-Arg-Arg-Leu318) of the VP2/VP3 minor capsid proteins.

  20. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584

  1. Transcriptomic Analysis Identifies Candidate Genes Related to Intramuscular Fat Deposition and Fatty Acid Composition in the Breast Muscle of Squabs (Columba)

    PubMed Central

    Ye, Manhong; Zhou, Bin; Wei, Shanshan; Ding, MengMeng; Lu, Xinghui; Shi, Xuehao; Ding, Jiatong; Yang, Shengmei; Wei, Wanhong

    2016-01-01

    Despite the fact that squab is consumed throughout the world because of its high nutritional value and appreciated sensory attributes, aspects related to its characterization, and in particular genetic issues, have rarely been studied. In this study, meat traits in terms of pH, water-holding capacity, intramuscular fat content, and fatty acid profile of the breast muscle of squabs from two meat pigeon breeds were determined. Breed-specific differences were detected in fat-related traits of intramuscular fat content and fatty acid composition. RNA-Sequencing was applied to compare the transcriptomes of muscle and liver tissues between squabs of two breeds to identify candidate genes associated with the differences in the capacity of fat deposition. A total of 27 differentially expressed genes assigned to pathways of lipid metabolism were identified, of which, six genes belonged to the peroxisome proliferator-activated receptor signaling pathway along with four other genes. Our results confirmed in part previous reports in livestock and provided also a number of genes which had not been related to fat deposition so far. These genes can serve as a basis for further investigations to screen markers closely associated with intramuscular fat content and fatty acid composition in squabs. The data from this study were deposited in the National Center for Biotechnology Information (NCBI)’s Sequence Read Archive under the accession numbers SRX1680021 and SRX1680022. This is the first transcriptome analysis of the muscle and liver tissue in Columba using next generation sequencing technology. Data provided here are of potential value to dissect functional genes influencing fat deposition in squabs. PMID:27175015

  2. Sequencing ASMT identifies rare mutations in Chinese Han patients with autism.

    PubMed

    Wang, Lifang; Li, Jun; Ruan, Yanyan; Lu, Tianlan; Liu, Chenxing; Jia, Meixiang; Yue, Weihua; Liu, Jing; Bourgeron, Thomas; Zhang, Dai

    2013-01-01

    Melatonin is involved in the regulation of circadian and seasonal rhythms and immune function. Prior research reported low melatonin levels in autism spectrum disorders (ASD). ASMT located in pseudo-autosomal region 1 encodes the last enzyme of the melatonin biosynthesis pathway. A previous study reported an association between ASD and single nucleotide polymorphisms (SNPs) rs4446909 and rs5989681 located in the promoter of ASMT. Furthermore, rare deleterious mutations were identified in a subset of patients. To investigate the association between ASMT and autism, we sequenced all ASMT exons and its neighboring region in 398 Chinese Han individuals with autism and 437 healthy controls. Although our study did not detect significant differences of genotypic distribution and allele frequencies of the common SNPs in ASMT between patients with autism and healthy controls, we identified new rare coding mutations of ASMT. Among these rare variants, 4 were exclusively detected in patients with autism including a stop mutation (p.R115W, p.V166I, p.V179G, and p.W257X). These four coding variants were observed in 6 of 398 (1.51%) patients with autism and none in 437 controls (Chi-Square test, Continuity Correction p = 0.032, two-sided). Functional prediction of impact of amino acid showed that p.R115W might affect protein function. These results indicate that ASMT might be a susceptibility gene for autism. Further studies in larger samples are needed to better understand the degree of variation in this gene as well as to understand the biochemical and clinical impacts of ASMT/melatonin deficiency.

  3. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  4. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  5. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  6. The complete genomic sequence of a tentative new polerovirus identified in barley in South Korea.

    PubMed

    Zhao, Fumei; Lim, Seungmo; Yoo, Ran Hee; Igori, Davaajargal; Kim, Sang-Min; Kwak, Do Yeon; Kim, Sun Lim; Lee, Bong Choon; Moon, Jae Sun

    2016-07-01

    The complete nucleotide sequence of a new barley polerovirus, tentatively named barley virus G (BVG), which was isolated in Gimje, South Korea, has been determined using an RNA sequencing technique combined with polymerase chain reaction methods. The viral genomic RNA of BVG is 5,620 nucleotides long and contains six typical open reading frames commonly observed in other poleroviruses. Sequence comparisons revealed that BVG is most closely related to maize yellow dwarf virus-RMV, with the highest amino acid identities being less than 90 % for all of the corresponding proteins. These results suggested that BVG is a member of a new species in the genus Polerovirus.

  7. Characterization of a new apple luteovirus identified by high-throughput sequencing.

    PubMed

    Liu, Huawei; Wu, Liping; Nikolaeva, Ekaterina; Peter, Kari; Liu, Zongrang; Mollov, Dimitre; Cao, Mengji; Li, Ruhui

    2018-05-15

    'Rapid Apple Decline' (RAD) is a newly emerging problem of young, dwarf apple trees in the Northeastern USA. The affected trees show trunk necrosis, cracking and canker before collapse in summer. In this study, we discovered and characterized a new luteovirus from apple trees in RAD-affected orchards using high-throughput sequencing (HTS) technology and subsequent Sanger sequencing. Illumina NextSeq sequencing was applied to total RNAs prepared from three diseased apple trees. Sequence reads were de novo assembled, and contigs were annotated by BLASTx. RT-PCR and 5'/3' RACE sequencing were used to obtain the complete genome of a new virus. RT-PCR was used to detect the virus. Three common apple viruses and a new luteovirus were identified from the diseased trees by HTS and RT-PCR. Sequence analyses of the complete genome of the new virus show that it is a new species of the genus Luteovirus in the family Luteoviridae. The virus is graft transmissible and detected by RT-PCR in apple trees in a couple of orchards. A new luteovirus and/or three known viruses were found to be associated with RAD. Molecular characterization of the new luteovirus provides important information for further investigation of its distribution and etiological role.

  8. Identifying and mitigating batch effects in whole genome sequencing data.

    PubMed

    Tom, Jennifer A; Reeder, Jens; Forrest, William F; Graham, Robert R; Hunkapiller, Julie; Behrens, Timothy W; Bhangale, Tushar R

    2017-07-24

    Large sample sets of whole genome sequencing with deep coverage are being generated, however assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects or remove associations impacted by batch effects in whole genome sequencing data. We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that identification of batch effects is aided by principal components analysis of these metrics. To mitigate batch effects, we developed new site-specific filters that identified and removed variants that falsely associated with the phenotype due to batch effect. These include filtering based on: a haplotype based genotype correction, a differential genotype quality test, and removing sites with missing genotype rate greater than 30% after setting genotypes with quality scores less than 20 to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. We performed analyses to demonstrate that: 1) These filters impacted variants known to be disease associated as 2 out of 16 confirmed associations in an AMD candidate SNP analysis were filtered, representing a reduction in power of 12.5%, 2) In the absence of batch effects, these filters removed only a small proportion of variants across the genome (type I error rate of 3%), and 3) in an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.

  9. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Korber, B.; Wain-Hobson, S.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  10. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

    PubMed

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

    2013-07-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.

  11. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147

  12. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  13. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  14. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  15. Mutations in the newly identified RAX regulatory sequence are not a frequent cause of micro/anophthalmia.

    PubMed

    Chassaing, Nicolas; Vigouroux, Adeline; Calvas, Patrick

    2009-06-01

    Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (SOX2, OTX2, RAX, and CHX10) have been implicated in isolated micro/anophthalmia, but causative mutations of these genes explain less than a quarter of these developmental defects. A specifically conserved SOX2/OTX2-mediated RAX expression regulatory sequence has recently been identified. We postulated that mutations in this sequence could lead to micro/anophthalmia, and thus we performed molecular screening of this regulatory element in patients suffering from micro/anophthalmia. Fifty-one patients suffering from nonsyndromic microphthalmia (n = 40) or anophthalmia (n = 11) were included in this study after negative molecular screening for SOX2, OTX2, RAX, and CHX10 mutations. Mutation screening of the RAX regulatory sequence was performed by direct sequencing for these patients. No mutations were identified in the highly conserved RAX regulatory sequence in any of the 51 patients. Mutations in the newly identified RAX regulatory sequence do not represent a frequent cause of nonsyndromic micro/anophthalmia.

  16. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  17. Opsin cDNA sequences of a UV and green rhodopsin of the satyrine butterfly Bicyclus anynana.

    PubMed

    Vanhoutte, K J A; Eggen, B J L; Janssen, J J M; Stavenga, D G

    2002-11-01

    The cDNAs of an ultraviolet (UV) and long-wavelength (LW) (green) absorbing rhodopsin of the bush brown Bicyclus anynana were partially identified. The UV sequence, encoding 377 amino acids, is 76-79% identical to the UV sequences of the papilionids Papilio glaucus and Papilio xuthus and the moth Manduca sexta. A dendrogram derived from aligning the amino acid sequences reveals an equidistant position of Bicyclus between Papilio and Manduca. The sequence of the green opsin cDNA fragment, which encodes 242 amino acids, represents six of the seven transmembrane regions. At the amino acid level, this fragment is more than 80% identical to the corresponding LW opsin sequences of Dryas, Heliconius, Papilio (rhodopsin 2) and Manduca. Whereas three LW absorbing rhodopsins were identified in the papilionid butterflies, only one green opsin was found in B. anynana.

  18. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. Images Fig. 1. PMID:3052437

  19. Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations

    PubMed Central

    Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.

    2017-01-01

    Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within ±2 of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498

  20. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  1. Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.

    PubMed

    Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia

    2017-07-24

    Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed for infection with T. vaginalis, a total of 110 sequences were identified when the T. vaginalis 18S ribosomal RNA gene was sequenced. These sequences were used to prepare a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences were in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. Low migration rate of local people in this province may contribute to a genetic conservativeness of T. vaginalis. The unique genetic feature of our isolates may suggest a different clinical presentation of trichomoniasis, including metronidazole susceptibility, T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis is of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.

  2. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    PubMed

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.

  3. Whole-exome sequencing identifies USH2A mutations in a pseudo-dominant Usher syndrome family.

    PubMed

    Zheng, Sui-Lian; Zhang, Hong-Liang; Lin, Zhen-Lang; Kang, Qian-Yan

    2015-10-01

    Usher syndrome (USH) is an autosomal recessive (AR) multi-sensory degenerative disorder leading to deaf-blindness. USH is clinically subdivided into three subclasses, and 10 genes have been identified thus far. Clinical and genetic heterogeneities in USH make a precise diagnosis difficult. A dominant‑like USH family in successive generations was identified, and the present study aimed to determine the genetic predisposition of this family. Whole‑exome sequencing was performed in two affected patients and an unaffected relative. Systematic data were analyzed by bioinformatic analysis to remove the candidate mutations via step‑wise filtering. Direct Sanger sequencing and co‑segregation analysis were performed in the pedigree. One novel and two known mutations in the USH2A gene were identified, and were further confirmed by direct sequencing and co‑segregation analysis. The affected mother carried compound mutations in the USH2A gene, while the unaffected father carried a heterozygous mutation. The present study demonstrates that whole‑exome sequencing is a robust approach for the molecular diagnosis of disorders with high levels of genetic heterogeneity.

  4. Human somatostatin I: sequence of the cDNA.

    PubMed Central

    Shen, L P; Pictet, R L; Rutter, W J

    1982-01-01

    RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12.727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. Images PMID:6126875

  5. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

    PubMed

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

    2015-11-24

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals.

  6. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

    PubMed Central

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.

    2015-01-01

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals. PMID:26610552

  7. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  9. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  10. Identification of metal ion binding sites based on amino acid sequences.

    PubMed

    Cao, Xiaoyong; Hu, Xiuzhen; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.

  11. Identification of metal ion binding sites based on amino acid sequences

    PubMed Central

    Cao, Xiaoyong; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html. PMID:28854211

  12. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  13. Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

    PubMed Central

    Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

    2012-01-01

    ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136

  14. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  15. Comparison of complete genome sequences of dog rabies viruses isolated from China and Mexico reveals key amino acid changes that may be associated with virus replication and virulence.

    PubMed

    Yu, Fulai; Zhang, Guoqing; Zhong, Xiangfu; Han, Na; Song, Yunfeng; Zhao, Ling; Cui, Min; Rayner, Simon; Fu, Zhen F

    2014-07-01

    Rabies is a global problem, but its impact and prevalence vary across different regions. In some areas, such as parts of Africa and Asia, the virus is prevalent in the domestic dog population, leading to epidemic waves and large numbers of human fatalities. In other regions, such as the Americas, the virus predominates in wildlife and bat populations, with sporadic spillover into domestic animals. In this work, we attempted to investigate whether these distinct environments led to selective pressures that result in measurable changes within the genome at the amino acid level. To this end, we collected and sequenced the full genome of two isolates from divergent environments. The first isolate (DRV-AH08) was from China, where the virus is present in the dog population and the country is experiencing a serious epidemic. The second isolate (DRV-Mexico) was taken from Mexico, where the virus is present in both wildlife and domestic dog populations, but at low levels as a consequence of an effective vaccination program. We then combined and compared these with other full genome sequences to identify distinct amino acid changes that might be associated with environment. Phylogenetic analysis identified strain DRV-AH08 as belonging to the China-I lineage, which has emerged to become the dominant lineage in the current epidemic. The Mexico strain was placed in the D11 Mexico lineage, associated with the West USA-Mexico border clade. Amino acid sequence analysis identified only 17 amino acid differences in the N, G and L proteins. These differences may be associated with virus replication and virulence-for example, the short incubation period observed in the current epidemic in China.

  16. Amino acid selective unlabeling for sequence specific resonance assignments in proteins

    PubMed Central

    Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha

    2010-01-01

    Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. Electronic supplementary material The online version of this article (doi:10.1007/s10858-010-9459-z) contains supplementary material, which is available to authorized users. PMID:21153044

  17. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    PubMed

    Jiang, Faming; Huang, Weiwei; Wang, Ye; Tian, Panwen; Chen, Xuerong; Liang, Zongan

    2016-01-01

    sensitivity, specificity, PPV, NPV, and accuracy to be 94.0% (125/133), 95.8% (46/48), 98.4% (125/127), 85.2% (46/54), and 94.5% (171/181), respectively. Among patients with positive AFB and negative PCR results in lung tissue specimens, two were diagnosed with NTM infections (Mycobacterium avium-intracellulare complex and Mycobacterium kansasii). Nucleic acid amplification testing combined with acid-fast staining in lung biopsy tissues can lead to early and accurate diagnosis in patients with smear-negative pulmonary tuberculosis. For patients with positive histological AFB and negative tuberculous PCR results in lung tissue, NTM infection should be suspected and could be identified by specific probe assays or 16S rRNA sequencing.

  18. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Form and format for... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... Code for Information Interchange (ASCII) text. No other formats shall be allowed. (3) The computer...

  19. Sequencing, bioinformatic characterization and expression pattern of a putative amino acid transporter from the parasitic cestode Echinococcus granulosus.

    PubMed

    Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C

    2008-03-31

    We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627 bp long cDNA contains an ORF of 489 amino acids and shows an amino acid identity of 30% with neutral and excitatory amino acid transporters members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatics analysis of EgAT1, confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domain, a highly similar hydropathy profile with ASCT1 (a known member of DAACS family), high score with SDF (Sodium Dicarboxilate Family) and similar motifs with EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and associated germinal layer. The in situ hybridization labelling indicates the distribution of egat1 mRNA throughout the tegument. EgAT1 protein, which showed in Western blots a molecular mass of approximately 60 kD, is localized in the subtegumental region of the metacestode, particularly around suckers and rostellum of protoscoleces and layers from brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acids transporters of E. granulosus and its evaluation as new drug targets against cystic echinococcosis.

  20. Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

    PubMed Central

    Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

    1978-01-01

    The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897

  1. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Chrstina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Drumm, Allen DozorMitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  2. Functional Brain Activation Differences in Stuttering Identified with a Rapid fMRI Sequence

    ERIC Educational Resources Information Center

    Loucks, Torrey; Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech…

  3. Whole-Exome Sequencing to Identify Novel Biological Pathways Associated With Infertility After Pelvic Inflammatory Disease.

    PubMed

    Taylor, Brandie D; Zheng, Xiaojing; Darville, Toni; Zhong, Wujuan; Konganti, Kranti; Abiodun-Ojo, Olayinka; Ness, Roberta B; O'Connell, Catherine M; Haggerty, Catherine L

    2017-01-01

    Ideal management of sexually transmitted infections (STI) may require risk markers for pathology or vaccine development. Previously, we identified common genetic variants associated with chlamydial pelvic inflammatory disease (PID) and reduced fecundity. As this explains only a proportion of the long-term morbidity risk, we used whole-exome sequencing to identify biological pathways that may be associated with STI-related infertility. We obtained stored DNA from 43 non-Hispanic black women with PID from the PID Evaluation and Clinical Health Study. Infertility was assessed at a mean of 84 months. Principal component analysis revealed no population stratification. Potential covariates did not significantly differ between groups. Sequencing kernel association test was used to examine associations between aggregates of variants on a single gene and infertility. The results from the sequencing kernel association test were used to choose "focus genes" (P < 0.01; n = 150) for subsequent Ingenuity Pathway Analysis to identify "gene sets" that are enriched in biologically relevant pathways. Pathway analysis revealed that focus genes were enriched in canonical pathways including, IL-1 signaling, P2Y purinergic receptor signaling, and bone morphogenic protein signaling. Focus genes were enriched in pathways that impact innate and adaptive immunity, protein kinase A activity, cellular growth, and DNA repair. These may alter host resistance or immunopathology after infection. Targeted sequencing of biological pathways identified in this study may provide insight into STI-related infertility.

  4. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  5. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    DOE PAGES

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; ...

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations ofmore » mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.« less

  6. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    PubMed Central

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005

  7. Amino acid sequence of the smaller basic protein from rat brain myelin

    PubMed Central

    Dunkley, Peter R.; Carnegie, Patrick R.

    1974-01-01

    1. The complete amino acid sequence of the smaller basic protein from rat brain myelin was determined. This protein differs from myelin basic proteins of other species in having a deletion of a polypeptide of 40 amino acid residues from the centre of the molecule. 2. A detailed comparison is made of the constant and variable regions in a group of myelin basic proteins from six species. 3. An arginine residue in the rat protein was found to be partially methylated. The ratio of methylated to unmethylated arginine at this position differed from that found for the human basic protein. 4. Three tryptic peptides were isolated in more than one form. The differences between the two forms of each peptide are discussed in relation to the electrophoretic heterogeneity of myelin basic proteins, which is known to occur at alkaline pH values. 5. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50029 at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1973) 131, 5. PMID:4141893

  8. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  9. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods.

  10. Extension of the COG and arCOG databases by amino acid and nucleotide sequences

    PubMed Central

    Meereis, Florian; Kaufmann, Michael

    2008-01-01

    Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535

  11. Small RNA deep sequencing identifies novel and salt-stress-regulated microRNAs from roots of Medicago sativa and Medicago truncatula.

    PubMed

    Long, Rui-Cai; Li, Ming-Na; Kang, Jun-Mei; Zhang, Tie-Jun; Sun, Yan; Yang, Qing-Chuan

    2015-05-01

    Small 21- to 24-nucleotide (nt) ribonucleic acids (RNAs), notably the microRNA (miRNA), are emerging as a posttranscriptional regulation mechanism. Salt stress is one of the primary abiotic stresses that cause the crop losses worldwide. In saline lands, root growth and function of plant are determined by the action of environmental salt stress through specific genes that adapt root development to the restrictive condition. To elucidate the role of miRNAs in salt stress regulation in Medicago, we used a high-throughput sequencing approach to analyze four small RNA libraries from roots of Zhongmu-1 (Medicago sativa) and Jemalong A17 (Medicago truncatula), which were treated with 300 mM NaCl for 0 and 8 h. Each library generated about 20 million short sequences and contained predominantly small RNAs of 24-nt length, followed by 21-nt and 22-nt small RNAs. Using sequence analysis, we identified 385 conserved miRNAs from 96 families, along with 68 novel candidate miRNAs. Of all the 68 predicted novel miRNAs, 15 miRNAs were identified to have miRNA*. Statistical analysis on abundance of sequencing read revealed specific miRNA showing contrasting expression patterns between M. sativa and M. truncatula roots, as well as between roots treated for 0 and 8 h. The expression of 10 conserved and novel miRNAs was also quantified by quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). The miRNA precursor and target genes were predicted by bioinformatics analysis. We concluded that the salt stress related conserved and novel miRNAs may have a large variety of target mRNAs, some of which might play key roles in salt stress regulation of Medicago. © 2014 Scandinavian Plant Physiology Society.

  12. Whole Wiskott‑Aldrich syndrome protein gene deletion identified by high throughput sequencing.

    PubMed

    He, Xiangling; Zou, Runying; Zhang, Bing; You, Yalan; Yang, Yang; Tian, Xin

    2017-11-01

    Wiskott‑Aldrich syndrome (WAS) is a rare X‑linked recessive immunodeficiency disorder, characterized by thrombocytopenia, small platelets, eczema and recurrent infections associated with increased risk of autoimmunity and malignancy disorders. Mutations in the WAS protein (WASP) gene are responsible for WAS. To date, WASP mutations, including missense/nonsense, splicing, small deletions, small insertions, gross deletions, and gross insertions have been identified in patients with WAS. In addition, WASP‑interacting proteins are suspected in patients with clinical features of WAS, in whom the WASP gene sequence and mRNA levels are normal. The present study aimed to investigate the application of next generation sequencing in definitive diagnosis and clinical therapy for WAS. A 5 month‑old child with WAS who displayed symptoms of thrombocytopenia was examined. Whole exome sequence analysis of genomic DNA showed that the coverage and depth of WASP were extremely low. Quantitative polymerase chain reaction indicated total WASP gene deletion in the proband. In conclusion, high throughput sequencing is useful for the verification of WAS on the genetic profile, and has implications for family planning guidance and establishment of clinical programs.

  13. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  14. Functional brain activation differences in stuttering identified with a rapid fMRI sequence

    PubMed Central

    Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated the AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming and and higher activity in the right IFG during phoneme monitoring. The right lateralized pattern of BOLD activity together with higher activity in sensorimotor cortices is consistent with previous reports, which indicates rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409

  15. Exome Capture and Massively Parallel Sequencing Identifies a Novel HPSE2 Mutation in a Saudi Arabian Child with Ochoa (Urofacial) Syndrome

    PubMed Central

    Al Badr, Wisam; Al Bader, Suha; Otto, Edgar; Hildebrandt, Friedhelm; Ackley, Todd; Peng, Weiping; Xu, Jishu; Li, Jun; Owens, Kailey M.; Bloom, David; Innis, Jeffrey W.

    2011-01-01

    We describe a child of Middle Eastern descent by first-cousin mating with idiopathic neurogenic bladder and high grade vesicoureteral reflux at 1 year of age, whose characteristic facial grimace led to the diagnosis of Ochoa (Urofacial) syndrome at age 5 years. We used homozygosity mapping, exome capture and paired end sequencing to identify the disease causing mutation in the proband. We reviewed the literature with respect to the urologic manifestations of Ochoa syndrome. A large region of marker homozygosity was observed at 10q24, consistent with known autosomal recessive inheritance, family consanguinity and previous genetic mapping in other families with Ochoa syndrome. A homozygous mutation was identified in the proband in HPSE2: c.1374_1378delTGTGC, a deletion of 5 nucleotides in exon 10 that is predicted to lead to a frameshift followed by replacement of 132 C-terminal amino acids with 153 novel amino acids (p.Ala458Alafsdel132ins153). This mutation is novel relative to very recently published mutations in HPSE2 in other families. Early intervention and recognition of Ochoa syndrome with control of risk factors and close surveillance will decrease complications and renal failure. PMID:21450525

  16. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. Images PMID:3500166

  17. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-06

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98(th) or <2(nd) percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  18. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    PubMed

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  19. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D.

    2018-01-22

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  20. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patel, Kamlesh D.

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  1. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feild, M.J.; Armstrong, F.B.

    1987-05-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilv GEDAY gene cluster of S. typhimurium. TmB, gene product of ilv E, was purified, crystallized, and subjected to Edman degradation using a gas phase sequencer. The intact protein yielded an amino terminal 31 residue sequence. Both carboxymethylated apoenzyme and (/sup 3/H)-NaBH-reduced holoenzyme were then subjected to digestion by trypsin. The digests were fractionated using reversed phase HPLC, and the peptides isolated were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. The peptide is 27 residues long and a comparison with known sequences of other aminotransferases revealedmore » limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino terminal 59 residues; only two differences between the sequences are noted in this region.« less

  2. Somatic mosaicism of a CDKL5 mutation identified by next-generation sequencing.

    PubMed

    Kato, Takeshi; Morisada, Naoya; Nagase, Hiroaki; Nishiyama, Masahiro; Toyoshima, Daisaku; Nakagawa, Taku; Maruyama, Azusa; Fu, Xue Jun; Nozu, Kandai; Wada, Hiroko; Takada, Satoshi; Iijima, Kazumoto

    2015-10-01

    CDKL5-related encephalopathy is an X-linked dominantly inherited disorder that is characterized by early infantile epileptic encephalopathy or atypical Rett syndrome. We describe a 5-year-old Japanese boy with intractable epilepsy, severe developmental delay, and Rett syndrome-like features. Onset was at 2 months, when his electroencephalogram showed sporadic single poly spikes and diffuse irregular poly spikes. We conducted a genetic analysis using an Illumina® TruSight™ One sequencing panel on a next-generation sequencer. We identified two epilepsy-associated single nucleotide variants in our case: CDKL5 p.Ala40Val and KCNQ2 p.Glu515Asp. CDKL5 p.Ala40Val has been previously reported to be responsible for early infantile epileptic encephalopathy. In our case, the CDKL5 heterozygous mutation showed somatic mosaicism because the boy's karyotype was 46,XY. The KCNQ2 variant p.Glu515Asp is known to cause benign familial neonatal seizures-1, and this variant showed paternal inheritance. Although we believe that the somatic mosaic CDKL5 mutation is mainly responsible for the neurological phenotype in the patient, the KCNQ2 variant might have some neurological effect. Genetic analysis by next-generation sequencing is capable of identifying multiple variants in a patient. Copyright © 2015 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.

  3. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed... the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in WIPO Standard...

  4. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  5. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  6. Bioinformatic analysis of neurotropic HIV envelope sequences identifies polymorphisms in the gp120 bridging sheet that increase macrophage-tropism through enhanced interactions with CCR5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mefford, Megan E., E-mail: megan_mefford@hms.harvard.edu; Kunstman, Kevin, E-mail: kunstman@northwestern.edu; Wolinsky, Steven M., E-mail: s-wolinsky@northwestern.edu

    Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 andmore » T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120–CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues. - Highlights: • We analyze HIV Env sequences and identify amino acids in beta 3 of the gp120 bridging sheet that enhance macrophage tropism. • These amino acids at positions 197 and 200 are present in brain of some patients with HIV-associated dementia. • D197 results in loss of a glycan near the HIV Env trimer apex, which may increase exposure of V3. • These variants may promote infection of macrophages in the brain by enhancing gp120–CCR5 interactions.« less

  7. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  8. Cloning, sequencing, and expression of the gene coding for bile acid 7 alpha-hydroxysteroid dehydrogenase from Eubacterium sp. strain VPI 12708.

    PubMed Central

    Baron, S F; Franklund, C V; Hylemon, P B

    1991-01-01

    Southern blot analysis indicated that the gene encoding the constitutive, NADP-linked bile acid 7 alpha-hydroxysteroid dehydrogenase of Eubacterium sp. strain VPI 12708 was located on a 6.5-kb EcoRI fragment of the chromosomal DNA. This fragment was cloned into bacteriophage lambda gt11, and a 2.9-kb piece of this insert was subcloned into pUC19, yielding the recombinant plasmid pBH51. DNA sequence analysis of the 7 alpha-hydroxysteroid dehydrogenase gene in pBH51 revealed a 798-bp open reading frame, coding for a protein with a calculated molecular weight of 28,500. A putative promoter sequence and ribosome binding site were identified. The 7 alpha-hydroxysteroid dehydrogenase mRNA transcript in Eubacterium sp. strain VPI 12708 was about 0.94 kb in length, suggesting that it is monocistronic. An Escherichia coli DH5 alpha transformant harboring pBH51 had approximately 30-fold greater levels of 7 alpha-hydroxysteroid dehydrogenase mRNA, immunoreactive protein, and specific activity than Eubacterium sp. strain VPI 12708. The 7 alpha-hydroxysteroid dehydrogenase purified from the pBH51 transformant was similar in subunit molecular weight, specific activity, and kinetic properties to that from Eubacterium sp. strain VPI 12708, and it reached with antiserum raised against the authentic enzyme on Western immunoblots. Alignment of the amino acid sequence of the 7 alpha-hydroxysteroid dehydrogenase with those of 10 other pyridine nucleotide-linked alcohol/polyol dehydrogenases revealed six conserved amino acid residues in the N-terminal regions thought to function in coenzyme binding. Images PMID:1856160

  9. Whole-Exome Sequencing Identifies Novel Variants for Tooth Agenesis.

    PubMed

    Dinckan, N; Du, R; Petty, L E; Coban-Akdemir, Z; Jhangiani, S N; Paine, I; Baugh, E H; Erdem, A P; Kayserili, H; Doddapaneni, H; Hu, J; Muzny, D M; Boerwinkle, E; Gibbs, R A; Lupski, J R; Uyguner, Z O; Below, J E; Letra, A

    2018-01-01

    Tooth agenesis is a common craniofacial abnormality in humans and represents failure to develop 1 or more permanent teeth. Tooth agenesis is complex, and variations in about a dozen genes have been reported as contributing to the etiology. Here, we combined whole-exome sequencing, array-based genotyping, and linkage analysis to identify putative pathogenic variants in candidate disease genes for tooth agenesis in 10 multiplex Turkish families. Novel homozygous and heterozygous variants in LRP6, DKK1, LAMA3, and COL17A1 genes, as well as known variants in WNT10A, were identified as likely pathogenic in isolated tooth agenesis. Novel variants in KREMEN1 were identified as likely pathogenic in 2 families with suspected syndromic tooth agenesis. Variants in more than 1 gene were identified segregating with tooth agenesis in 2 families, suggesting oligogenic inheritance. Structural modeling of missense variants suggests deleterious effects to the encoded proteins. Functional analysis of an indel variant (c.3607+3_6del) in LRP6 suggested that the predicted resulting mRNA is subject to nonsense-mediated decay. Our results support a major role for WNT pathways genes in the etiology of tooth agenesis while revealing new candidate genes. Moreover, oligogenic cosegregation was suggestive for complex inheritance and potentially complex gene product interactions during development, contributing to improved understanding of the genetic etiology of familial tooth agenesis.

  10. Development of a rapid and simple immunochromatographic assay to identify Vibrio parahaemolyticus.

    PubMed

    Sakata, Junko; Kawatsu, Kentaro; Iwasaki, Tadashi; Kumeda, Yuko

    2015-09-01

    To rapidly and simply determine whether or not bacterial colonies growing on agar were Vibrio parahaemolyticus, we developed an immunochromatographic assay (VP-ICA) using two different monoclonal antibodies (designated mAb-VP34 and mAb-VP109) against the delta subunit of V. parahaemolyticus-F0F1 ATP synthase. The epitopes recognized by mAb-VP34 and mAb-VP109 were mapped to sequences of eight ((47)LLTSSFSA(54)) and six amino acid residues ((16)FDFAVD(21)), respectively. An amino acid sequence similarity search of the NCBI database using BLASTP showed that both epitopic amino acid sequences were present together only in V. parahaemolyticus. When 124 V. parahaemolyticus strains and 94 strains of 27 other Vibrio species or 35 non-Vibrio species were tested using the VP-ICA, the VP-ICA identified V. parahaemolyticus with 100% accuracy. The VP-ICA rapidly and simply identified the pathogen directly from a single agar colony within 30 min, indicating that VP-ICA will greatly reduce labor and time required to identify V. parahaemolyticus compared with conventional biochemical tests. Copyright © 2015. Published by Elsevier B.V.

  11. Mapping by sequencing in cotton (Gossypium hirsutum) line MD52ne identified candidate genes for fiber strength and its related quality attributes.

    PubMed

    Islam, Md S; Zeng, Linghe; Thyssen, Gregory N; Delhom, Christopher D; Kim, Hee Jin; Li, Ping; Fang, David D

    2016-06-01

    Three QTL regions controlling three fiber quality traits were validated and further fine-mapped with 27 new single nucleotide polymorphism (SNP) markers. Transcriptome analysis suggests that receptor-like kinases found within the validated QTLs are potential candidate genes responsible for superior fiber strength in cotton line MD52ne. Fiber strength, length, maturity and fineness determine the market value of cotton fibers and the quality of spun yarn. Cotton fiber strength has been recognized as a critical quality attribute in the modern textile industry. Fine mapping along with quantitative trait loci (QTL) validation and candidate gene prediction can uncover the genetic and molecular basis of fiber quality traits. Four previously-identified QTLs (qFBS-c3, qSFI-c14, qUHML-c14 and qUHML-c24) related to fiber bundle strength, short fiber index and fiber length, respectively, were validated using an F3 population that originated from a cross of MD90ne × MD52ne. A group of 27 new SNP markers generated from mapping-by-sequencing (MBS) were placed in QTL regions to improve and validate earlier maps. Our refined QTL regions spanned 4.4, 1.8 and 3.7 Mb of physical distance in the Gossypium raimondii reference genome. We performed RNA sequencing (RNA-seq) of 15 and 20 days post-anthesis fiber cells from MD52ne and MD90ne and aligned reads to the G. raimondii genome. The QTL regions contained 21 significantly differentially expressed genes (DEGs) between the two near-isogenic parental lines. SNPs that result in non-synonymous substitutions to amino acid sequences of annotated genes were identified within these DEGs, and mapped. Taken together, transcriptome and amino acid mutation analysis indicate that receptor-like kinase pathway genes are likely candidates for superior fiber strength and length in MD52ne. MBS along with RNA-seq demonstrated a powerful strategy to elucidate candidate genes for the QTLs that control complex traits in a complex genome like tetraploid

  12. Next-generation sequencing identifies the natural killer cell microRNA transcriptome

    PubMed Central

    Fehniger, Todd A.; Wylie, Todd; Germino, Elizabeth; Leong, Jeffrey W.; Magrini, Vincent J.; Koul, Sunita; Keppel, Catherine R.; Schneider, Stephanie E.; Koboldt, Daniel C.; Sullivan, Ryan P.; Heinz, Michael E.; Crosby, Seth D.; Nagarajan, Rakesh; Ramsingh, Giridharan; Link, Daniel C.; Ley, Timothy J.; Mardis, Elaine R.

    2010-01-01

    Natural killer (NK) cells are innate lymphocytes important for early host defense against infectious pathogens and surveillance against malignant transformation. Resting murine NK cells regulate the translation of effector molecule mRNAs (e.g., granzyme B, GzmB) through unclear molecular mechanisms. MicroRNAs (miRNAs) are small noncoding RNAs that post-transcriptionally regulate the translation of their mRNA targets, and are therefore candidates for mediating this control process. While the expression and importance of miRNAs in T and B lymphocytes have been established, little is known about miRNAs in NK cells. Here, we used two next-generation sequencing (NGS) platforms to define the miRNA transcriptomes of resting and cytokine-activated primary murine NK cells, with confirmation by quantitative real-time PCR (qRT-PCR) and microarrays. We delineate a bioinformatics analysis pipeline that identified 302 known and 21 novel mature miRNAs from sequences obtained from NK cell small RNA libraries. These miRNAs are expressed over a broad range and exhibit isomiR complexity, and a subset is differentially expressed following cytokine activation. Using these miRNA NGS data, miR-223 was identified as a mature miRNA present in resting NK cells with decreased expression following cytokine activation. Furthermore, we demonstrate that miR-223 specifically targets the 3′ untranslated region of murine GzmB in vitro, indicating that this miRNA may contribute to control of GzmB translation in resting NK cells. Thus, the sequenced NK cell miRNA transcriptome provides a valuable framework for further elucidation of miRNA expression and function in NK cell biology. PMID:20935160

  13. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    PubMed Central

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution1,2. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes3,4. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  14. Statistical distribution of amino acid sequences: a proof of Darwinian evolution.

    PubMed

    Eitner, Krystian; Koch, Uwe; Gaweda, Tomasz; Marciniak, Jedrzej

    2010-12-01

    The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT-TREMBL database and the listing for selected species and enzymes. UNIPROT-TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented. eitner@amu.edu.pl Supplementary data are available at Bioinformatics online.

  15. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. Copyright © 2015, American Association for the Advancement of Science.

  16. Whole Genome Sequencing of High-Risk Families to Identify New Mutational Mechanisms of Breast Cancer Predisposition

    DTIC Science & Technology

    2014-10-01

    INTRODUCTION: Despite tremendous advances in mutation detection with gene panels and exome sequencing the majority of high risk breast...2a. Align reads to the reference sequence (months 4-10) 2b. Identify SNPs, indels, CNVs and rearrangements by bioinformatic tools (months 4-10) 2c

  17. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings 1

    PubMed Central

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  18. The isolation, purification and amino-acid sequence of insulin from the teleost fish Cottus scorpius (daddy sculpin).

    PubMed

    Cutfield, J F; Cutfield, S M; Carne, A; Emdin, S O; Falkmer, S

    1986-07-01

    Insulin from the principal islets of the teleost fish, Cottus scorpius (daddy sculpin), has been isolated and sequenced. Purification involved acid/alcohol extraction, gel filtration, and reverse-phase high-performance liquid chromatography to yield nearly 1 mg pure insulin/g wet weight islet tissue. Biological potency was estimated as 40% compared to porcine insulin. The sculpin insulin crystallised in the absence of zinc ions although zinc is known to be present in the islets in significant amounts. Two other hormones, glucagon and pancreatic polypeptide, were copurified with the insulin, and an N-terminal sequence for pancreatic polypeptide was determined. The primary structure of sculpin insulin shows a number of sequence changes unique so far amongst teleost fish. These changes occur at A14 (Arg), A15 (Val), and B2 (Asp). The B chain contains 29 amino acids and there is no N-terminal extension as seen with several other fish. Presumably as a result of the amino acid substitutions, sculpin insulin does not readily form crystals containing zinc-insulin hexamers, despite the presence of the coordinating B10 His.

  19. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

    PubMed Central

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated well with ΔGmin (|R| = 0.90) than with the hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java. PMID:15701762

  20. Molecular defects identified by whole exome sequencing in a child with Fanconi anemia.

    PubMed

    Zheng, Zhaojing; Geng, Juan; Yao, Ru-En; Li, Caihua; Ying, Daming; Shen, Yongnian; Ying, Lei; Yu, Yongguo; Fu, Qihua

    2013-11-10

    Fanconi anemia is a rare genetic disease characterized by bone marrow failure, multiple congenital malformations, and an increased susceptibility to malignancy. At least 15 genes have been identified that are involved in the pathogenesis of Fanconi anemia. However, it is still a challenge to assign the complementation group and to characterize the molecular defects in patients with Fanconi anemia. In the current study, whole exome sequencing was used to identify the affected gene(s) in a boy with Fanconi anemia. A recurring, non-synonymous mutation was found (c.3971C>T, p.P1324L) as well as a novel frameshift mutation (c.989_995del, p.H330LfsX2) in FANCA gene. Our results indicate that whole exome sequencing may be useful in clinical settings for rapid identification of disease-causing mutations in rare genetic disorders such as Fanconi anemia. © 2013 Elsevier B.V. All rights reserved.

  1. iNR-PhysChem: A Sequence-Based Predictor for Identifying Nuclear Receptors and Their Subfamilies via Physical-Chemical Property Matrix

    PubMed Central

    Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen

    2012-01-01

    Nuclear receptors (NRs) form a family of ligand-activated transcription factors that regulate a wide variety of biological processes, such as homeostasis, reproduction, development, and metabolism. Human genome contains 48 genes encoding NRs. These receptors have become one of the most important targets for therapeutic drug development. According to their different action mechanisms or functions, NRs have been classified into seven subfamilies. With the avalanche of protein sequences generated in the postgenomic age, we are facing the following challenging problems. Given an uncharacterized protein sequence, how can we identify whether it is a nuclear receptor? If it is, what subfamily it belongs to? To address these problems, we developed a predictor called iNR-PhysChem in which the protein samples were expressed by a novel mode of pseudo amino acid composition (PseAAC) whose components were derived from a physical-chemical matrix via a series of auto-covariance and cross-covariance transformations. It was observed that the overall success rate achieved by iNR-PhysChem was over 98% in identifying NRs or non-NRs, and over 92% in identifying NRs among the following seven subfamilies: NR1thyroid hormone like, NR2HNF4-like, NR3estrogen like, NR4nerve growth factor IB-like, NR5fushi tarazu-F1 like, NR6germ cell nuclear factor like, and NR0knirps like. These rates were derived by the jackknife tests on a stringent benchmark dataset in which none of protein sequences included has pairwise sequence identity to any other in a same subset. As a user-friendly web-server, iNR-PhysChem is freely accessible to the public at either http://www.jci-bioinfo.cn/iNR-PhysChem or http://icpr.jci.edu.cn/bioinfo/iNR-PhysChem. Also a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics involved in developing the predictor. It is anticipated that iNR-PhysChem may become a useful high throughput tool

  2. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (..cap alpha..-/sup 32/P)dCTP (3000 Ci/mmol) or (..cap alpha..-/sup 35/S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homologymore » (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.« less

  3. Single-Exome sequencing identified a novel RP2 mutation in a child with X-linked retinitis pigmentosa.

    PubMed

    Lim, Hassol; Park, Young-Mi; Lee, Jong-Keuk; Taek Lim, Hyun

    2016-10-01

    To present an efficient and successful application of a single-exome sequencing study in a family clinically diagnosed with X-linked retinitis pigmentosa. Exome sequencing study based on clinical examination data. An 8-year-old proband and his family. The proband and his family members underwent comprehensive ophthalmologic examinations. Exome sequencing was undertaken in the proband using Agilent SureSelect Human All Exon Kit and Illumina HiSeq 2000 platform. Bioinformatic analysis used Illumina pipeline with Burrows-Wheeler Aligner-Genome Analysis Toolkit (BWA-GATK), followed by ANNOVAR to perform variant functional annotation. All variants passing filter criteria were validated by Sanger sequencing to confirm familial segregation. Analysis of exome sequence data identified a novel frameshift mutation in RP2 gene resulting in a premature stop codon (c.665delC, p.Pro222fsTer237). Sanger sequencing revealed this mutation co-segregated with the disease phenotype in the child's family. We identified a novel causative mutation in RP2 from a single proband's exome sequence data analysis. This study highlights the effectiveness of the whole-exome sequencing in the genetic diagnosis of X-linked retinitis pigmentosa, over the conventional sequencing methods. Even using a single exome, exome sequencing technology would be able to pinpoint pathogenic variant(s) for X-linked retinitis pigmentosa, when properly applied with aid of adequate variant filtering strategy. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.

  4. cis-β-Bromostyrene derivatives from cinnamic acids via a tandem substitutive bromination-decarboxylation sequence.

    PubMed

    Tang, Khanh G; Kent, Greggory T; Erden, Ihsan; Wu, Weiming

    2017-10-04

    cis -β-Bromostyrene derivatives were synthesized stereospecifically from cinnamic acids through β-lactone intermediates. The synthetic sequence did not require the purification of the β-lactone intermediates although they were found to be stable and readily purified in most cases.

  5. Exome Sequencing Identifies Three Novel Candidate Genes Implicated in Intellectual Disability

    PubMed Central

    Azam, Maleeha; Ayub, Humaira; Vissers, Lisenka E. L. M.; Gilissen, Christian; Ali, Syeda Hafiza Benish; Riaz, Moeen; Veltman, Joris A.; Pfundt, Rolph; van Bokhoven, Hans; Qamar, Raheel

    2014-01-01

    Intellectual disability (ID) is a major health problem mostly with an unknown etiology. Recently exome sequencing of individuals with ID identified novel genes implicated in the disease. Therefore the purpose of the present study was to identify the genetic cause of ID in one syndromic and two non-syndromic Pakistani families. Whole exome of three ID probands was sequenced. Missense variations in two plausible novel genes implicated in autosomal recessive ID were identified: lysine (K)-specific methyltransferase 2B (KMT2B), zinc finger protein 589 (ZNF589), as well as hedgehog acyltransferase (HHAT) with a de novo mutation with autosomal dominant mode of inheritance. The KMT2B recessive variant is the first report of recessive Kleefstra syndrome-like phenotype. Identification of plausible causative mutations for two recessive and a dominant type of ID, in genes not previously implicated in disease, underscores the large genetic heterogeneity of ID. These results also support the viewpoint that large number of ID genes converge on limited number of common networks i.e. ZNF589 belongs to KRAB-domain zinc-finger proteins previously implicated in ID, HHAT is predicted to affect sonic hedgehog, which is involved in several disorders with ID, KMT2B associated with syndromic ID fits the epigenetic module underlying the Kleefstra syndromic spectrum. The association of these novel genes in three different Pakistani ID families highlights the importance of screening these genes in more families with similar phenotypes from different populations to confirm the involvement of these genes in pathogenesis of ID. PMID:25405613

  6. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    PubMed

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation necessarily develop AA amyloidosis. Furthermore, many diseases in cats are associated with the elevated production of SAA, whereas only a small number actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination of 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against the three different parts of the feline SAA protein-i.e., the N-terminal, central, and C-terminal regions-revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that the cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  7. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  8. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers ard probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  9. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    PubMed

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  10. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  11. Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

    PubMed

    Arend, J; Warzecha, H; Stöckigt, J

    2000-01-01

    Plant cell suspension cultures of Rauvolfia are able to produce a high amount of arbutin by glucosylation of exogenously added hydroquinone. A four step purification procedure using anion exchange, hydrophobic interaction, hydroxyapatite-chromatography and chromatofocusing delivered in a yield of 0.5%, an approximately 390 fold enrichment of the involved glucosyltransferase. SDS-PAGE showed a M(r) for the enzyme of 52 kDa. Proteolysis of the pure enzyme with endoproteinase LysC revealed six peptide fragments with 9-23 amino acids which were sequenced. Sequence alignment of the six peptides showed high homologies to glycosyltransferases from other higher plants.

  12. Partial amino-acid sequence of the precursor of an immunoglobulin light chain containing NH2-terminal pyroglutamic acid.

    PubMed Central

    Burstein, Y; Kantour, F; Schechter, I

    1976-01-01

    Analyses of amino-acid sequences of the total cell-free products programmed by the mRNA of MOPC-104E gamma light (L)-chain show that over 95% of the products have sequences of a distinct protein that correspond to the L-chain precursor. In this precursor an extra piece is coupled to the NH2-terminus of the mature L-chain. Analyses of products labeled with [3H]alanine, [3H]leucine, and [3H]proline demonstrate that the extra piece is composed of at least 18 residues. Analyses of [35S]methione-labeled product indicate that the extra piece may contain an additional NH2-terminal methionine, which is detected in about 10% of the molecules. Partial recovery of the NJ2-terminal methionine (alanine, leucine, and proline are recovered in yields close to theoretical, greater than 95%) suggests that it is the initiator methionine, which is known to be short lived in eukaryotes due to rapid hydrolysis. Thus, the extra piece seems to be 19 residues in length, and it contains one methionine at the NH2-terminus, three alanines at positions 2, 12, and 17, and five leucines at positions 6, 8, 10, 11, and 13. The close gathering of leucine residues, as well as their abundance (26%), suggest that the extra piece would be quite hydrophobic. Hydrophobicity seems to be a general property of the extra piece, since similar clusters of leucine were found in the precursors of 3 KL-chains (Burstein, Y. & Schechter, I. (1976) Biochem. J. 157, 145-151). The NH2-terminus of the mature MOPC-104E gamma L-chain is blocked by pyroglutamic acid. The fact that in the precursor a peptide segment precedes this NH2-terminus establishes that pyroglutamic acid is not the initiator residue for synthesis of the L-chain. Apparently, the pyroglutamic acid is formed by cyclization of glutamic acid or glutamine during cleavage of the extra piece to yield the mature L-chain. Images PMID:822420

  13. Evolution of sequence-defined highly functionalized nucleic acid polymers

    NASA Astrophysics Data System (ADS)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  14. New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing.

    PubMed

    Patiño, Liliana Catherine; Beau, Isabelle; Carlosama, Carolina; Buitrago, July Constanza; González, Ronald; Suárez, Carlos Fernando; Patarroyo, Manuel Alfonso; Delemer, Brigitte; Young, Jacques; Binart, Nadine; Laissue, Paul

    2017-07-01

    Is it possible to identify new mutations potentially associated with non-syndromic primary ovarian insufficiency (POI) via whole-exome sequencing (WES)? WES is an efficient tool to study genetic causes of POI as we have identified new mutations, some of which lead to protein destablization potentially contributing to the disease etiology. POI is a frequently occurring complex pathology leading to infertility. Mutations in only few candidate genes, mainly identified by Sanger sequencing, have been definitively related to the pathogenesis of the disease. This is a retrospective cohort study performed on 69 women affected by POI. WES and an innovative bioinformatics analysis were used on non-synonymous sequence variants in a subset of 420 selected POI candidate genes. Mutations in BMPR1B and GREM1 were modeled by using fragment molecular orbital analysis. Fifty-five coding variants in 49 genes potentially related to POI were identified in 33 out of 69 patients (48%). These genes participate in key biological processes in the ovary, such as meiosis, follicular development, granulosa cell differentiation/proliferation and ovulation. The presence of at least two mutations in distinct genes in 42% of the patients argued in favor of a polygenic nature of POI. It is possible that regulatory regions, not analyzed in the present study, carry further variants related to POI. WES and the in silico analyses presented here represent an efficient approach for mapping variants associated with POI etiology. Sequence variants presented here represents potential future genetic biomarkers. This study was supported by the Universidad del Rosario and Colciencias (Grants CS/CIGGUR-ABN062-2016 and 672-2014). Colciencias supported Liliana Catherine Patiño´s work (Fellowship: 617, 2013). The authors declare no conflict of interest. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For

  15. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

    PubMed

    Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin

    2013-08-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.

  16. A Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    PubMed

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-07-15

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a patient with hypogammaglobulinemia. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  17. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587

  18. Amino acid sequence of human cholinesterase. Annual report, 30 September 1984-30 September 1985

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lockridge, O.

    1985-10-01

    The active-site serine residue is located 198 amino acids from the N-terminal. The active-site peptide was isolated from three different genetic types of human serum cholinesterase: from usual, atypical, and atypical-silent genotypes. It was found that the amino acid sequence of the active-site peptide was identical in all three genotypes. Comparison of the complete sequences of cholinesterase from human serum and acetylcholinesterase from the electric organ of Torpedo californica shows an identity of 53%. Cholinesterase is of interest to the Department of Defense because cholinesterase protects against organophosphate poisons of the type used in chemical warfare. The structural results presentedmore » here will serve as the basis for cloning the gene for cholinesterase. The potential uses of large amounts of cholinesterase would be for cleaning up spills of organophosphates and possibly for detoxifying exposed personnel.« less

  19. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  20. Two-level QSAR network (2L-QSAR) for peptide inhibitor design based on amino acid properties and sequence positions.

    PubMed

    Du, Q S; Ma, Y; Xie, N Z; Huang, R B

    2014-01-01

    In the design of peptide inhibitors the huge possible variety of the peptide sequences is of high concern. In collaboration with the fast accumulation of the peptide experimental data and database, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR) one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are the functions of physicochemical properties and the sequence positions. In the prediction equation two weight coefficient sets {ak} and {bl} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized based on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of new designed peptide inhibitors. The two-level prediction network can be applied to the peptide inhibitor design that may aim for different target proteins, or different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.

  1. Amino-acid sequence and predicted three-dimensional structure of pea seed (Pisum sativum) ferritin.

    PubMed Central

    Lobreaux, S; Yewdall, S J; Briat, J F; Harrison, P M

    1992-01-01

    The iron storage protein, ferritin, is widely distributed in the living kingdom. Here the complete cDNA and derived amino-acid sequence of pea seed ferritin are described, together with its predicted secondary structure, namely a four-helix-bundle fold similar to those of mammalian ferritins, with a fifth short helix at the C-terminus. An N-terminal extension of 71 residues contains a transit peptide (first 47 residues) responsible for plastid targetting as in other plant ferritins, and this is cleaved before assembly. The second part of the extension (24 residues) belongs to the mature subunit; it is cleaved during germination. The amino-acid sequence of pea seed ferritin is aligned with those of other ferritins (49% amino-acid identity with H-chains and 40% with L-chains of human liver ferritin in the aligned region). A three-dimensional model has been constructed by fitting the aligned sequence to the coordinates of human H-chains, with appropriate modifications. A folded conformation with an 11-residue helix is predicted for the N-terminal extension. As in mammalian ferritins, 24 subunits assemble into a hollow shell. In pea seed ferritin, its N-terminal extension is exposed on the outside surface of the shell. Within each pea subunit is a ferroxidase centre resembling those of human ferritin H-chains except for a replacement of Glu-62 by His. The channel at the 4-fold-symmetry axes defined by E-helices, is predicted to be hydrophilic in plant ferritins, whereas it is hydrophobic in mammalian ferritins. Images Fig. 3. Fig. 5. Fig. 6. PMID:1472006

  2. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  3. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3' UTRs and coding sequences.

    PubMed

    Šulc, Miroslav; Marín, Ray M; Robins, Harlan S; Vaníček, Jiří

    2015-07-01

    The purpose of the proposed web server, publicly available at http://paccmit.epfl.ch, is to provide a user-friendly interface to two algorithms for predicting messenger RNA (mRNA) molecules regulated by microRNAs: (i) PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets), which identifies primarily mRNA transcripts targeted in their 3' untranslated regions (3' UTRs), and (ii) PACCMIT-CDS, designed to find mRNAs targeted within their coding sequences (CDSs). While PACCMIT belongs among the accurate algorithms for predicting conserved microRNA targets in the 3' UTRs, the main contribution of the web server is 2-fold: PACCMIT provides an accurate tool for predicting targets also of weakly conserved or non-conserved microRNAs, whereas PACCMIT-CDS addresses the lack of similar portals adapted specifically for targets in CDS. The web server asks the user for microRNAs and mRNAs to be analyzed, accesses the precomputed P-values for all microRNA-mRNA pairs from a database for all mRNAs and microRNAs in a given species, ranks the predicted microRNA-mRNA pairs, evaluates their significance according to the false discovery rate and finally displays the predictions in a tabular form. The results are also available for download in several standard formats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Isolation and characterization of the promoter sequence of a cassava gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in storage roots.

    PubMed

    de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J

    2009-03-24

    Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.

  5. Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

    PubMed Central

    Rodriguez-Flores, Juan L.; Fakhro, Khalid; Hackett, Neil R.; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A.; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Marri, Ajayeb Al-Nabet; Chouchane, Lotfi; Stadler, Dora J.; Hunter-Zinck, Haley; Mezey, Jason G.; Crystal, Ronald G.

    2013-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared to 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Pre-marital genetic screening in Qatar tests for only 4 out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. PMID:24123366

  6. Exome sequencing identifies CTSK mutations in patients originally diagnosed as intermediate osteopetrosis☆

    PubMed Central

    Pangrazio, Alessandra; Puddu, Alessandro; Oppo, Manuela; Valentini, Maria; Zammataro, Luca; Vellodi, Ashok; Gener, Blanca; Llano-Rivas, Isabel; Raza, Jamal; Atta, Irum; Vezzoni, Paolo; Superti-Furga, Andrea; Villa, Anna; Sobacchi, Cristina

    2014-01-01

    Autosomal Recessive Osteopetrosis is a genetic disorder characterized by increased bone density due to lack of resorption by the osteoclasts. Genetic studies have widely unraveled the molecular basis of the most severe forms, while cases of intermediate severity are more difficult to characterize, probably because of a large heterogeneity. Here, we describe the use of exome sequencing in the molecular diagnosis of 2 siblings initially thought to be affected by “intermediate osteopetrosis”, which identified a homozygous mutation in the CTSK gene. Prompted by this finding, we tested by Sanger sequencing 25 additional patients addressed to us for recessive osteopetrosis and found CTSK mutations in 4 of them. In retrospect, their clinical and radiographic features were found to be compatible with, but not typical for, Pycnodysostosis. We sought to identify modifier genes that might have played a role in the clinical manifestation of the disease in these patients, but our results were not informative. In conclusion, we underline the difficulties of differential diagnosis in some patients whose clinical appearance does not fit the classical malignant or benign picture and recommend that CTSK gene be included in the molecular diagnosis of high bone density conditions. PMID:24269275

  7. Identifying mRNA sequence elements for target recognition by human Argonaute proteins

    PubMed Central

    Li, Jingjing; Kim, TaeHyung; Nutiu, Razvan; Ray, Debashish; Hughes, Timothy R.; Zhang, Zhaolei

    2014-01-01

    It is commonly known that mammalian microRNAs (miRNAs) guide the RNA-induced silencing complex (RISC) to target mRNAs through the seed-pairing rule. However, recent experiments that coimmunoprecipitate the Argonaute proteins (AGOs), the central catalytic component of RISC, have consistently revealed extensive AGO-associated mRNAs that lack seed complementarity with miRNAs. We herein test the hypothesis that AGO has its own binding preference within target mRNAs, independent of guide miRNAs. By systematically analyzing the data from in vivo cross-linking experiments with human AGOs, we have identified a structurally accessible and evolutionarily conserved region (∼10 nucleotides in length) that alone can accurately predict AGO–mRNA associations, independent of the presence of miRNA binding sites. Within this region, we further identified an enriched motif that was replicable on independent AGO-immunoprecipitation data sets. We used RNAcompete to enumerate the RNA-binding preference of human AGO2 to all possible 7-mer RNA sequences and validated the AGO motif in vitro. These findings reveal a novel function of AGOs as sequence-specific RNA-binding proteins, which may aid miRNAs in recognizing their targets with high specificity. PMID:24663241

  8. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    PubMed

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  9. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

    PubMed Central

    Nishizawa, Manami; Nishizawa, Kazuhisa

    2000-01-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273

  10. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Mathew W. (Inventor)

    2011-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal or transverse direction at the tip, a polymer sequence is passed through the tip, and a change in an electrical current signal is measured as each polymer component passes through the tip. Each measured change in electrical current signals is compared with a database of reference signals, with each reference signal identified with a polymer component, to identify the unknown polymer component. The tip preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  11. Comparative RNA-Sequence Transcriptome Analysis of Phenolic Acid Metabolism in Salvia miltiorrhiza, a Traditional Chinese Medicine Model Plant

    PubMed Central

    Song, Zhenqiao; Guo, Linlin; Liu, Tian; Lin, Caicai; Wang, Jianhua

    2017-01-01

    Salvia miltiorrhiza Bunge is an important traditional Chinese medicine (TCM). In this study, two S. miltiorrhiza genotypes (BH18 and ZH23) with different phenolic acid concentrations were used for de novo RNA sequencing (RNA-seq). A total of 170,787 transcripts and 56,216 unigenes were obtained. There were 670 differentially expressed genes (DEGs) identified between BH18 and ZH23, 250 of which were upregulated in ZH23, with genes involved in the phenylpropanoid biosynthesis pathway being the most upregulated genes. Nine genes involved in the lignin biosynthesis pathway were upregulated in BH18 and thus result in higher lignin content in BH18. However, expression profiles of most genes involved in the core common upstream phenylpropanoid biosynthesis pathway were higher in ZH23 than that in BH18. These results indicated that genes involved in the core common upstream phenylpropanoid biosynthesis pathway might play an important role in downstream secondary metabolism and demonstrated that lignin biosynthesis was a putative partially competing pathway with phenolic acid biosynthesis. The results of this study expanded our understanding of the regulation of phenolic acid biosynthesis in S. miltiorrhiza. PMID:28194403

  12. Simple sequence repeat markers that identify Claviceps species and strains.

    PubMed

    Gilmore, Barbara S; Alderman, Stephen C; Knaus, Brian J; Bassil, Nahla V; Martin, Ruth C; Dombrowski, James E; Dung, Jeremiah K S

    2016-01-01

    Claviceps purpurea is a pathogen that infects most members of Pooideae, a subfamily of Poaceae, and causes ergot, a floral disease in which the ovary is replaced with a sclerotium. When the ergot body is accidently consumed by either man or animal in high enough quantities, there is extreme pain, limb loss and sometimes death. This study was initiated to develop simple sequence repeat (SSRs) markers for rapid identification of  C. purpurea . SSRs were designed from sequence data stored at the National Center for Biotechnology Information database. The study consisted of 74 ergot isolates, from four different host species, Lolium perenne , Poa pratensis , Bromus inermis , and Secale cereale plus three additional Claviceps species, C. pusilla , C. paspali and C. fusiformis. Samples were collected from six different counties in Oregon and Washington over a 5-year period. Thirty-four SSR markers were selected, which enabled the differentiation of each isolate from one another based solely on their molecular fingerprints. Discriminant analysis of principle components was used to identify four isolate groups, CA Group 1, 2, 3, and 4, for subsequent cluster and molecular variance analyses. CA Group 1 consisting of eight isolates from the host species P. pratensis , was separated on the cluster analysis plot from the remaining three groups and this group was later identified as C. humidiphila . The other three groups were distinct from one another, but closely related. These three groups contained samples from all four of the host species. These SSRs are simple to use, reliable and allowed clear differentiation of C. humidiphila from C. purpurea . Isolates from the three separate species, C. pusilla , C. paspali and C. fusiformis , also amplified with these markers. The SSR markers developed in this study will be helpful in defining the population structure and genetics of Claviceps strains. They will also provide valuable tools for plant breeders needing to identify

  13. The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

    PubMed Central

    Husain, S. S.; Lowe, G.

    1970-01-01

    Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046

  14. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835

  15. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  16. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  17. TP53, PIK3CA, FBXW7 and KRAS Mutations in Esophageal Cancer Identified by Targeted Sequencing.

    PubMed

    Zheng, Huili; Wang, Yan; Tang, Chuanning; Jones, Lindsey; Ye, Hua; Zhang, Guangchun; Cao, Weihai; Li, Jingwen; Liu, Lifeng; Liu, Zhencong; Zhang, Chao; Lou, Feng; Liu, Zhiyuan; Li, Yangyang; Shi, Zhenfen; Zhang, Jingbo; Zhang, Dandan; Sun, Hong; Dong, Haichao; Dong, Zhishou; Guo, Baishuai; Yan, H E; Lu, Qingyu; Huang, Xue; Chen, Si-Yi

    2016-01-01

    Esophageal cancer (EC) is a common malignancy with significant morbidity and mortality. As individual cancers exhibit unique mutation patterns, identifying and characterizing gene mutations in EC that may serve as biomarkers might help predict patient outcome and guide treatment. Traditionally, personalized cancer DNA sequencing was impractical and expensive. Recent technological advancements have made targeted DNA sequencing more cost- and time-effective with reliable results. This technology may be useful for clinicians to direct patient treatment. The Ion PGM and AmpliSeq Cancer Panel was used to identify mutations at 737 hotspot loci of 45 cancer-related genes in 64 EC samples from Chinese patients. Frequent mutations were found in TP53 and less frequent mutations in PIK3CA, FBXW7 and KRAS. These results demonstrate that targeted sequencing can reliably identify mutations in individual tumors that make this technology a possibility for clinical use. Copyright© 2016, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  18. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids.

    PubMed

    Ibarra-Laclette, Enrique; Méndez-Bravo, Alfonso; Pérez-Torres, Claudia Anahí; Albert, Victor A; Mockaitis, Keithanne; Kilaru, Aruna; López-Gómez, Rodolfo; Cervantes-Luevano, Jacob Israel; Herrera-Estrella, Luis

    2015-08-13

    Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information. The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. Additionally, the transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNAseqatlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

  19. Coexpression Analysis Identifies Two Oxidoreductases Involved in the Biosynthesis of the Monoterpene Acid Moiety of Natural Pyrethrin Insecticides in Tanacetum cinerariifolium1[OPEN

    PubMed Central

    Moghe, Gaurav D.

    2018-01-01

    Flowers of Tanacetum cinerariifolium produce a set of compounds known collectively as pyrethrins, which are commercially important pesticides that are strongly toxic to flying insects but not to most vertebrates. A pyrethrin molecule is an ester consisting of either trans-chrysanthemic acid or its modified form, pyrethric acid, and one of three alcohols, jasmolone, pyrethrolone, and cinerolone, that appear to be derived from jasmonic acid. Chrysanthemyl diphosphate synthase (CDS), the first enzyme involved in the synthesis of trans-chrysanthemic acid, was characterized previously and its gene isolated. TcCDS produces free trans-chrysanthemol in addition to trans-chrysanthemyl diphosphate, but the enzymes responsible for the conversion of trans-chrysanthemol to the corresponding aldehyde and then to the acid have not been reported. We used an RNA sequencing-based approach and coexpression correlation analysis to identify several candidate genes encoding putative trans-chrysanthemol and trans-chrysanthemal dehydrogenases. We functionally characterized the proteins encoded by these genes using a combination of in vitro biochemical assays and heterologous expression in planta to demonstrate that TcADH2 encodes an enzyme that oxidizes trans-chrysanthemol to trans-chrysanthemal, while TcALDH1 encodes an enzyme that oxidizes trans-chrysanthemal into trans-chrysanthemic acid. Transient coexpression of TcADH2 and TcALDH1 together with TcCDS in Nicotiana benthamiana leaves results in the production of trans-chrysanthemic acid as well as several other side products. The majority (58%) of trans-chrysanthemic acid was glycosylated or otherwise modified. Overall, these data identify key steps in the biosynthesis of pyrethrins and demonstrate the feasibility of metabolic engineering to produce components of these defense compounds in a heterologous host. PMID:29122986

  20. Targeted next-generation sequencing analysis identifies novel mutations in families with severe familial exudative vitreoretinopathy.

    PubMed

    Huang, Xiao-Yan; Zhuang, Hong; Wu, Ji-Hong; Li, Jian-Kang; Hu, Fang-Yuan; Zheng, Yu; Tellier, Laurent Christian Asker M; Zhang, Sheng-Hai; Gao, Feng-Juan; Zhang, Jian-Guo; Xu, Ge-Zhi

    2017-01-01

    Familial exudative vitreoretinopathy (FEVR) is a genetically and clinically heterogeneous disease, characterized by failure of vascular development of the peripheral retina. The symptoms of FEVR vary widely among patients in the same family, and even between the two eyes of a given patient. This study was designed to identify the genetic defect in a patient cohort of ten Chinese families with a definitive diagnosis of FEVR. To identify the causative gene, next-generation sequencing (NGS)-based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members by using Sanger sequencing and quantitative real-time PCR (QPCR). Of the cohort of ten FEVR families, six pathogenic variants were identified, including four novel and two known heterozygous mutations. Of the variants identified, four were missense variants, and two were novel heterozygous deletion mutations [ LRP5 , c.4053 DelC (p.Ile1351IlefsX88); TSPAN12 , EX8Del]. The two novel heterozygous deletion mutations were not observed in the control subjects and could give rise to a relatively severe FEVR phenotype, which could be explained by the protein function prediction. We identified two novel heterozygous deletion mutations [ LRP5 , c.4053 DelC (p.Ile1351IlefsX88); TSPAN12 , EX8Del] using targeted NGS as a causative mutation for FEVR. These genetic deletion variations exhibit a severe form of FEVR, with tractional retinal detachments compared with other known point mutations. The data further enrich the mutation spectrum of FEVR and enhance our understanding of genotype-phenotype correlations to provide useful information for disease diagnosis, prognosis, and effective genetic counseling.

  1. Targeted next-generation sequencing analysis identifies novel mutations in families with severe familial exudative vitreoretinopathy

    PubMed Central

    Huang, Xiao-Yan; Zhuang, Hong; Wu, Ji-Hong; Li, Jian-Kang; Hu, Fang-Yuan; Zheng, Yu; Tellier, Laurent Christian Asker M.; Zhang, Sheng-Hai; Gao, Feng-Juan; Zhang, Jian-Guo

    2017-01-01

    Purpose Familial exudative vitreoretinopathy (FEVR) is a genetically and clinically heterogeneous disease, characterized by failure of vascular development of the peripheral retina. The symptoms of FEVR vary widely among patients in the same family, and even between the two eyes of a given patient. This study was designed to identify the genetic defect in a patient cohort of ten Chinese families with a definitive diagnosis of FEVR. Methods To identify the causative gene, next-generation sequencing (NGS)-based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members by using Sanger sequencing and quantitative real-time PCR (QPCR). Results Of the cohort of ten FEVR families, six pathogenic variants were identified, including four novel and two known heterozygous mutations. Of the variants identified, four were missense variants, and two were novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del]. The two novel heterozygous deletion mutations were not observed in the control subjects and could give rise to a relatively severe FEVR phenotype, which could be explained by the protein function prediction. Conclusions We identified two novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del] using targeted NGS as a causative mutation for FEVR. These genetic deletion variations exhibit a severe form of FEVR, with tractional retinal detachments compared with other known point mutations. The data further enrich the mutation spectrum of FEVR and enhance our understanding of genotype–phenotype correlations to provide useful information for disease diagnosis, prognosis, and effective genetic counseling. PMID:28867931

  2. Whole-exome sequencing identifies novel candidate predisposition genes for familial polycythemia vera.

    PubMed

    Hirvonen, Elina A M; Pitkänen, Esa; Hemminki, Kari; Aaltonen, Lauri A; Kilpivaara, Outi

    2017-04-20

    Polycythemia vera (PV), characterized by massive production of erythrocytes, is one of the myeloproliferative neoplasms. Most patients carry a somatic gain-of-function mutation in JAK2, c.1849G > T (p.Val617Phe), leading to constitutive activation of JAK-STAT signaling pathway. Familial clustering is also observed occasionally, but high-penetrance predisposition genes to PV have remained unidentified. We studied the predisposition to PV by exome sequencing (three cases) in a Finnish PV family with four patients. The 12 shared variants (maximum allowed minor allele frequency <0.001 in Finnish population in ExAC database) predicted damaging in silico and absent in an additional control set of over 500 Finns were further validated by Sanger sequencing in a fourth affected family member. Three novel predisposition candidate variants were identified: c.1254C > G (p.Phe418Leu) in ZXDC, c.1931C > G (p.Pro644Arg) in ATN1, and c.701G > A (p.Arg234Gln) in LRRC3. We also observed a rare, predicted benign germline variant c.2912C > G (p.Ala971Gly) in BCORL1 in all four patients. Somatic mutations in BCORL1 have been reported in myeloid malignancies. We further screened the variants in eight PV patients in six other Finnish families, but no other carriers were found. Exome sequencing provides a powerful tool for the identification of novel variants, and understanding the familial predisposition of diseases. This is the first report on Finnish familial PV cases, and we identified three novel candidate variants that may predispose to the disease.

  3. Identifying molecular drivers of gastric cancer through next-generation sequencing.

    PubMed

    Liang, Han; Kim, Yon Hui

    2013-11-01

    Gastric cancer is the second most common cause of cancer-related death in the world, representing a major global health issue. The high mortality rate is largely due to the lack of effective medical treatment for advanced stages of this disease. Recently next-generation sequencing (NGS) technology has become a revolutionary tool for cancer research, and several NGS studies in gastric cancer have been published. Here we review the insights gained from these studies regarding how use NGS to elucidate the molecular basis of gastric cancer and identify potential therapeutic targets. We also discuss the challenges and future directions of such efforts. Published by Elsevier Ireland Ltd.

  4. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  5. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    PubMed Central

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381

  6. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. © 2016 The Protein Society.

  7. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides

    PubMed Central

    Blanco‐Míguez, Aitor; Gutiérrez‐Jácome, Alberto; Pérez‐Pérez, Martín; Pérez‐Rodríguez, Gael; Catalán‐García, Sandra; Fdez‐Riverola, Florentino; Lourenço, Anália

    2016-01-01

    Abstract Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as “antiproliferative,” “antitumoral,” or “apoptosis” among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. PMID:27010507

  8. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. Images PMID:3012461

  9. Sequence and structural analyses of nuclear export signals in the NESdb database

    PubMed Central

    Xu, Darui; Farmer, Alicia; Collett, Garen; Grishin, Nick V.; Chook, Yuh Min

    2012-01-01

    We compiled >200 nuclear export signal (NES)–containing CRM1 cargoes in a database named NESdb. We analyzed the sequences and three-dimensional structures of natural, experimentally identified NESs and of false-positive NESs that were generated from the database in order to identify properties that might distinguish the two groups of sequences. Analyses of amino acid frequencies, sequence logos, and agreement with existing NES consensus sequences revealed strong preferences for the Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern and for negatively charged amino acids in the nonhydrophobic positions of experimentally identified NESs but not of false positives. Strong preferences against certain hydrophobic amino acids in the hydrophobic positions were also revealed. These findings led to a new and more precise NES consensus. More important, three-dimensional structures are now available for 68 NESs within 56 different cargo proteins. Analyses of these structures showed that experimentally identified NESs are more likely than the false positives to adopt α-helical conformations that transition to loops at their C-termini and more likely to be surface accessible within their protein domains or be present in disordered or unobserved parts of the structures. Such distinguishing features for real NESs might be useful in future NES prediction efforts. Finally, we also tested CRM1-binding of 40 NESs that were found in the 56 structures. We found that 16 of the NES peptides did not bind CRM1, hence illustrating how NESs are easily misidentified. PMID:22833565

  10. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    USDA-ARS?s Scientific Manuscript database

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  11. Identification of Delta5-fatty acid desaturase from the cellular slime mold dictyostelium discoideum.

    PubMed

    Saito, T; Ochiai, H

    1999-10-01

    cDNA fragments putatively encoding amino acid sequences characteristic of the fatty acid desaturase were obtained using expressed sequence tag (EST) information of the Dictyostelium cDNA project. Using this sequence, we have determined the cDNA sequence and genomic sequence of a desaturase. The cloned cDNA is 1489 nucleotides long and the deduced amino acid sequence comprised 464 amino acid residues containing an N-terminal cytochrome b5 domain. The whole sequence was 38.6% identical to the initially identified Delta5-desaturase of Mortierella alpina. We have confirmed its function as Delta5-desaturase by over expression mutation in D. discoideum and also the gain of function mutation in the yeast Saccharomyces cerevisiae. Analysis of the lipids from transformed D. discoideum and yeast demonstrated the accumulation of Delta5-desaturated products. This is the first report concering fatty acid desaturase in cellular slime molds.

  12. Genome Sequence of Lactobacillus rhamnosus Strain CASL, an Efficient l-Lactic Acid Producer from Cheap Substrate Cassava

    PubMed Central

    Yu, Bo; Su, Fei; Wang, Limin; Zhao, Bo; Qin, Jiayang; Ma, Cuiqing; Xu, Ping; Ma, Yanhe

    2011-01-01

    Lactobacillus rhamnosus is a type of probiotic bacteria with industrial potential for l-lactic acid production. We announce the draft genome sequence of L. rhamnosus CASL (2,855,156 bp with a G+C content of 46.6%), which is an efficient producer of l-lactic acid from cheap, nonfood substrate cassava with a high production titer. PMID:22123765

  13. Fluorescence energy transfer as a probe for nucleic acid structures and sequences.

    PubMed Central

    Mergny, J L; Boutorine, A S; Garestier, T; Belloc, F; Rougée, M; Bulychev, N V; Koshkin, A A; Bourson, J; Lebedev, A V; Valeur, B

    1994-01-01

    The primary or secondary structure of single-stranded nucleic acids has been investigated with fluorescent oligonucleotides, i.e., oligonucleotides covalently linked to a fluorescent dye. Five different chromophores were used: 2-methoxy-6-chloro-9-amino-acridine, coumarin 500, fluorescein, rhodamine and ethidium. The chemical synthesis of derivatized oligonucleotides is described. Hybridization of two fluorescent oligonucleotides to adjacent nucleic acid sequences led to fluorescence excitation energy transfer between the donor and the acceptor dyes. This phenomenon was used to probe primary and secondary structures of DNA fragments and the orientation of oligodeoxynucleotides synthesized with the alpha-anomers of nucleoside units. Fluorescence energy transfer can be used to reveal the formation of hairpin structures and the translocation of genes between two chromosomes. PMID:8152922

  14. Exome sequencing identifies CTSK mutations in patients originally diagnosed as intermediate osteopetrosis.

    PubMed

    Pangrazio, Alessandra; Puddu, Alessandro; Oppo, Manuela; Valentini, Maria; Zammataro, Luca; Vellodi, Ashok; Gener, Blanca; Llano-Rivas, Isabel; Raza, Jamal; Atta, Irum; Vezzoni, Paolo; Superti-Furga, Andrea; Villa, Anna; Sobacchi, Cristina

    2014-02-01

    Autosomal Recessive Osteopetrosis is a genetic disorder characterized by increased bone density due to lack of resorption by the osteoclasts. Genetic studies have widely unraveled the molecular basis of the most severe forms, while cases of intermediate severity are more difficult to characterize, probably because of a large heterogeneity. Here, we describe the use of exome sequencing in the molecular diagnosis of 2 siblings initially thought to be affected by "intermediate osteopetrosis", which identified a homozygous mutation in the CTSK gene. Prompted by this finding, we tested by Sanger sequencing 25 additional patients addressed to us for recessive osteopetrosis and found CTSK mutations in 4 of them. In retrospect, their clinical and radiographic features were found to be compatible with, but not typical for, Pycnodysostosis. We sought to identify modifier genes that might have played a role in the clinical manifestation of the disease in these patients, but our results were not informative. In conclusion, we underline the difficulties of differential diagnosis in some patients whose clinical appearance does not fit the classical malignant or benign picture and recommend that CTSK gene be included in the molecular diagnosis of high bone density conditions. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  15. A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.

    PubMed

    Liu, Yu-Cheng; Yang, Meng-Han; Lin, Win-Li; Huang, Chien-Kang; Oyang, Yen-Jen

    2009-12-03

    Proteins are dynamic macromolecules which may undergo conformational transitions upon changes in environment. As it has been observed in laboratories that protein flexibility is correlated to essential biological functions, scientists have been designing various types of predictors for identifying structurally flexible regions in proteins. In this respect, there are two major categories of predictors. One category of predictors attempts to identify conformationally flexible regions through analysis of protein tertiary structures. Another category of predictors works completely based on analysis of the polypeptide sequences. As the availability of protein tertiary structures is generally limited, the design of predictors that work completely based on sequence information is crucial for advances of molecular biology research. In this article, we propose a novel approach to design a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty in the design stems from incorporating two classifiers based on two distinctive supervised learning algorithms that provide complementary prediction powers. Experimental results show that the overall performance delivered by the hybrid predictor proposed in this article is superior to the performance delivered by the existing predictors. Furthermore, the case study presented in this article demonstrates that the proposed hybrid predictor is capable of providing the biologists with valuable clues about the functional sites in a protein chain. The proposed hybrid predictor provides the users with two optional modes, namely, the high-sensitivity mode and the high-specificity mode. The experimental results with an independent testing data set show that the proposed hybrid predictor is capable of delivering sensitivity of 0.710 and specificity of 0.608 under the high-sensitivity mode, while delivering sensitivity of 0.451 and specificity of 0.787 under the high-specificity mode. Though

  16. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

    2002-01-01

    Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs inmore » gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.« less

  17. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification

    PubMed Central

    Schouten, Jan P.; McElgunn, Cathal J.; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-01-01

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down’s syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50–70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences. PMID:12060695

  18. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification.

    PubMed

    Schouten, Jan P; McElgunn, Cathal J; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-06-15

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down's syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50-70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences.

  19. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as ''Intolerant''. Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as ''Probably or Possibly Damaging''. Another 9-15% of the variants were classed as ''Potentially Intolerant or Damaging''. The results from the two algorithms are highly associated, with concordance in predicted impact observed for {approx}62% of themore » variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as ''Tolerant'' or ''Benign''. Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.« less

  20. Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.

    PubMed

    Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S

    2007-05-01

    Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity with great potential as biocatalysts. In an effort to expand genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenic line with 46% amino-acid-sequence identity to CYP102A1 which has been extensively studied as a fatty acid hydroxylase. The heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for searching self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 are expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for bioconversion of natural products and biodegradation of organic chemicals.

  1. RECOVIR Software for Identifying Viruses

    NASA Technical Reports Server (NTRS)

    Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui

    2013-01-01

    Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to identify automatically strains of partial or complete capsid sequences of picorna and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.

  2. Whole-exome sequencing identifies novel MPL and JAK2 mutations in triple-negative myeloproliferative neoplasms

    PubMed Central

    Milosevic Feenstra, Jelena D.; Nivarthi, Harini; Gisslinger, Heinz; Leroy, Emilie; Rumi, Elisa; Chachoua, Ilyas; Bagienski, Klaudia; Kubesova, Blanka; Pietra, Daniela; Gisslinger, Bettina; Milanesi, Chiara; Jäger, Roland; Chen, Doris; Berg, Tiina; Schalling, Martin; Schuster, Michael; Bock, Christoph; Constantinescu, Stefan N.; Cazzola, Mario

    2016-01-01

    Essential thrombocythemia (ET) and primary myelofibrosis (PMF) are chronic diseases characterized by clonal hematopoiesis and hyperproliferation of terminally differentiated myeloid cells. The disease is driven by somatic mutations in exon 9 of CALR or exon 10 of MPL or JAK2-V617F in >90% of the cases, whereas the remaining cases are termed “triple negative.” We aimed to identify the disease-causing mutations in the triple-negative cases of ET and PMF by applying whole-exome sequencing (WES) on paired tumor and control samples from 8 patients. We found evidence of clonal hematopoiesis in 5 of 8 studied cases based on clonality analysis and presence of somatic genetic aberrations. WES identified somatic mutations in 3 of 8 cases. We did not detect any novel recurrent somatic mutations. In 3 patients with clonal hematopoiesis analyzed by WES, we identified a somatic MPL-S204P, a germline MPL-V285E mutation, and a germline JAK2-G571S variant. We performed Sanger sequencing of the entire coding region of MPL in 62, and of JAK2 in 49 additional triple-negative cases of ET or PMF. New somatic (T119I, S204F, E230G, Y591D) and 1 germline (R321W) MPL mutation were detected. All of the identified MPL mutations were gain-of-function when analyzed in functional assays. JAK2 variants were identified in 5 of 57 triple-negative cases analyzed by WES and Sanger sequencing combined. We could demonstrate that JAK2-V625F and JAK2-F556V are gain-of-function mutations. Our results suggest that triple-negative cases of ET and PMF do not represent a homogenous disease entity. Cases with polyclonal hematopoiesis might represent hereditary disorders. PMID:26423830

  3. Whole-exome sequencing identifies novel MPL and JAK2 mutations in triple-negative myeloproliferative neoplasms.

    PubMed

    Milosevic Feenstra, Jelena D; Nivarthi, Harini; Gisslinger, Heinz; Leroy, Emilie; Rumi, Elisa; Chachoua, Ilyas; Bagienski, Klaudia; Kubesova, Blanka; Pietra, Daniela; Gisslinger, Bettina; Milanesi, Chiara; Jäger, Roland; Chen, Doris; Berg, Tiina; Schalling, Martin; Schuster, Michael; Bock, Christoph; Constantinescu, Stefan N; Cazzola, Mario; Kralovics, Robert

    2016-01-21

    Essential thrombocythemia (ET) and primary myelofibrosis (PMF) are chronic diseases characterized by clonal hematopoiesis and hyperproliferation of terminally differentiated myeloid cells. The disease is driven by somatic mutations in exon 9 of CALR or exon 10 of MPL or JAK2-V617F in >90% of the cases, whereas the remaining cases are termed "triple negative." We aimed to identify the disease-causing mutations in the triple-negative cases of ET and PMF by applying whole-exome sequencing (WES) on paired tumor and control samples from 8 patients. We found evidence of clonal hematopoiesis in 5 of 8 studied cases based on clonality analysis and presence of somatic genetic aberrations. WES identified somatic mutations in 3 of 8 cases. We did not detect any novel recurrent somatic mutations. In 3 patients with clonal hematopoiesis analyzed by WES, we identified a somatic MPL-S204P, a germline MPL-V285E mutation, and a germline JAK2-G571S variant. We performed Sanger sequencing of the entire coding region of MPL in 62, and of JAK2 in 49 additional triple-negative cases of ET or PMF. New somatic (T119I, S204F, E230G, Y591D) and 1 germline (R321W) MPL mutation were detected. All of the identified MPL mutations were gain-of-function when analyzed in functional assays. JAK2 variants were identified in 5 of 57 triple-negative cases analyzed by WES and Sanger sequencing combined. We could demonstrate that JAK2-V625F and JAK2-F556V are gain-of-function mutations. Our results suggest that triple-negative cases of ET and PMF do not represent a homogenous disease entity. Cases with polyclonal hematopoiesis might represent hereditary disorders. © 2016 by The American Society of Hematology.

  4. Pool-based genome-wide association study identified novel candidate regions on BTA9 and 14 for oleic acid percentage in Japanese Black cattle.

    PubMed

    Kawaguchi, Fuki; Kigoshi, Hiroto; Nakajima, Ayaka; Matsumoto, Yuta; Uemoto, Yoshinobu; Fukushima, Moriyuki; Yoshida, Emi; Iwamoto, Eiji; Akiyama, Takayuki; Kohama, Namiko; Kobayashi, Eiji; Honda, Takeshi; Oyama, Kenji; Mannen, Hideyuki; Sasazaki, Shinji

    2018-05-17

    Fatty acid composition is an important indicator of beef quality. The objective of this study was to search the potential candidate region for fatty acid composition. We performed pool-based genome-wide association studies (GWAS) for oleic acid percentage (C18:1) in a Japanese Black cattle population from the Hyogo prefecture. GWAS analysis revealed two novel candidate regions on BTA9 and BTA14. The most significant single nucleotide polymorphisms (SNPs) in each region were genotyped in a population (n = 899) to verify their effect on C18:1. Statistical analysis revealed that both SNPs were significantly associated with C18:1 (p = .0080 and .0003), validating the quantitative trait loci (QTLs) detected in GWAS. We subsequently selected VNN1 and LYPLA1 genes as candidate genes from each region on BTA9 and BTA14, respectively. We sequenced full-length coding sequence (CDS) of these genes in eight individuals and identified a nonsynonymous SNP T66M on VNN1 gene as a putative candidate polymorphism. The polymorphism was also significantly associated with C18:1, but the p value (p = .0162) was higher than the most significant SNP on BTA9, suggesting that it would not be responsible for the QTL. Although further investigation will be needed to determine the responsible gene and polymorphism, our findings would contribute to development of selective markers for fatty acid composition in the Japanese Black cattle of Hyogo. © 2018 Japanese Society of Animal Science.

  5. Analysis of Draft Genome Sequence of Pseudomonas sp. QTF5 Reveals Its Benzoic Acid Degradation Ability and Heavy Metal Tolerance

    PubMed Central

    Li, Yang; Ren, Yi

    2017-01-01

    Pseudomonas sp. QTF5 was isolated from the continuous permafrost near the bitumen layers in the Qiangtang basin of Qinghai-Tibetan Plateau in China (5,111 m above sea level). It is psychrotolerant and highly and widely tolerant to heavy metals and has the ability to metabolize benzoic acid and salicylic acid. To gain insight into the genetic basis for its adaptation, we performed whole genome sequencing and analyzed the resistant genes and metabolic pathways. Based on 120 published and annotated genomes representing 31 species in the genus Pseudomonas, in silico genomic DNA-DNA hybridization (<54%) and average nucleotide identity calculation (<94%) revealed that QTF5 is closest to Pseudomonas lini and should be classified into a novel species. This study provides the genetic basis to identify the genes linked to its specific mechanisms for adaptation to extreme environment and application of this microorganism in environmental conservation. PMID:29270429

  6. Exome sequencing identifies a novel SMCHD1 mutation in facioscapulohumeral muscular dystrophy 2.

    PubMed

    Mitsuhashi, Satomi; Boyden, Steven E; Estrella, Elicia A; Jones, Takako I; Rahimov, Fedik; Yu, Timothy W; Darras, Basil T; Amato, Anthony A; Folkerth, Rebecca D; Jones, Peter L; Kunkel, Louis M; Kang, Peter B

    2013-12-01

    FSHD2 is a rare form of facioscapulohumeral muscular dystrophy (FSHD) characterized by the absence of a contraction in the D4Z4 macrosatellite repeat region on chromosome 4q35 that is the hallmark of FSHD1. However, hypomethylation of this region is common to both subtypes. Recently, mutations in SMCHD1 combined with a permissive 4q35 allele were reported to cause FSHD2. We identified a novel p.Lys275del SMCHD1 mutation in a family affected with FSHD2 using whole-exome sequencing and linkage analysis. This mutation alters a highly conserved amino acid in the ATPase domain of SMCHD1. Subject III-11 is a male who developed asymmetrical muscle weakness characteristic of FSHD at 13 years. Physical examination revealed marked bilateral atrophy at biceps brachii, bilateral scapular winging, some asymmetrical weakness at tibialis anterior and peroneal muscles, and mild lower facial weakness. Biopsy of biceps brachii in subject II-5, the father of III-11, demonstrated lobulated fibers and dystrophic changes. Endomysial and perivascular inflammation was found, which has been reported in FSHD1 but not FSHD2. Given the previous report of SMCHD1 mutations in FSHD2 and the clinical presentations consistent with the FSHD phenotype, we conclude that the SMCHD1 mutation is the likely cause of the disease in this family. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Anxiolytic-Like Actions of Fatty Acids Identified in Human Amniotic Fluid

    PubMed Central

    García-Ríos, Rosa Isela; Rodríguez-Landa, Juan Francisco; Contreras, Carlos M.

    2013-01-01

    Eight fatty acids (C12–C18) were previously identified in human amniotic fluid, colostrum, and milk in similar proportions but different amounts. Amniotic fluid is well known to be the natural environment for development in mammals. Interestingly, amniotic fluid and an artificial mixture of fatty acids contained in amniotic fluid produce similar anxiolytic-like actions in Wistar rats. We explored whether the lowest amount of fatty acids contained in amniotic fluid with respect to colostrum and milk produces such anxiolytic-like effects. Although a trend toward a dose-response effect was observed, only an amount of fatty acids that was similar to amniotic fluid fully mimicked the effect of diazepam (2 mg/kg, i.p.) in the defensive burying test, an action devoid of effects on locomotor activity and motor coordination. Our results confirm that the amount of fatty acids contained in amniotic fluid is sufficient to produce anxiolytic-like effects, suggesting similar actions during intrauterine development. PMID:23737729

  8. Whole Exome Sequencing Identifies de Novo Mutations in GATA6 Associated with Congenital Diaphragmatic Hernia

    PubMed Central

    Yu, Lan; Bennett, James T.; Wynn, Julia; Carvill, Gemma L.; Cheung, Yee Him; Shen, Yufeng; Mychaliska, George B.; Azarow, Kenneth S.; Crombleholme, Timothy M.; Chung, Dai H.; Potoka, Douglas; Warner, Brad W.; Bucher, Brian; Lim, Foong-Yen; Pietsch, John; Stolar, Charles; Aspelund, Gudrun; Arkovitz, Marc S.; Mefford, Heather; Chung, Wendy K.

    2014-01-01

    Background Congenital diaphragmatic hernia (CDH) is a common birth defect affecting 1 in 3,000 births. It is characterized by herniation of abdominal viscera through an incompletely formed diaphragm. Although chromosomal anomalies and mutations in several genes have been implicated, the cause for most patients is unknown. Methods We used whole exome sequencing in two families with CDH and congenital heart disease, and identified mutations in GATA6 in both. Results In the first family, we identified a de novo missense mutation (c.1366C>T, p.R456C) in a sporadic CDH patient with tetralogy of Fallot. In the second, a nonsense mutation (c.712G>T, p.G238*) was identified in two siblings with CDH and a large ventricular septal defect. The G238* mutation was inherited from their mother, who was clinically affected with congenital absence of the pericardium, patent ductus arteriosus, and intestinal malrotation. Deep sequencing of blood and saliva derived DNA from the mother suggested somatic mosaicism as an explanation for her milder phenotype, with only approximately 15% mutant alleles. To determine the frequency of GATA6 mutations in CDH, we sequenced the gene in 378 patients with CDH. We identified one additional de novo mutation (c.1071delG, p.V358Cfs34*). Conclusions Mutations in GATA6 have been previously associated with pancreatic agenesis and congenital heart disease. We conclude that, in addition to the heart and the pancreas, GATA6 is involved in development of two additional organs, the diaphragm and the pericardium. In addition we have shown that de novo mutations can contribute to the development of CDH, a common birth defect. PMID:24385578

  9. Construction Strategy for an Internal Amplification Control for Real-Time Diagnostic Assays Using Nucleic Acid Sequence-Based Amplification: Development and Clinical Application

    PubMed Central

    Rodríguez-Lázaro, David; D'Agostino, Martin; Pla, Maria; Cook, Nigel

    2004-01-01

    An important analytical control in molecular amplification-based methods is an internal amplification control (IAC), which should be included in each reaction mixture. An IAC is a nontarget nucleic acid sequence which is coamplified simultaneously with the target sequence. With negative results for the target nucleic acid, the absence of an IAC signal indicates that amplification has failed. A general strategy for the construction of an IAC for inclusion in molecular beacon-based real-time nucleic acid sequence-based amplification (NASBA) assays is presented. Construction proceeds in two phases. In the first phase, a double-stranded DNA molecule that contains nontarget sequences flanked by target sequences complementary to the NASBA primers is produced. At the 5′ end of this DNA molecule is a T7 RNA polymerase binding sequence. In the second phase of construction, RNA transcripts are produced from the DNA by T7 RNA polymerase. This RNA is the IAC; it is amplified by the target NASBA primers and is detected by a molecular beacon probe complementary to the internal nontarget sequences. As a practical example, an IAC for use in an assay for the detection of Mycobacterium avium subsp. paratuberculosis is described, its incorporation and optimization within the assay are detailed, and its application to spiked and natural clinical samples is shown to illustrate the correct interpretation of the diagnostic results. PMID:15583319

  10. Amino acid sequences of peptides from a chymotryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.

    1969-01-01

    1. A chymotryptic digest of the protein fraction U.S.3. from oxidized wool was separated into 51 peptide fractions by chromatography on a column of cation-exchange resin. 2. The less acidic fractions were separated into their component peptides by a combination of cation-exchange-resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid sequences of 34 of these peptides were elucidated, and those of 14 others partially determined. 4. Overlaps between the tryptic and chymotryptic peptides from fraction U.S.3 have enabled ten extended amino acid sequences to be deduced, the longest containing 20 amino acid residues. 5. The relevance of the results to the structures of the helical and non-helical regions of wool is discussed. PMID:5395876

  11. VWF mutations and new sequence variations identified in healthy controls are more frequent in the African-American population.

    PubMed

    Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R

    2012-03-01

    Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.

  12. Novel compound heterozygous mutations in the OTOF Gene identified by whole-exome sequencing in auditory neuropathy spectrum disorder.

    PubMed

    Tang, Fengzhu; Ma, Dengke; Wang, Yulan; Qiu, Yuecai; Liu, Fei; Wang, Qingqing; Lu, Qiutian; Shi, Min; Xu, Liang; Liu, Min; Liang, Jianping

    2017-03-23

    Many hearing-loss diseases are demonstrated to have Mendelian inheritance caused by mutations in single gene. However, many deaf individuals have diseases that remain genetically unexplained. Auditory neuropathy is a sensorineural deafness in which sounds are able to be transferred into the inner ear normally but the transmission of the signals from inner ear to auditory nerve and brain is injured, also known as auditory neuropathy spectrum disorder (ANSD). The pathogenic mutations of the genes responsible for the Chinese ANSD population remain poorly understood. A total of 127 patients with non-syndromic hearing loss (NSHL) were enrolled in Guangxi Zhuang Autonomous Region. A hereditary deafness gene mutation screening was performed to identify the mutation sites in four deafness-related genes (GJB2, GJB3, 12S rRNA, and SLC26A4). In addition, whole-exome sequencing (WES) was applied to explore unappreciated mutation sites in the cases with the singularity of its phenotype. Well-characterized mutations were found in only 8.7% (11/127) of the patients. Interestingly, two mutations in the OTOF gene were identified in two affected siblings with ANSD from a Chinese family, including one nonsense mutation c.1273C > T (p.R425X) and one missense mutation c.4994 T > C (p.L1665P). Furthermore, we employed Sanger sequencing to confirm the mutations in each subject. Two compound heterozygous mutations in the OTOF gene were observed in the two affected siblings, whereas the two parents and unaffected sister were heterozygous carriers of c.1273C > T (father and sister) and c.4994 T > C (mother). The nonsense mutation p.R425X, contributes to a premature stop codon, may result in a truncated polypeptide, which strongly suggests its pathogenicity for ANSD. The missense mutation p.L1665P results in a single amino acid substitution in a highly conserved region. Two mutations in the OTOF gene in the Chinese deaf population were recognized for the first time. These

  13. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  14. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  15. Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility.

    PubMed

    Bruse, Shannon; Moreau, Michael; Bromberg, Yana; Jang, Jun-Ho; Wang, Nan; Ha, Hongseok; Picchi, Maria; Lin, Yong; Langley, Raymond J; Qualls, Clifford; Klensney-Tait, Julia; Zabner, Joseph; Leng, Shuguang; Mao, Jenny; Belinsky, Steven A; Xing, Jinchuan; Nyunoya, Toru

    2016-01-07

    Chronic obstructive pulmonary disease (COPD) is characterized by an irreversible airflow limitation in response to inhalation of noxious stimuli, such as cigarette smoke. However, only 15-20 % smokers manifest COPD, suggesting a role for genetic predisposition. Although genome-wide association studies have identified common genetic variants that are associated with susceptibility to COPD, effect sizes of the identified variants are modest, as is the total heritability accounted for by these variants. In this study, an extreme phenotype exome sequencing study was combined with in vitro modeling to identify COPD candidate genes. We performed whole exome sequencing of 62 highly susceptible smokers and 30 exceptionally resistant smokers to identify rare variants that may contribute to disease risk or resistance to COPD. This was a cross-sectional case-control study without therapeutic intervention or longitudinal follow-up information. We identified candidate genes based on rare variant analyses and evaluated exonic variants to pinpoint individual genes whose function was computationally established to be significantly different between susceptible and resistant smokers. Top scoring candidate genes from these analyses were further filtered by requiring that each gene be expressed in human bronchial epithelial cells (HBECs). A total of 81 candidate genes were thus selected for in vitro functional testing in cigarette smoke extract (CSE)-exposed HBECs. Using small interfering RNA (siRNA)-mediated gene silencing experiments, we showed that silencing of several candidate genes augmented CSE-induced cytotoxicity in vitro. Our integrative analysis through both genetic and functional approaches identified two candidate genes (TACC2 and MYO1E) that augment cigarette smoke (CS)-induced cytotoxicity and, potentially, COPD susceptibility.

  16. Complete genome sequence of a divergent strain of Japanese yam mosaic virus from China

    USDA-ARS?s Scientific Manuscript database

    A novel strain of Japanese yam mosaic virus (JYMV-CN) was identified in a yam plant with foliar mottle symptoms in China. The complete genomic sequence of JYMV-CN was determined. Its genomic sequence of 9701 nucleotides encodes a polyprotein of 3247 amino acids. Its organization was virtually identi...

  17. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects

    PubMed Central

    Johnson, Ben; Lowe, Gillian C.; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A.; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J.; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula HB; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E.; Watson, Steve P.; Morgan, Neil V.

    2016-01-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×109/L to 186×109/L. Of the 51 patients phenotypically tested, 37 (73%), had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified “pathogenic” or “likely pathogenic” variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. PMID:27479822

  18. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    PubMed

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set reduces to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425 data set used in this analysis, the overall beta-turn prediction accuracy yielded consistent results when tested on either the 30-protein data set (64.62%) used earlier or a more recent representative data set comprising 619 protein chains (64.66%) or on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.

  19. The shikimate pathway: review of amino acid sequence, function and three-dimensional structures of the enzymes.

    PubMed

    Mir, Rafia; Jallu, Shais; Singh, T P

    2015-06-01

    The aromatic compounds such as aromatic amino acids, vitamin K and ubiquinone are important prerequisites for the metabolism of an organism. All organisms can synthesize these aromatic metabolites through shikimate pathway, except for mammals which are dependent on their diet for these compounds. The pathway converts phosphoenolpyruvate and erythrose 4-phosphate to chorismate through seven enzymatically catalyzed steps and chorismate serves as a precursor for the synthesis of variety of aromatic compounds. These enzymes have shown to play a vital role for the viability of microorganisms and thus are suggested to present attractive molecular targets for the design of novel antimicrobial drugs. This review focuses on the seven enzymes of the shikimate pathway, highlighting their primary sequences, functions and three-dimensional structures. The understanding of their active site amino acid maps, functions and three-dimensional structures will provide a framework on which the rational design of antimicrobial drugs would be based. Comparing the full length amino acid sequences and the X-ray crystal structures of these enzymes from bacteria, fungi and plant sources would contribute in designing a specific drug and/or in developing broad-spectrum compounds with efficacy against a variety of pathogens.

  20. Exome sequencing identifies a DNAJB6 mutation in a family with dominantly-inherited limb-girdle muscular dystrophy.

    PubMed

    Couthouis, Julien; Raphael, Alya R; Siskind, Carly; Findlay, Andrew R; Buenrostro, Jason D; Greenleaf, William J; Vogel, Hannes; Day, John W; Flanigan, Kevin M; Gitler, Aaron D

    2014-05-01

    Limb-girdle muscular dystrophy primarily affects the muscles of the hips and shoulders (the "limb-girdle" muscles), although it is a heterogeneous disorder that can present with varying symptoms. There is currently no cure. We sought to identify the genetic basis of limb-girdle muscular dystrophy type 1 in an American family of Northern European descent using exome sequencing. Exome sequencing was performed on DNA samples from two affected siblings and one unaffected sibling and resulted in the identification of eleven candidate mutations that co-segregated with the disease. Notably, this list included a previously reported mutation in DNAJB6, p.Phe89Ile, which was recently identified as a cause of limb-girdle muscular dystrophy type 1D. Additional family members were Sanger sequenced and the mutation in DNAJB6 was only found in affected individuals. Subsequent haplotype analysis indicated that this DNAJB6 p.Phe89Ile mutation likely arose independently of the previously reported mutation. Since other published mutations are located close by in the G/F domain of DNAJB6, this suggests that the area may represent a mutational hotspot. Exome sequencing provided an unbiased and effective method for identifying the genetic etiology of limb-girdle muscular dystrophy type 1 in a previously genetically uncharacterized family. This work further confirms the causative role of DNAJB6 mutations in limb-girdle muscular dystrophy type 1D. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.

    USDA-ARS?s Scientific Manuscript database

    Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....

  2. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full-advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  3. Fatty acids identified in the Burmese python promote beneficial cardiac growth.

    PubMed

    Riquelme, Cecilia A; Magida, Jason A; Harrison, Brooke C; Wall, Christopher E; Marr, Thomas G; Secor, Stephen M; Leinwand, Leslie A

    2011-10-28

    Burmese pythons display a marked increase in heart mass after a large meal. We investigated the molecular mechanisms of this physiological heart growth with the goal of applying this knowledge to the mammalian heart. We found that heart growth in pythons is characterized by myocyte hypertrophy in the absence of cell proliferation and by activation of physiological signal transduction pathways. Despite high levels of circulating lipids, the postprandial python heart does not accumulate triglycerides or fatty acids. Instead, there is robust activation of pathways of fatty acid transport and oxidation combined with increased expression and activity of superoxide dismutase, a cardioprotective enzyme. We also identified a combination of fatty acids in python plasma that promotes physiological heart growth when injected into either pythons or mice.

  4. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  5. Whole Exome Sequencing Identifies Rare Protein-Coding Variants in Behçet's Disease.

    PubMed

    Ognenovski, Mikhail; Renauer, Paul; Gensterblum, Elizabeth; Kötter, Ina; Xenitidis, Theodoros; Henes, Jörg C; Casali, Bruno; Salvarani, Carlo; Direskeneli, Haner; Kaufman, Kenneth M; Sawalha, Amr H

    2016-05-01

    Behçet's disease (BD) is a systemic inflammatory disease with an incompletely understood etiology. Despite the identification of multiple common genetic variants associated with BD, rare genetic variants have been less explored. We undertook this study to investigate the role of rare variants in BD by performing whole exome sequencing in BD patients of European descent. Whole exome sequencing was performed in a discovery set comprising 14 German BD patients of European descent. For replication and validation, Sanger sequencing and Sequenom genotyping were performed in the discovery set and in 2 additional independent sets of 49 German BD patients and 129 Italian BD patients of European descent. Genetic association analysis was then performed in BD patients and 503 controls of European descent. Functional effects of associated genetic variants were assessed using bioinformatic approaches. Using whole exome sequencing, we identified 77 rare variants (in 74 genes) with predicted protein-damaging effects in BD. These variants were genotyped in 2 additional patient sets and then analyzed to reveal significant associations with BD at 2 genetic variants detected in all 3 patient sets that remained significant after Bonferroni correction. We detected genetic association between BD and LIMK2 (rs149034313), involved in regulating cytoskeletal reorganization, and between BD and NEIL1 (rs5745908), involved in base excision DNA repair (P = 3.22 × 10(-4) and P = 5.16 × 10(-4) , respectively). The LIMK2 association is a missense variant with predicted protein damage that may influence functional interactions with proteins involved in cytoskeletal regulation by Rho GTPase, inflammation mediated by chemokine and cytokine signaling pathways, T cell activation, and angiogenesis (Bonferroni-corrected P = 5.63 × 10(-14) , P = 7.29 × 10(-6) , P = 1.15 × 10(-5) , and P = 6.40 × 10(-3) , respectively). The genetic association in NEIL1 is a predicted splice

  6. Draft genome sequence of Cryptococcus terricola JCM 24523, an oleaginous yeast capable of expressing exogenous DNA

    DOE PAGES

    Close, Dan; Ojumu, John O.; Zhang, Gui X.

    2016-11-03

    Cryptococcus terricola JCM 24523 has recently been identified as an oleaginous yeast capable of converting starch into fatty acids. Here, this draft genome sequence provides a platform for elucidating its fatty acid production potential and supporting comparisons with other oleaginous species.

  7. Draft genome sequence of Cryptococcus terricola JCM 24523, an oleaginous yeast capable of expressing exogenous DNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Close, Dan; Ojumu, John O.; Zhang, Gui X.

    Cryptococcus terricola JCM 24523 has recently been identified as an oleaginous yeast capable of converting starch into fatty acids. Here, this draft genome sequence provides a platform for elucidating its fatty acid production potential and supporting comparisons with other oleaginous species.

  8. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  9. Silent genetic alterations identified by targeted next-generation sequencing in pheochromocytoma/paraganglioma: A clinicopathological correlations.

    PubMed

    Pillai, Suja; Gopalan, Vinod; Lo, Chung Y; Liew, Victor; Smith, Robert A; Lam, Alfred King Y

    2017-02-01

    The goal of this pilot study was to develop a customized, cost-effective amplicon panel (Ampliseq) for target sequencing in a cohort of patients with sporadic phaeochromocytoma/paraganglioma. Phaeochromocytoma/paragangliomas from 25 patients were analysed by targeted next-generation sequencing approach using an Ion Torrent PGM instrument. Primers for 15 target genes (NF1, RET, VHL, SDHA, SDHB, SDHC, SDHD, SDHAF2, TMEM127, MAX, MEN1, KIF1Bβ, EPAS1, CDKN2 & PHD2) were designed using ion ampliseq designer. Ion Reporter software and Ingenuity® Variant Analysis™ software (www.ingenuity.com/variants) from Ingenuity Systems were used to analysis these results. Overall, 713 variants were identified. The variants identified from the Ion Reporter ranged from 64 to 161 per patient. Single nucleotide variants (SNV) were the most common. Further annotation with the help of Ingenuity variant analysis revealed 29 of these 713variants were deletions. Of these, six variants were non-pathogenic and four were likely to be pathogenic. The remaining 19 variants were of uncertain significance. The most frequently altered gene in the cohort was KIF1B followed by NF1. Novel KIF1B pathogenic variant c.3375+1G>A was identified. The mutation was noted in a patient with clinically confirmed neurofibromatosis. Chromosome 1 showed the presence of maximum number of variants. Use of targeted next-generation sequencing is a sensitive method for the detecting genetic changes in patients with phaeochromocytoma/paraganglioma. The precise detection of these genetic changes helps in understanding the pathogenesis of these tumours. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma.

    PubMed

    Comino-Méndez, Iñaki; Gracia-Aznárez, Francisco J; Schiavi, Francesca; Landa, Iñigo; Leandro-García, Luis J; Letón, Rocío; Honrado, Emiliano; Ramos-Medina, Rocío; Caronia, Daniela; Pita, Guillermo; Gómez-Graña, Alvaro; de Cubas, Aguirre A; Inglada-Pérez, Lucía; Maliszewska, Agnieszka; Taschin, Elisa; Bobisse, Sara; Pica, Giuseppe; Loli, Paola; Hernández-Lavado, Rafael; Díaz, José A; Gómez-Morales, Mercedes; González-Neira, Anna; Roncador, Giovanna; Rodríguez-Antona, Cristina; Benítez, Javier; Mannelli, Massimo; Opocher, Giuseppe; Robledo, Mercedes; Cascón, Alberto

    2011-06-19

    Hereditary pheochromocytoma (PCC) is often caused by germline mutations in one of nine susceptibility genes described to date, but there are familial cases without mutations in these known genes. We sequenced the exomes of three unrelated individuals with hereditary PCC (cases) and identified mutations in MAX, the MYC associated factor X gene. Absence of MAX protein in the tumors and loss of heterozygosity caused by uniparental disomy supported the involvement of MAX alterations in the disease. A follow-up study of a selected series of 59 cases with PCC identified five additional MAX mutations and suggested an association with malignant outcome and preferential paternal transmission of MAX mutations. The involvement of the MYC-MAX-MXD1 network in the development and progression of neural crest cell tumors is further supported by the lack of functional MAX in rat PCC (PC12) cells and by the amplification of MYCN in neuroblastoma and suggests that loss of MAX function is correlated with metastatic potential.

  11. Sparse whole genome sequencing identifies two loci for major depressive disorder

    PubMed Central

    2015-01-01

    Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide1, poses a major challenge to genetic analysis. To date no robustly replicated genetic loci have been identified 2, despite analysis of more than 9,000 cases3. Using low coverage genome sequence of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified and replicated two genome-wide significant loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P-value = 2.53×10−10) the other in an intron of the LHPP gene (P = 6.45×10−12). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness. PMID:26176920

  12. Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

    PubMed

    Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.

  13. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    PubMed

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e.contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accomodate for the mosaic genome structure of many bacteriophages. The method is demonstrated to out-perform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.

  14. Investigation of Intercellular Salicylic Acid Accumulation during Compatible and Incompatible Arabidopsis-Pseudomonas syringae Interactions Using a Fast Neutron-Generated Mutant Allele of EDS5 Identified by Genetic Mapping and Whole-Genome Sequencing

    PubMed Central

    Catana, Vasile; Golding, Brian; Weretilnyk, Elizabeth A.; Cameron, Robin K.

    2014-01-01

    A whole-genome sequencing technique developed to identify fast neutron-induced deletion mutations revealed that iap1-1 is a new allele of EDS5 (eds5-5). RPS2-AvrRpt2-initiated effector-triggered immunity (ETI) was compromised in iap1-1/eds5-5 with respect to in planta bacterial levels and the hypersensitive response, while intra- and intercellular free salicylic acid (SA) accumulation was greatly reduced, suggesting that SA contributes as both an intracellular signaling molecule and an antimicrobial agent in the intercellular space during ETI. During the compatible interaction between wild-type Col-0 and virulent Pseudomonas syringae pv. tomato (Pst), little intercellular free SA accumulated, which led to the hypothesis that Pst suppresses intercellular SA accumulation. When Col-0 was inoculated with a coronatine-deficient strain of Pst, high levels of intercellular SA accumulation were observed, suggesting that Pst suppresses intercellular SA accumulation using its phytotoxin coronatine. This work suggests that accumulation of SA in the intercellular space is an important component of basal/PAMP-triggered immunity as well as ETI to pathogens that colonize the intercellular space. PMID:24594657

  15. Sequencing of T-superfamily conotoxins from Conus virgo: pyroglutamic acid identification and disulfide arrangement by MALDI mass spectrometry.

    PubMed

    Mandal, Amit Kumar; Ramasamy, Mani Ramakrishnan Santhana; Sabareesh, Varatharajan; Openshaw, Matthew E; Krishnan, Kozhalmannom S; Balaram, Padmanabhan

    2007-08-01

    De novo mass spectrometric sequencing of two Conus peptides, Vi1359 and Vi1361, from the vermivorous cone snail Conus virgo, found off the southern Indian coast, is presented. The peptides, whose masses differ only by 2 Da, possess two disulfide bonds and an amidated C-terminus. Simple chemical modifications and enzymatic cleavage coupled with matrix assisted laser desorption ionization (MALDI) mass spectrometric analysis aided in establishing the sequences of Vi1359, ZCCITIPECCRI-NH(2), and Vi1361, ZCCPTMPECCRI-NH(2), which differ only at residues 4 and 6 (Z = pyroglutamic acid). The presence of the pyroglutamyl residue at the N-terminus was unambiguously identified by chemical hydrolysis of the cyclic amide, followed by esterification. The presence of Ile residues in both the peptides was confirmed from high-energy collision induced dissociation (CID) studies, using the observation of w(n)- and d(n)-ions as a diagnostic. Differential cysteine labeling, in conjunction with MALDI-MS/MS, permitted establishment of disulfide connectivity in both peptides as Cys2-Cys9 and Cys3-Cys10. The cysteine pattern clearly reveals that the peptides belong to the class of T-superfamily conotoxins, in particular the T-1 superfamily.

  16. The Biomolecule Sequencer Project: Nanopore Sequencing as a Dual-Use Tool for Crew Health and Astrobiology Investigations

    NASA Technical Reports Server (NTRS)

    John, K. K.; Botkin, D. S.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lehman, N.; Lupisella, M. L.; Mason, C. E.; Smith, D. J.; hide

    2016-01-01

    Human missions to Mars will fundamentally transform how the planet is explored, enabling new scientific discoveries through more sophisticated sample acquisition and processing than can currently be implemented in robotic exploration. The presence of humans also poses new challenges, including ensuring astronaut safety and health and monitoring contamination. Because the capability to transfer materials to Earth will be extremely limited, there is a strong need for in situ diagnostic capabilities. Nucleotide sequencing is a particularly powerful tool because it can be used to: (1) mitigate microbial risks to crew by allowing identification of microbes in water, in air, and on surfaces; (2) identify optimal treatment strategies for infections that arise in crew members; and (3) track how crew members, microbes, and mission-relevant organisms (e.g., farmed plants) respond to conditions on Mars through transcriptomic and genomic changes. Sequencing would also offer benefits for science investigations occurring on the surface of Mars by permitting identification of Earth-derived contamination in samples. If Mars contains indigenous life, and that life is based on nucleic acids or other closely related molecules, sequencing would serve as a critical tool for the characterization of those molecules. Therefore, spaceflight-compatible nucleic acid sequencing would be an important capability for both crew health and astrobiology exploration. Advances in sequencing technology on Earth have been driven largely by needs for higher throughput and read accuracy. Although some reduction in size has been achieved, nearly all commercially available sequencers are not compatible with spaceflight due to size, power, and operational requirements. Exceptions are nanopore-based sequencers that measure changes in current caused by DNA passing through pores; these devices are inherently much smaller and require significantly less power than sequencers using other detection methods

  17. Differences in acid tolerance between Bifidobacterium breve BB8 and its acid-resistant derivative B. breve BB8dpH, revealed by RNA-sequencing and physiological analysis.

    PubMed

    Yang, Xu; Hang, Xiaomin; Tan, Jing; Yang, Hong

    2015-06-01

    Bifidobacteria are common inhabitants of the human gastrointestinal tract, and their application has increased dramatically in recent years due to their health-promoting effects. The ability of bifidobacteria to tolerate acidic environments is particularly important for their function as probiotics because they encounter such environments in food products and during passage through the gastrointestinal tract. In this study, we generated a derivative, Bifidobacterium breve BB8dpH, which displayed a stable, acid-resistant phenotype. To investigate the possible reasons for the higher acid tolerance of B. breve BB8dpH, as compared with its parental strain B. breve BB8, a combined transcriptome and physiological approach was used to characterize differences between the two strains. An analysis of the transcriptome by RNA-sequencing indicated that the expression of 121 genes was increased by more than 2-fold, while the expression of 146 genes was reduced more than 2-fold, in B. breve BB8dpH. Validation of the RNA-sequencing data using real-time quantitative PCR analysis demonstrated that the RNA-sequencing results were highly reliable. The comparison analysis, based on differentially expressed genes, suggested that the acid tolerance of B. breve BB8dpH was enhanced by regulating the expression of genes involved in carbohydrate transport and metabolism, energy production, synthesis of cell envelope components (peptidoglycan and exopolysaccharide), synthesis and transport of glutamate and glutamine, and histidine synthesis. Furthermore, an analysis of physiological data showed that B. breve BB8dpH displayed higher production of exopolysaccharide and lower H(+)-ATPase activity than B. breve BB8. The results presented here will improve our understanding of acid tolerance in bifidobacteria, and they will lead to the development of new strategies to enhance the acid tolerance of bifidobacterial strains. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Single-molecule protein sequencing through fingerprinting: computational assessment

    NASA Astrophysics Data System (ADS)

    Yao, Yao; Docter, Margreet; van Ginkel, Jetty; de Ridder, Dick; Joo, Chirlmin

    2015-10-01

    Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.

  19. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies

    PubMed Central

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-01-01

    The joint inference of selection and past demography remain a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main target for selection, such as PaPRR3 and PaGI. PMID:27172202

  20. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies.

    PubMed

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-07-07

    The joint inference of selection and past demography remain a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main target for selection, such as PaPRR3 and PaGI. Copyright © 2016 Chen et al.

  1. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  2. CoSMoS: Conserved Sequence Motif Search in the proteome

    PubMed Central

    Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

    2006-01-01

    Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915

  3. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects.

    PubMed

    Johnson, Ben; Lowe, Gillian C; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula Hb; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E; Watson, Steve P; Morgan, Neil V

    2016-10-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10 9 /L to 186×10 9 /L. Of the 51 patients phenotypically tested, 37 (73%), had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified "pathogenic" or "likely pathogenic" variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. Copyright© Ferrata Storti Foundation.

  4. Purification and sequence of rat oxyntomodulin.

    PubMed Central

    Collie, N L; Walsh, J H; Wong, H C; Shively, J E; Davis, M T; Lee, T D; Reeve, J R

    1994-01-01

    Structural information about rat enteroglucagon, intestinal peptides containing the pancreatic glucagon sequence, has been based previously on cDNA, immunologic, and chromatographic data. Our interests in testing the physiological actions of synthetic enteroglucagon peptides in rats required that we identify precisely the forms present in vivo. From knowledge of the proglucagon gene sequence, we synthesized an enteroglucagon C-terminal octapeptide common to both proposed enteroglucagon forms, glicentin and oxyntomodulin, but sharing no sequence overlap with glucagon. We then developed a radioimmunoassay using antibodies raised against the octapeptide that was specific for enteroglucagon peptides without cross-reacting with glucagon. Rat intestine was extracted, and one presumptive enteroglucagon form was purified by following the enteroglucagon C-terminal octapeptide-like immunoreactivity through several HPLC purification steps. Structural characterization of the material by amino acid composition, microsequence, and mass spectral analyses identified the peptide as rat oxyntomodulin. The 37-residue peptide consists of pancreatic glucagon plus the C-terminal extension, Lys-Arg-Asn-Arg-Asn-Asn-Ile-Ala. This now permits synthesis of an unambiguous duplicate of endogenous rat oxyntomodulin for physiological studies. Images PMID:7937770

  5. Identification of a nuclear localization sequence in the polyomavirus capsid protein VP2

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    A nuclear localization signal (NLS) has been identified in the C-terminal (Glu307-Glu-Asp-Gly-Pro-Gln-Lys-Lys-Lys-Arg-Arg-Leu318) amino acid sequence of the polyomavirus minor capsid protein VP2. The importance of this amino acid sequence for nuclear transport of newly synthesized VP2 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP2 (expressing full-length VP2) and pSG5 delta 3VP2 (expressing truncated VP2, lacking amino acids Glu307-Leu318). These constructs were transfected into COS-7 cells, and the intracellular localization of the VP2 protein was determined by indirect immunofluorescence. These studies revealed that the full-length VP2 was localized in the nucleus, while the truncated VP2 protein was localized in the cytoplasm and not transported to the nucleus. A biochemical "additive" approach was also used to determine whether this sequence could target nonnuclear proteins to the nucleus. A synthetic peptide identical to VP2 amino acids Glu307-Leu318 was cross-linked to the nonnuclear proteins bovine serum albumin (BSA) or immunoglobulin G (IgG). The conjugates were then labeled with fluorescein isothiocyanate and microinjected into the cytoplasm of NIH 3T6 cells. Both conjugates localized in the nucleus of the microinjected cells, whereas unconjugated BSA and IgG remained in the cytoplasm. Taken together, these genetic subtractive and biochemical additive approaches have identified the C-terminal sequence of polyoma-virus VP2 (containing amino acids Glu307-Leu318) as the NLS of this protein.

  6. Synthesis and evaluations of an acid-cleavable, fluorescently labeled nucleotide as a reversible terminator for DNA sequencing.

    PubMed

    Tan, Lianjiang; Liu, Yazhi; Li, Xiaowei; Wu, Xin-Yan; Gong, Bing; Shen, Yu-Mei; Shao, Zhifeng

    2016-02-11

    An acid-cleavable linker based on a dimethylketal moiety was synthesized and used to connect a nucleotide with a fluorophore to produce a 3'-OH unblocked nucleotide analogue as an excellent reversible terminator for DNA sequencing by synthesis.

  7. Fatty acid methyl ester analysis to identify sources of soil in surface water.

    PubMed

    Banowetz, Gary M; Whittaker, Gerald W; Dierksen, Karen P; Azevedo, Mark D; Kennedy, Ann C; Griffith, Stephen M; Steiner, Jeffrey J

    2006-01-01

    Efforts to improve land-use practices to prevent contamination of surface waters with soil are limited by an inability to identify the primary sources of soil present in these waters. We evaluated the utility of fatty acid methyl ester (FAME) profiles of dry reference soils for multivariate statistical classification of soils collected from surface waters adjacent to agricultural production fields and a wooded riparian zone. Trials that compared approaches to concentrate soil from surface water showed that aluminum sulfate precipitation provided comparable yields to that obtained by vacuum filtration and was more suitable for handling large numbers of samples. Fatty acid methyl ester profiles were developed from reference soils collected from contrasting land uses in different seasons to determine whether specific fatty acids would consistently serve as variables in multivariate statistical analyses to permit reliable classification of soils. We used a Bayesian method and an independent iterative process to select appropriate fatty acids and found that variable selection was strongly impacted by the season during which soil was collected. The apparent seasonal variation in the occurrence of marker fatty acids in FAME profiles from reference soils prevented preparation of a standardized set of variables. Nevertheless, accurate classification of soil in surface water was achieved utilizing fatty acid variables identified in seasonally matched reference soils. Correlation analysis of entire chromatograms and subsequent discriminant analyses utilizing a restricted number of fatty acid variables showed that FAME profiles of soils exposed to the aquatic environment still had utility for classification at least 1 wk after submersion.

  8. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    PubMed

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  9. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    PubMed

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  10. India Allele Finder: a web-based annotation tool for identifying common alleles in next-generation sequencing data of Indian origin.

    PubMed

    Zhang, Jimmy F; James, Francis; Shukla, Anju; Girisha, Katta M; Paciorkowski, Alex R

    2017-06-27

    We built India Allele Finder, an online searchable database and command line tool, that gives researchers access to variant frequencies of Indian Telugu individuals, using publicly available fastq data from the 1000 Genomes Project. Access to appropriate population-based genomic variant annotation can accelerate the interpretation of genomic sequencing data. In particular, exome analysis of individuals of Indian descent will identify population variants not reflected in European exomes, complicating genomic analysis for such individuals. India Allele Finder offers improved ease-of-use to investigators seeking to identify and annotate sequencing data from Indian populations. We describe the use of India Allele Finder to identify common population variants in a disease quartet whole exome dataset, reducing the number of candidate single nucleotide variants from 84 to 7. India Allele Finder is freely available to investigators to annotate genomic sequencing data from Indian populations. Use of India Allele Finder allows efficient identification of population variants in genomic sequencing data, and is an example of a population-specific annotation tool that simplifies analysis and encourages international collaboration in genomics research.

  11. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast.

    PubMed

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L; Hallström, Björn M; Liu, Zihe; Petranovic, Dina; Uhlén, Mathias; Joensson, Haakan N; Andersson-Svahn, Helene; Nielsen, Jens

    2015-08-25

    There is an increasing demand for biotech-based production of recombinant proteins for use as pharmaceuticals in the food and feed industry and in industrial applications. Yeast Saccharomyces cerevisiae is among preferred cell factories for recombinant protein production, and there is increasing interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant α-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330 mutations in total. Gene ontology analysis of mutated genes revealed many biological processes, including some that have not been identified before in the context of protein secretion. Mutated genes identified in this study can be potentially used for reverse metabolic engineering, with the objective to construct efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could broadly facilitate design of novel cell factories.

  12. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor L.; Brow, Mary Ann D.; Dahlberg, James E.

    2007-12-11

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  13. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow; Mary Ann D.; Dahlberg, James E.

    2010-11-09

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  14. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2000-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  15. Nucleic acid detection assays

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann; Dahlberg, James E.

    2005-04-05

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  16. Exome sequencing of extreme phenotypes identifies DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis.

    PubMed

    Emond, Mary J; Louie, Tin; Emerson, Julia; Zhao, Wei; Mathias, Rasika A; Knowles, Michael R; Wright, Fred A; Rieder, Mark J; Tabor, Holly K; Nickerson, Deborah A; Barnes, Kathleen C; Gibson, Ronald L; Bamshad, Michael J

    2012-07-08

    Exome sequencing has become a powerful and effective strategy for the discovery of genes underlying Mendelian disorders. However, use of exome sequencing to identify variants associated with complex traits has been more challenging, partly because the sample sizes needed for adequate power may be very large. One strategy to increase efficiency is to sequence individuals who are at both ends of a phenotype distribution (those with extreme phenotypes). Because the frequency of alleles that contribute to the trait are enriched in one or both phenotype extremes, a modest sample size can potentially be used to identify novel candidate genes and/or alleles. As part of the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), we used an extreme phenotype study design to discover that variants in DCTN4, encoding a dynactin protein, are associated with time to first P. aeruginosa airway infection, chronic P. aeruginosa infection and mucoid P. aeruginosa in individuals with cystic fibrosis.

  17. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  18. Purification and partial amino-acid sequence of gibberellin 20-oxidase from Cucurbita maxima L. endosperm.

    PubMed

    Lange, T

    1994-01-01

    Gibberellin (GA) 20-oxidase was purified to apparent homogeneity from Cucurbita maxima endosperm by fractionated ammonium-sulphate precipitation, gel-filtration chromatography and anion-exchange and hydrophobic-interaction high-performance liquid chromatography (HPLC). Average purification after the last step was 55-fold with 3.9% of the activity recovered. The purest single fraction was enriched 101-fold with 0.2% overall recovery. Apparent relative molecular mass of the enzyme was 45 kDa, as determined by gel-filtration HPLC and sodium dodecyl sulphate-polyacrylamide gel electrophoresis, indicating that GA 20-oxidase is probably a monomeric enzyme. The purified enzyme degraded on two-dimensional gel electrophoresis, giving two protein spots: a major one corresponding to a molecular mass of 30 kDa and a minor one at 45 kDa. The isoelectric point for both was 5.4. The amino-acid sequences of the amino-terminus of the purified enzyme and of two peptides from a tryptic digest were determined. The purified enzyme catalysed the sequential conversion of [14C]GA12 to [14C]GA15, [14C]GA24 and [14C]GA25, showing that carbon atom 20 was oxidised to the corresponding alcohol, aldehyde and carboxylic acid in three consecutive reactions. [14C]Gibberellin A53 was similarly converted to [14C]GA44, [14C]GA19, [14C]GA17 and small amounts of a fourth product, which was preliminarily identified as [14C]GA20, a C19-gibberellin. All GAs except [14C]GA20 were identified by combined gas chromatography-mass spectrometry. The cofactor requirements in the absence of dithiothreitol were essentially as in its presence (Lange et al., Planta 195, 98-107, 1994), except that ascorbate was essential for enzyme activity and the optimal concentration of catalase was lower.

  19. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing.

    PubMed

    de Moraes, Marcos H; Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2017-03-01

    Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli , are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes being

  20. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing

    PubMed Central

    Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R.; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2016-01-01

    ABSTRACT Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli, are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes

  1. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    PubMed

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats or microsatellites are resourceful molecular genetic markers. There are only few reports of SSR identification and development in pineapple. Complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  2. Whole-exome sequencing identifies novel homozygous mutation in NPAS2 in family with nonobstructive azoospermia.

    PubMed

    Ramasamy, Ranjith; Bakırcıoğlu, M Emre; Cengiz, Cenk; Karaca, Ender; Scovell, Jason; Jhangiani, Shalini N; Akdemir, Zeynep C; Bainbridge, Matthew; Yu, Yao; Huff, Chad; Gibbs, Richard A; Lupski, James R; Lamb, Dolores J

    2015-08-01

    To investigate the genetic cause of nonobstructive azoospermia (NOA) in a consanguineous Turkish family through homozygosity mapping followed by targeted exon/whole-exome sequencing to identify genetic variations. Whole-exome sequencing (WES). Research laboratory. Two siblings in a consanguineous family with NOA. Validating all variants passing filter criteria with Sanger sequencing to confirm familial segregation and absence in the control population. Discovery of a mutation that could potentially cause NOA. A novel nonsynonymous mutation in the neuronal PAS-2 domain (NPAS2) was identified in a consanguineous family from Turkey. This mutation in exon 14 (chr2: 101592000 C>G) of NPAS2 is likely a disease-causing mutation as it is predicted to be damaging, it is a novel variant, and it segregates with the disease. Family segregation of the variants showed the presence of the homozygous mutation in the three brothers with NOA and a heterozygous mutation in the mother as well as one brother and one sister who were both fertile. The mutation is not found in the single-nucleotide polymorphism database, the 1000 Genomes Project, the Baylor College of Medicine cohort of 500 Turkish patients (not a population-specific polymorphism), or the matching 50 fertile controls. With the use of WES we identified a novel homozygous mutation in NPAS2 as a likely disease-causing variant in a Turkish family diagnosed with NOA. Our data reinforce the clinical role of WES in the molecular diagnosis of highly heterogeneous genetic diseases for which conventional genetic approaches have previously failed to find a molecular diagnosis. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  3. WholeGenome Sequencing of High-Risk Families to Identify New Mutational Mechanisms of Breast Cancer Predisposition

    DTIC Science & Technology

    2014-10-01

    4 APPENDICES 4 INTRODUCTION: Despite tremendous advances in mutation detection with gene panels...population frequency and overlap with ENCODE regions. 2a. Align reads to the reference sequence (months 4-10) 2b. Identify SNPs, indels, CNVs and

  4. The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

    PubMed

    Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

    1999-11-10

    Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.

  5. Correlating low-similarity peptide sequences and allergenic epitopes.

    PubMed

    Kanduc, D

    2008-01-01

    Although a high number of allergenic peptide epitopes has been experimentally identified and defined, the molecular basis and the precise mechanisms underlying peptide allergenicity are unknown. This issue was analyzed exploring the relationship between peptide allergenicity and sequence similarity to the human proteome. The structured analysis of the data reported in literature put into evidence that the most part of IgE-binding epitopes are (or harbor) pentapeptide unit(s) with no/low similarity to the human proteome, this way suggesting that no or low sequence similarity to the host proteome might represent a minimum common denominator identifying allergenic peptides. The present literature analysis might be of relevance in devising and designing short amino acid modules to be used for blocking pathogenic IgE.

  6. Targeted next generation sequencing identifies novel NOTCH3 gene mutations in CADASIL diagnostics patients.

    PubMed

    Maksemous, Neven; Smith, Robert A; Haupt, Larisa M; Griffiths, Lyn R

    2016-11-24

    Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) is a monogenic, hereditary, small vessel disease of the brain causing stroke and vascular dementia in adults. CADASIL has previously been shown to be caused by varying mutations in the NOTCH3 gene. The disorder is often misdiagnosed due to its significant clinical heterogeneic manifestation with familial hemiplegic migraine and several ataxia disorders as well as the location of the currently identified causative mutations. The aim of this study was to develop a new, comprehensive and efficient single assay strategy for complete molecular diagnosis of NOTCH3 mutations through the use of a custom next-generation sequencing (NGS) panel for improved routine clinical molecular diagnostic testing. Our custom NGS panel identified nine genetic variants in NOTCH3 (p.D139V, p.C183R, p.R332C, p.Y465C, p.C597W, p.R607H, p.E813E, p.C977G and p.Y1106C). Six mutations were stereotypical CADASIL mutations leading to an odd number of cysteine residues in one of the 34 NOTCH3 gene epidermal growth factor (EGF)-like repeats, including three new typical cysteine mutations identified in exon 11 (p.C597W; c.1791C>G); exon 18 (p.C977G; c.2929T>G) and exon 20 (p.Y1106C; c.3317A>G). Interestingly, a novel missense mutation in the CACNA1A gene was also identified in one CADASIL patient. All variants identified (novel and known) were further investigated using in silico bioinformatic analyses and confirmed through Sanger sequencing. NGS provides an improved and effective methodology for the diagnosis of CADASIL. The NGS approach reduced time and cost for comprehensive genetic diagnosis, placing genetic diagnostic testing within reach of more patients.

  7. PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition.

    PubMed

    Lancaster, Alex K; Nutter-Upham, Andrew; Lindquist, Susan; King, Oliver D

    2014-09-01

    Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best understood prions are from yeast and have a prion-forming domain with strongly biased amino acid composition, most notably enriched for Q or N. PLAAC is a web application that scans protein sequences for domains with P: rion- L: ike A: mino A: cid C: omposition. Users can upload sequence files, or paste sequences directly into a textbox. PLAAC ranks the input sequences by several summary scores and allows scores along sequences to be visualized. Text output files can be downloaded for further analyses, and visualizations saved in PDF and PNG formats. http://plaac.wi.mit.edu/. The Ruby-based web framework and the command-line software (implemented in Java, with visualization routines in R) are available at http://github.com/whitehead/plaac under the MIT license. All software can be run under OS X, Windows and Unix. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Highly conserved intragenic HSV-2 sequences: Results from next-generation sequencing of HSV-2 UL and US regions from genital swabs collected from 3 continents.

    PubMed

    Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M

    2017-10-01

    Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10 7 log 10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (U L _U S ) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated U L_ U S segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  10. An acid-stable bacterial laccase identified from the endophyte Pantoea ananatis Sd-1 genome exhibiting lignin degradation and dye decolorization abilities.

    PubMed

    Shi, Xiaowei; Liu, Qian; Ma, Jiangshan; Liao, Hongdong; Xiong, Xianqiu; Zhang, Keke; Wang, Tengfei; Liu, Xuanmin; Xu, Ting; Yuan, Shanshan; Zhang, Xin; Zhu, Yonghua

    2015-11-01

    Isolation and identification of a novel laccase (namely Lac4) with various industrial applications potentials from an endophytical bacterium. Endophyte Sd-1 cultured in rice straw showed intra- and extra-cellular laccase activities. Genomic analysis of Sd-1 identified four putative laccases, Lac1 to Lac4. However, only Lac4 contains the complete signature sequence of laccase and shares at most 64 % sequence identity with other characterized bacterial multi-copper oxidases. Recombinant Lac4 can oxidize non-phenolic and phenolic compounds under acidic conditions and at 30-50 °C; Km values of Lac4 for ABTS at pH 2.5 and for guaiacol at pH 4.5 were 1 ± 0.15 and 6.1 ± 1.7 mM, respectively. The activity of Lac4 was stimulated by 0.8 mM Cu(2+) and 5 mM Fe(2+). In addition, Lac4 could decolorize various synthetic dyes and exhibit the degradation rate of 38 % for lignin. The data suggest that Lac4 possesses promising biotechnological potentials.

  11. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  12. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923).

    PubMed

    Wasels, François; Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-03-03

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. Copyright © 2016 Wasels et al.

  13. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics

    PubMed Central

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C.

    2015-01-01

    ABSTRACT The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together; HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. IMPORTANCE Insects have the most species of all animals, and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. PMID:26676774

  14. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krauthammer, Michael; Kong, Yong; Ha, Byung Hak

    We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequentmore » in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1{sup P29S}) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1{sup P29S} showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.« less

  15. Genetic Variants Identified from Epilepsy of Unknown Etiology in Chinese Children by Targeted Exome Sequencing

    PubMed Central

    Wang, Yimin; Du, Xiaonan; Bin, Rao; Yu, Shanshan; Xia, Zhezhi; Zheng, Guo; Zhong, Jianmin; Zhang, Yunjian; Jiang, Yong-hui; Wang, Yi

    2017-01-01

    Genetic factors play a major role in the etiology of epilepsy disorders. Recent genomics studies using next generation sequencing (NGS) technique have identified a large number of genetic variants including copy number (CNV) and single nucleotide variant (SNV) in a small set of genes from individuals with epilepsy. These discoveries have contributed significantly to evaluate the etiology of epilepsy in clinic and lay the foundation to develop molecular specific treatment. However, the molecular basis for a majority of epilepsy patients remains elusive, and furthermore, most of these studies have been conducted in Caucasian children. Here we conducted a targeted exome-sequencing of 63 trios of Chinese epilepsy families using a custom-designed NGS panel that covers 412 known and candidate genes for epilepsy. We identified pathogenic and likely pathogenic variants in 15 of 63 (23.8%) families in known epilepsy genes including SCN1A, CDKL5, STXBP1, CHD2, SCN3A, SCN9A, TSC2, MBD5, POLG and EFHC1. More importantly, we identified likely pathologic variants in several novel candidate genes such as GABRE, MYH1, and CLCN6. Our results provide the evidence supporting the application of custom-designed NGS panel in clinic and indicate a conserved genetic susceptibility for epilepsy between Chinese and Caucasian children. PMID:28074849

  16. Utility of next-generation RNA-sequencing in identifying chimeric transcription involving human endogenous retroviruses.

    PubMed

    Sokol, Martin; Jessen, Karen Margrethe; Pedersen, Finn Skou

    2016-01-01

    Several studies have shown that human endogenous retroviruses and endogenous retrovirus-like repeats (here collectively HERVs) impose direct regulation on human genes through enhancer and promoter motifs present in their long terminal repeats (LTRs). Although chimeric transcription in which novel gene isoforms containing retroviral and human sequence are transcribed from viral promoters are commonly associated with disease, regulation by HERVs is beneficial in other settings; for example, in human testis chimeric isoforms of TP63 induced by an ERV9 LTR protect the male germ line upon DNA damage by inducing apoptosis, whereas in the human globin locus the γ- and β-globin switch during normal hematopoiesis is mediated by complex interactions of an ERV9 LTR and surrounding human sequence. The advent of deep sequencing or next-generation sequencing (NGS) has revolutionized the way researchers solve important scientific questions and develop novel hypotheses in relation to human genome regulation. We recently applied next-generation paired-end RNA-sequencing (RNA-seq) together with chromatin immunoprecipitation with sequencing (ChIP-seq) to examine ERV9 chimeric transcription in human reference cell lines from Encyclopedia of DNA Elements (ENCODE). This led to the discovery of advanced regulation mechanisms by ERV9s and other HERVs across numerous human loci including transcription of large gene-unannotated genomic regions, as well as cooperative regulation by multiple HERVs and non-LTR repeats such as Alu elements. In this article, well-established examples of human gene regulation by HERVs are reviewed followed by a description of paired-end RNA-seq, and its application in identifying chimeric transcription genome-widely. Based on integrative analyses of RNA-seq and ChIP-seq, data we then present novel examples of regulation by ERV9s of tumor suppressor genes CADM2 and SEMA3A, as well as transcription of an unannotated region. Taken together, this article highlights

  17. Gene-Based Sequencing Identifies Lipid-Influencing Variants with Ethnicity-Specific Effects in African Americans

    PubMed Central

    Bentley, Amy R.; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P.; Zhou, Jie; Huang, Hanxia; Mullikin, James C.; Blakesley, Robert W.; Hansen, Nancy F.; Bouffard, Gerard G.; Cherukuri, Praveen F.; Maskeri, Baishali; Young, Alice C.; Adeyemo, Adebowale; Rotimi, Charles N.

    2014-01-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370

  18. Analysis of common bean expressed sequence tags identifies sulfur metabolic pathways active in seed and sulfur-rich proteins highly expressed in the absence of phaseolin and major lectins

    PubMed Central

    2011-01-01

    Background A deficiency in phaseolin and phytohemagglutinin is associated with a near doubling of sulfur amino acid content in genetically related lines of common bean (Phaseolus vulgaris), particularly cysteine, elevated by 70%, and methionine, elevated by 10%. This mostly takes place at the expense of an abundant non-protein amino acid, S-methyl-cysteine. The deficiency in phaseolin and phytohemagglutinin is mainly compensated by increased levels of the 11S globulin legumin and residual lectins. Legumin, albumin-2, defensin and albumin-1 were previously identified as contributing to the increased sulfur amino acid content in the mutant line, on the basis of similarity to proteins from other legumes. Results Profiling of free amino acid in developing seeds of the BAT93 reference genotype revealed a biphasic accumulation of gamma-glutamyl-S-methyl-cysteine, the main soluble form of S-methyl-cysteine, with a lag phase occurring during storage protein accumulation. A collection of 30,147 expressed sequence tags (ESTs) was generated from four developmental stages, corresponding to distinct phases of gamma-glutamyl-S-methyl-cysteine accumulation, and covering the transitions to reserve accumulation and dessication. Analysis of gene ontology categories indicated the occurrence of multiple sulfur metabolic pathways, including all enzymatic activities responsible for sulfate assimilation, de novo cysteine and methionine biosynthesis. Integration of genomic and proteomic data enabled the identification and isolation of cDNAs coding for legumin, albumin-2, defensin D1 and albumin-1A and -B induced in the absence of phaseolin and phytohemagglutinin. Their deduced amino acid sequences have a higher content of cysteine than methionine, providing an explanation for the preferential increase of cysteine in the mutant line. Conclusion The EST collection provides a foundation to further investigate sulfur metabolism and the differential accumulation of sulfur amino acids in seed

  19. MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets

    PubMed Central

    Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    2016-01-01

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e.contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accomodate for the mosaic genome structure of many bacteriophages. The method is demonstrated to out-perform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder. PMID:27684958

  20. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  1. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

    DOE PAGES

    Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric; ...

    2015-03-30

    Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereasFGF3, FGF4, FGF19 or CCND1more » amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.« less

  2. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric

    Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereasFGF3, FGF4, FGF19 or CCND1more » amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.« less

  3. Validation of rearrangement break points identified by paired-end sequencing in natural populations of Drosophila melanogaster.

    PubMed

    Cridland, Julie M; Thornton, Kevin R

    2010-01-13

    Several recent studies have focused on the evolution of recently duplicated genes in Drosophila. Currently, however, little is known about the evolutionary forces acting upon duplications that are segregating in natural populations. We used a high-throughput, paired-end sequencing platform (Illumina) to identify structural variants in a population sample of African D. melanogaster. Polymerase chain reaction and sequencing confirmation of duplications detected by multiple, independent paired-ends showed that paired-end sequencing reliably uncovered the break points of structural rearrangements and allowed us to identify a number of tandem duplications segregating within a natural population. Our confirmation experiments show that rates of confirmation are very high, even at modest coverage. Our results also compare well with previous studies using microarrays (Emerson J, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 320:1629-1631. and Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A. 104:19920-19925.), which both gives us confidence in the results of this study as well as confirms previous microarray results.We were also able to identify whole-gene duplications, such as a novel duplication of Or22a, an olfactory receptor, and identify copy-number differences in genes previously known to be under positive selection, like Cyp6g1, which confers resistance to dichlorodiphenyltrichloroethane. Several "hot spots" of duplications were detected in this study, which indicate that particular regions of the genome may be more prone to generating duplications. Finally, population frequency analysis of confirmed events also showed an excess of rare variants in our population, which indicates that duplications segregating in the population may be deleterious and ultimately destined to be lost from the

  4. Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

    PubMed

    Bricheux, G; Brugerolle, G

    1997-08-01

    The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.

  5. Anchoring genome sequence to chromosomes of the central bearded dragon (Pogona vitticeps) enables reconstruction of ancestral squamate macrochromosomes and identifies sequence content of the Z chromosome.

    PubMed

    Deakin, Janine E; Edwards, Melanie J; Patel, Hardip; O'Meally, Denis; Lian, Jinmin; Stenhouse, Rachael; Ryan, Sam; Livernois, Alexandra M; Azad, Bhumika; Holleley, Clare E; Li, Qiye; Georges, Arthur

    2016-06-10

    Squamates (lizards and snakes) are a speciose lineage of reptiles displaying considerable karyotypic diversity, particularly among lizards. Understanding the evolution of this diversity requires comparison of genome organisation between species. Although the genomes of several squamate species have now been sequenced, only the green anole lizard has any sequence anchored to chromosomes. There is only limited gene mapping data available for five other squamates. This makes it difficult to reconstruct the events that have led to extant squamate karyotypic diversity. The purpose of this study was to anchor the recently sequenced central bearded dragon (Pogona vitticeps) genome to chromosomes to trace the evolution of squamate chromosomes. Assigning sequence to sex chromosomes was of particular interest for identifying candidate sex determining genes. By using two different approaches to map conserved blocks of genes, we were able to anchor approximately 42 % of the dragon genome sequence to chromosomes. We constructed detailed comparative maps between dragon, anole and chicken genomes, and where possible, made broader comparisons across Squamata using cytogenetic mapping information for five other species. We show that squamate macrochromosomes are relatively well conserved between species, supporting findings from previous molecular cytogenetic studies. Macrochromosome diversity between members of the Toxicofera clade has been generated by intrachromosomal, and a small number of interchromosomal, rearrangements. We reconstructed the ancestral squamate macrochromosomes by drawing upon comparative cytogenetic mapping data from seven squamate species and propose the events leading to the arrangements observed in representative species. In addition, we assigned over 8 Mbp of sequence containing 219 genes to the Z chromosome, providing a list of genes to begin testing as candidate sex determining genes. Anchoring of the dragon genome has provided substantial insight into

  6. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  7. Hybridization-based antibody cDNA recovery for the production of recombinant antibodies identified by repertoire sequencing.

    PubMed

    Valdés-Alemán, Javier; Téllez-Sosa, Juan; Ovilla-Muñoz, Marbella; Godoy-Lozano, Elizabeth; Velázquez-Ramírez, Daniel; Valdovinos-Torres, Humberto; Gómez-Barreto, Rosa E; Martinez-Barnetche, Jesús

    2014-01-01

    High-throughput sequencing of the antibody repertoire is enabling a thorough analysis of B cell diversity and clonal selection, which may improve the novel antibody discovery process. Theoretically, an adequate bioinformatic analysis could allow identification of candidate antigen-specific antibodies, requiring their recombinant production for experimental validation of their specificity. Gene synthesis is commonly used for the generation of recombinant antibodies identified in silico. Novel strategies that bypass gene synthesis could offer more accessible antibody identification and validation alternatives. We developed a hybridization-based recovery strategy that targets the complementarity-determining region 3 (CDRH3) for the enrichment of cDNA of candidate antigen-specific antibody sequences. Ten clonal groups of interest were identified through bioinformatic analysis of the heavy chain antibody repertoire of mice immunized with hen egg white lysozyme (HEL). cDNA from eight of the targeted clonal groups was recovered efficiently, leading to the generation of recombinant antibodies. One representative heavy chain sequence from each clonal group recovered was paired with previously reported anti-HEL light chains to generate full antibodies, later tested for HEL-binding capacity. The recovery process proposed represents a simple and scalable molecular strategy that could enhance antibody identification and specificity assessment, enabling a more cost-efficient generation of recombinant antibodies.

  8. Whole-exome sequencing identified a variant in EFTUD2 gene in establishing a genetic diagnosis.

    PubMed

    Rengasamy Venugopalan, S; Farrow, E G; Lypka, M

    2017-06-01

    Craniofacial anomalies are complex and have an overlapping phenotype. Mandibulofacial Dysostosis and Oculo-Auriculo-Vertebral Spectrum are conditions that share common craniofacial phenotype and present a challenge in arriving at a diagnosis. In this report, we present a case of female proband who was given a differential diagnosis of Treacher Collins syndrome or Hemifacial Microsomia without certainty. Prior genetic testing reported negative for 22q deletion and FGFR screenings. The objective of this study was to demonstrate the critical role of whole-exome sequencing in establishing a genetic diagnosis of the proband. The participants were 14½-year-old affected female proband/parent trio. Proband/parent trio were enrolled in the study. Surgical tissue sample from the proband and parental blood samples were collected and prepared for whole-exome sequencing. Illumina HiSeq 2500 instrument was used for sequencing (125 nucleotide reads/84X coverage). Analyses of variants were performed using custom-developed software, RUNES and VIKING. Variant analyses following whole-exome sequencing identified a heterozygous de novo pathogenic variant, c.259C>T (p.Gln87*), in EFTUD2 (NM_004247.3) gene in the proband. Previous studies have reported that the variants in EFTUD2 gene were associated with Mandibulofacial Dysostosis with Microcephaly. Patients with facial asymmetry, micrognathia, choanal atresia and microcephaly should be analyzed for variants in EFTUD2 gene. Next-generation sequencing techniques, such as whole-exome sequencing offer great promise to improve the understanding of etiologies of sporadic genetic diseases. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Mining for class-specific motifs in protein sequence classification

    PubMed Central

    2013-01-01

    Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as

  10. Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

    PubMed

    Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

    2015-02-14

    Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch containing hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the GLuc hydrophobic peptide length was lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th Glu caused enhanced levels of secreted protein, suggesting that this hydrophilic residue defined the boundary of a hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore utilized for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for the provision of secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic residue located between N-terminal basic and C-terminal acidic amino acid boundaries.

  11. Optimal MRI sequence for identifying occlusion location in acute stroke: which value of time-resolved contrast-enhanced MRA?

    PubMed

    Le Bras, A; Raoult, H; Ferré, J-C; Ronzière, T; Gauvrit, J-Y

    2015-06-01

    Identifying occlusion location is crucial for determining the optimal therapeutic strategy during the acute phase of ischemic stroke. The purpose of this study was to assess the diagnostic efficacy of MR imaging, including conventional sequences plus time-resolved contrast-enhanced MRA in comparison with DSA for identifying arterial occlusion location. Thirty-two patients with 34 occlusion levels referred for thrombectomy during acute cerebral stroke events were consecutively included from August 2010 to December 2012. Before thrombectomy, we performed 3T MR imaging, including conventional 3D-TOF and gradient-echo T2 sequences, along with time-resolved contrast-enhanced MRA of the extra- and intracranial arteries. The 3D-TOF, gradient-echo T2, and time-resolved contrast-enhanced MRA results were consensually assessed by 2 neuroradiologists and compared with prethrombectomy DSA results in terms of occlusion location. The Wilcoxon test was used for statistical analysis to compare MR imaging sequences with DSA, and the κ coefficient was used to determine intermodality agreement. The occlusion level on the 3D-TOF and gradient-echo T2 images differed significantly from that of DSA (P < .001 and P = .002, respectively), while no significant difference was observed between DSA and time-resolved contrast-enhanced MRA (P = .125). κ coefficients for intermodality agreement with DSA (95% CI, percentage agreement) were 0.43 (0.3%-0.6; 62%), 0.32 (0.2%-0.5; 56%), and 0.81 (0.6%-1.0; 88%) for 3D-TOF, gradient-echo T2, and time-resolved contrast-enhanced MRA, respectively. The time-resolved contrast-enhanced MRA sequence proved reliable for identifying occlusion location in acute stroke with performance superior to that of 3D-TOF and gradient-echo T2 sequences. © 2015 by American Journal of Neuroradiology.

  12. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    USDA-ARS?s Scientific Manuscript database

    Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of...

  13. Novel Genetic Variants of Sporadic Atrial Septal Defect (ASD) in a Chinese Population Identified by Whole-Exome Sequencing (WES).

    PubMed

    Liu, Yong; Cao, Yu; Li, Yaxiong; Lei, Dongyun; Li, Lin; Hou, Zong Liu; Han, Shen; Meng, Mingyao; Shi, Jianlin; Zhang, Yayong; Wang, Yi; Niu, Zhaoyi; Xie, Yanhua; Xiao, Benshan; Wang, Yuanfei; Li, Xiao; Yang, Lirong; Wang, Wenju; Jiang, Lihong

    2018-03-05

    BACKGROUND Recently, mutations in several genes have been described to be associated with sporadic ASD, but some genetic variants remain to be identified. The aim of this study was to use whole-exome sequencing (WES) combined with bioinformatics analysis to identify novel genetic variants in cases of sporadic congenital ASD, followed by validation by Sanger sequencing. MATERIAL AND METHODS Five Han patients with secundum ASD were recruited, and their tissue samples were analyzed by WES, followed by verification by Sanger sequencing of tissue and blood samples. Further evaluation using blood samples included 452 additional patients with sporadic secundum ASD (212 male and 240 female patients) and 519 healthy subjects (252 male and 267 female subjects) for further verification by a multiplexed MassARRAY system. Bioinformatic analyses were performed to identify novel genetic variants associated with sporadic ASD. RESULTS From five patients with sporadic ASD, a total of 181,762 genomic variants in 33 exon loci, validated by Sanger sequencing, were selected and underwent MassARRAY analysis in 452 patients with ASD and 519 healthy subjects. Three loci with high mutation frequencies, the 138665410 FOXL2 gene variant, the 23862952 MYH6 gene variant, and the 71098693 HYDIN gene variant were found to be significantly associated with sporadic ASD (P<0.05); variants in FOXL2 and MYH6 were found in patients with isolated, sporadic ASD (P<5×10^-4). CONCLUSIONS This was the first study that demonstrated variants in FOXL2 and HYDIN associated with sporadic ASD, and supported the use of WES and bioinformatics analysis to identify disease-associated mutations.

  14. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure.

    PubMed

    Kono, H; Saven, J G

    2001-02-23

    Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiment. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that it is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.

  15. Homozygosity Mapping Identifies a Bile Acid Biosynthetic Defect in an Adult with Cirrhosis of Unknown Etiology

    PubMed Central

    Molho-Pessach, Vered; Rios, Jonathan J.; Xing, Chao; Setchell, Kenneth D.R.; Cohen, Jonathan C.; Hobbs, Helen H.

    2013-01-01

    The most common inborn error of bile acid metabolism is 3β-hydroxy-Δ5-C27-steroid oxidoreductase (3β-HSD) deficiency, a disorder that usually presents in early childhood with hepatic dysfunction. Timely diagnosis of this disorder is crucial since it can be effectively treated with primary bile acid replacement. Here we describe a 24-year-old woman from Iran with cirrhosis of unknown etiology. Her sister and a first cousin died of cirrhosis (ages 19 and 6 years) and another 32-year old first cousin had a self-limited liver disorder in childhood that resolved at age 9 years. The family history was consistent with the notion that affected family members were homozygous for a mutant allele inherited identical-by-descent. A genome-wide analysis of 2.5 million single nucleotide polymorphisms (SNP) was performed to identify regions of homozygosity that were present in the proband and the 32-year old first cousin, but not in a healthy relative. One of these regions contained the gene encoding 3β-HSD (HSD3B7). Sequence analysis of HSD3B7 revealed that the proband and her 32-year old cousin were homozygous for a frame shift mutation (c.45_46del AG, p.T15Tfsx27) in exon 1. The diagnosis of 3β-HSD deficiency was confirmed by documenting high levels of 3β-hydroxy-Δ5 bile acids in the serum of the first cousin using mass spectrometry. To our knowledge, the 32-year old relative in this family represents the oldest asymptomatic patient with this disorder. Conclusion: This study highlights the clinical utility of homozygosity mapping in diagnosing autosomal recessive metabolic disorders. This family illustrates the wide variation in expressivity that occurs in 3β-HSD deficiency and underscores the need to consider a bile acid synthetic defect as a possible cause of liver disease in adults. PMID:22095780

  16. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics.

    PubMed

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C; Falk, Bryce W

    2015-12-16

    The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together; HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. Insects have the most species of all animals, and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  17. Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

    PubMed

    Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

    2005-01-01

    In this study,the full-length cDNAs of GH (Growth Hormone) gene was isolated from six important economic fishes, Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. It is the first time to clone these GH sequences except E. coioides GH. The lengths of the above cDNAs are as follows: 953 bp, 1 023 bp, 825 bp, 1 082 bp, 1 154 bp and 1 180 bp. Each sequence includes an ORF of about 600 bp which encodes a protein of about 200 amino acid: S. kneri, E. coioides and M. albus GHs of 204 amino acid, S. asotus GH of 200 amino acid, M. anguillicaudatus and C. auratus gibelio GHs of 210 amino acid. Then detailed sequence analysis of the six GHs with many other fish sequences was performed. The six sequences all showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the new six) with Amia calva as outgroup were generally resolved and largely congruent with the morphology-based tree though some incongruities were observed, suggesting GH ORF should be paid more attention to in teleostean phylogeny.

  18. Genome sequence of the thermophilic strain Bacillus coagulans 2-6, an efficient producer of high-optical-purity L-lactic acid.

    PubMed

    Su, Fei; Yu, Bo; Sun, Jibin; Ou, Hong-Yu; Zhao, Bo; Wang, Limin; Qin, Jiayang; Tang, Hongzhi; Tao, Fei; Jarek, Michael; Scharfe, Maren; Ma, Cuiqing; Ma, Yanhe; Xu, Ping

    2011-09-01

    Bacillus coagulans 2-6 is an efficient producer of lactic acid. The genome of B. coagulans 2-6 has the smallest genome among the members of the genus Bacillus known to date. The frameshift mutation at the start of the d-lactate dehydrogenase sequence might be responsible for the production of high-optical-purity l-lactic acid.

  19. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  20. Exome Sequencing and Linkage Analysis Identified Novel Candidate Genes in Recessive Intellectual Disability Associated with Ataxia.

    PubMed

    Jazayeri, Roshanak; Hu, Hao; Fattahi, Zohreh; Musante, Luciana; Abedini, Seyedeh Sedigheh; Hosseini, Masoumeh; Wienker, Thomas F; Ropers, Hans Hilger; Najmabadi, Hossein; Kahrizi, Kimia

    2015-10-01

    Intellectual disability (ID) is a neuro-developmental disorder which causes considerable socio-economic problems. Some ID individuals are also affected by ataxia, and the condition includes different mutations affecting several genes. We used whole exome sequencing (WES) in combination with homozygosity mapping (HM) to identify the genetic defects in five consanguineous families among our cohort study, with two affected children with ID and ataxia as major clinical symptoms. We identified three novel candidate genes, RIPPLY1, MRPL10, SNX14, and a new mutation in known gene SURF1. All are autosomal genes, except RIPPLY1, which is located on the X chromosome. Two are housekeeping genes, implicated in transcription and translation regulation and intracellular trafficking, and two encode mitochondrial proteins. The pathogenesis of these variants was evaluated by mutation classification, bioinformatic methods, review of medical and biological relevance, co-segregation studies in the particular family, and a normal population study. Linkage analysis and exome sequencing of a small number of affected family members is a powerful new technique which can be used to decrease the number of candidate genes in heterogenic disorders such as ID, and may even identify the responsible gene(s).

  1. The nucleotide sequence of HLA-B{sup *}2704 reveals a new amino acid substitution in exon 4 which is also present in HLA-B{sup *}2706

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rudwaleit, M.; Bowness, P.; Wordsworth, P.

    1996-12-31

    The HLA-B27 subtype HLA-B{sup *}2704 is virtually absent in Caucasians but common in Orientals, where it is associated with ankylosing spondylitis. The amino acid sequence of HLA-B{sup *}2704 has been established by peptide mapping and was shown to differ by two amino acids from HLA-B{sup *}2705, HLA-B{sup *}2704 is characterized by a serine for aspartic acid substitution at position 77 and glutamic acid for valine at position 152. To date, however, no nucleotide sequence confirming these changes at the DNA level has been published. 13 refs., 2 figs.

  2. Nucleic acid arrays and methods of synthesis

    DOEpatents

    Sabanayagam, Chandran R.; Sano, Takeshi; Misasi, John; Hatch, Anson; Cantor, Charles

    2001-01-01

    The present invention generally relates to high density nucleic acid arrays and methods of synthesizing nucleic acid sequences on a solid surface. Specifically, the present invention contemplates the use of stabilized nucleic acid primer sequences immobilized on solid surfaces, and circular nucleic acid sequence templates combined with the use of isothermal rolling circle amplification to thereby increase nucleic acid sequence concentrations in a sample or on an array of nucleic acid sequences.

  3. Identification of Clinical Coryneform Bacterial Isolates: Comparison of Biochemical Methods and Sequence Analysis of 16S rRNA and rpoB Genes▿

    PubMed Central

    Adderson, Elisabeth E.; Boudreaux, Jan W.; Cummings, Jessica R.; Pounds, Stanley; Wilson, Deborah A.; Procop, Gary W.; Hayden, Randall T.

    2008-01-01

    We compared the relative levels of effectiveness of three commercial identification kits and three nucleic acid amplification tests for the identification of coryneform bacteria by testing 50 diverse isolates, including 12 well-characterized control strains and 38 organisms obtained from pediatric oncology patients at our institution. Between 33.3 and 75.0% of control strains were correctly identified to the species level by phenotypic systems or nucleic acid amplification assays. The most sensitive tests were the API Coryne system and amplification and sequencing of the 16S rRNA gene using primers optimized for coryneform bacteria, which correctly identified 9 of 12 control isolates to the species level, and all strains with a high-confidence call were correctly identified. Organisms not correctly identified were species not included in the test kit databases or not producing a pattern of reactions included in kit databases or which could not be differentiated among several genospecies based on reaction patterns. Nucleic acid amplification assays had limited abilities to identify some bacteria to the species level, and comparison of sequence homologies was complicated by the inclusion of allele sequences obtained from uncultivated and uncharacterized strains in databases. The utility of rpoB genotyping was limited by the small number of representative gene sequences that are currently available for comparison. The correlation between identifications produced by different classification systems was poor, particularly for clinical isolates. PMID:18160450

  4. Production of Fatty Acid Components of Meadowfoam Oil in Somatic Soybean Embryos

    PubMed Central

    Cahoon, Edgar B.; Marillia, Elizabeth-France; Stecca, Kevin L.; Hall, Sarah E.; Taylor, David C.; Kinney, Anthony J.

    2000-01-01

    The seed oil of meadowfoam (Limnanthes alba) and other Limnanthes spp. is enriched in the unusual fatty acid Δ5-eicosenoic acid (20:1Δ5). This fatty acid has physical and chemical properties that make the seed oil of these plants useful for a number of industrial applications. An expressed sequence tag approach was used to identify cDNAs for enzymes involved in the biosynthesis of 20:1Δ5). By random sequencing of a library prepared from developing Limnanthes douglasii seeds, a class of cDNAs was identified that encode a homolog of acyl-coenzyme A (CoA) desaturases found in animals, fungi, and cyanobacteria. Expression of a cDNA for the L. douglasii acyl-CoA desaturase homolog in somatic soybean (Glycine max) embryos behind a strong seed-specific promoter resulted in the accumulation of Δ5-hexadecenoic acid to amounts of 2% to 3% (w/w) of the total fatty acids of single embryos. Δ5-Octadecenoic acid and 20:1Δ5 also composed <1% (w/w) each of the total fatty acids of these embryos. In addition, cDNAs were identified from the L. douglasii expressed sequence tags that encode a homolog of fatty acid elongase 1 (FAE1), a β-ketoacyl-CoA synthase that catalyzes the initial step of very long-chain fatty acid synthesis. Expression of the L. douglassi FAE1 homolog in somatic soybean embryos was accompanied by the accumulation of C20 and C22 fatty acids, principally as eicosanoic acid, to amounts of 18% (w/w) of the total fatty acids of single embryos. To partially reconstruct the biosynthetic pathway of 20:1Δ5 in transgenic plant tissues, cDNAs for the L. douglasii acyl-CoA desaturase and FAE1 were co-expressed in somatic soybean embryos. In the resulting transgenic embryos, 20:1Δ5 and Δ5-docosenoic acid composed up to 12% of the total fatty acids. PMID:10982439

  5. Production of fatty acid components of meadowfoam oil in somatic soybean embryos.

    PubMed

    Cahoon, E B; Marillia, E F; Stecca, K L; Hall, S E; Taylor, D C; Kinney, A J

    2000-09-01

    The seed oil of meadowfoam (Limnanthes alba) and other Limnanthes spp. is enriched in the unusual fatty acid Delta(5)-eicosenoic acid (20:1Delta(5)). This fatty acid has physical and chemical properties that make the seed oil of these plants useful for a number of industrial applications. An expressed sequence tag approach was used to identify cDNAs for enzymes involved in the biosynthesis of 20:1Delta(5)). By random sequencing of a library prepared from developing Limnanthes douglasii seeds, a class of cDNAs was identified that encode a homolog of acyl-coenzyme A (CoA) desaturases found in animals, fungi, and cyanobacteria. Expression of a cDNA for the L. douglasii acyl-CoA desaturase homolog in somatic soybean (Glycine max) embryos behind a strong seed-specific promoter resulted in the accumulation of Delta(5)-hexadecenoic acid to amounts of 2% to 3% (w/w) of the total fatty acids of single embryos. Delta(5)-Octadecenoic acid and 20:1Delta(5) also composed <1% (w/w) each of the total fatty acids of these embryos. In addition, cDNAs were identified from the L. douglasii expressed sequence tags that encode a homolog of fatty acid elongase 1 (FAE1), a beta-ketoacyl-CoA synthase that catalyzes the initial step of very long-chain fatty acid synthesis. Expression of the L. douglassi FAE1 homolog in somatic soybean embryos was accompanied by the accumulation of C(20) and C(22) fatty acids, principally as eicosanoic acid, to amounts of 18% (w/w) of the total fatty acids of single embryos. To partially reconstruct the biosynthetic pathway of 20:1Delta(5) in transgenic plant tissues, cDNAs for the L. douglasii acyl-CoA desaturase and FAE1 were co-expressed in somatic soybean embryos. In the resulting transgenic embryos, 20:1Delta(5) and Delta(5)-docosenoic acid composed up to 12% of the total fatty acids.

  6. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    PubMed

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  8. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2002-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  9. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    PubMed

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  10. Whole exome sequencing identifies driver mutations in asymptomatic computed tomography-detected lung cancers with normal karyotype.

    PubMed

    Belloni, Elena; Veronesi, Giulia; Rotta, Luca; Volorio, Sara; Sardella, Domenico; Bernard, Loris; Pece, Salvatore; Di Fiore, Pier Paolo; Fumagalli, Caterina; Barberis, Massimo; Spaggiari, Lorenzo; Pelicci, Pier Giuseppe; Riva, Laura

    2015-04-01

    The efficacy of curative surgery for lung cancer could be largely improved by non-invasive screening programs, which can detect the disease at early stages. We previously showed that 18% of screening-identified lung cancers demonstrate a normal karyotype and, following high-density genome scanning, can be subdivided into samples with 1) numerous; 2) none; and 3) few copy number alterations. Whole exome sequencing was applied to the two normal karyotype, screening-detected lung cancers, constituting group 2, as well as normal controls. We identified mutations in both tumors, including KEAP1 (commonly mutated in lung cancers) in one, and TP53, PMS1, and MSH3 (well-characterized DNA-repair genes) in the other. The two normal karyotype screening-detected lung tumors displayed a typical lung cancer mutational profile that only next generation sequencing could reveal, which offered an additional contribution to the over-diagnosis bias concept hypothesized within lung cancer screening programs. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    PubMed

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.

  12. Genome Sequence of Sphingomonas wittichii DP58, the First Reported Phenazine-1-Carboxylic Acid-Degrading Strain

    PubMed Central

    Ma, Zhiwei; Shen, Xuemei; Wang, Wei; Peng, Huasong; Xu, Ping; Zhang, Xuehong

    2012-01-01

    Sphingomonas wittichii DP58 (CCTCC M 2012027), the first reported phenazine-1-carboxylic acid (PCA)-degrading strain, was isolated from pimiento rhizosphere soils. Here we present a 5.6-Mb assembly of its genome. This sequence would contribute to the elucidation of the molecular mechanism of PCA degradation to improve the antifungal's effectiveness or remove superfluous PCA. PMID:22689229

  13. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  14. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  15. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment.

    PubMed

    Chen, Xiaowei Sylvia; Reader, Rose H; Hoischen, Alexander; Veltman, Joris A; Simpson, Nuala H; Francks, Clyde; Newbury, Dianne F; Fisher, Simon E

    2017-04-25

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential "multiple-hit" cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation.

  16. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment

    PubMed Central

    Chen, Xiaowei Sylvia; Reader, Rose H.; Hoischen, Alexander; Veltman, Joris A.; Simpson, Nuala H.; Francks, Clyde; Newbury, Dianne F.; Fisher, Simon E.

    2017-01-01

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential “multiple-hit” cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation. PMID:28440294

  17. Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

    PubMed Central

    Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

    2015-01-01

    Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410

  18. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

    PubMed

    Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

    2013-06-03

    Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.

  19. Somatic mutations in histiocytic sarcoma identified by next generation sequencing.

    PubMed

    Liu, Qingqing; Tomaszewicz, Keith; Hutchinson, Lloyd; Hornick, Jason L; Woda, Bruce; Yu, Hongbo

    2016-08-01

    Histiocytic sarcoma is a rare malignant neoplasm of presumed hematopoietic origin showing morphologic and immunophenotypic evidence of histiocytic differentiation. Somatic mutation importance in the pathogenesis or disease progression of histiocytic sarcoma was largely unknown. To identify somatic mutations in histiocytic sarcoma, we studied 5 histiocytic sarcomas [3 female and 2 male patients; mean age 54.8 (20-72), anatomic sites include lymph node, uterus, and pleura] and matched normal tissues from each patient as germ line controls. Somatic mutations in 50 "Hotspot" oncogenes and tumor suppressor genes were examined using next generation sequencing. Three (out of five) histiocytic sarcoma cases carried somatic mutations in BRAF. Among them, G464V [variant frequency (VF) of 43.6 %] and G466R (VF of 29.6 %) located at the P loop potentially interfere with the hydrophobic interaction between P and activating loops and ultimately activation of BRAF. Also detected was BRAF somatic mutation N581S (VF of 7.4 %), which was located at the catalytic loop of BRAF kinase domain: its role in modifying kinase activity was unclear. A similar mutational analysis was also performed on nine acute monocytic/monoblastic leukemia cases, which did not identify any BRAF somatic mutations. Our study detected several BRAF mutations in histiocytic sarcomas, which may be important in understanding the tumorigenesis of this rare neoplasm and providing mechanisms for potential therapeutical opportunities.

  20. Stable isotope fingerprinting: a novel method for identifying plant, fungal, or bacterial origins of amino acids

    Treesearch

    Thomas Larsen; D. Lee Taylor; Mary Beth Leigh; Diane M. O' Brien

    2009-01-01

    Amino acids play an important role in ecology as essential nutrients for animals and as currencies in symbiotic associations. Here we present a new approach to tracing the origins of amino acids by identifying unique patterns of carbon isotope signatures generated by amino acid synthesis in plants, fungi, and bacteria ("13C fingerprints...

  1. Automated sequence analysis and editing software for HIV drug resistance testing.

    PubMed

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited-settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drugs resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. To develop an automated sequence analysis and editing software to support high throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

    PubMed

    Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus.

  3. Unifying bacteria from decaying wood with various ubiquitous Gibbsiella species as G. acetica sp. nov. based on nucleotide sequence similarities and their acetic acid secretion.

    PubMed

    Geider, Klaus; Gernold, Marina; Jock, Susanne; Wensing, Annette; Völksch, Beate; Gross, Jürgen; Spiteller, Dieter

    2015-12-01

    Bacteria were isolated from necrotic apple and pear tree tissue and from dead wood in Germany and Austria as well as from pear tree exudate in China. They were selected for growth at 37 °C, screened for levan production and then characterized as Gram-negative, facultatively anaerobic rods. Nucleotide sequences from 16S rRNA genes, the housekeeping genes dnaJ, gyrB, recA and rpoB alignments, BLAST searches and phenotypic data confirmed by MALDI-TOF analysis showed that these bacteria belong to the genus Gibbsiella and resembled strains isolated from diseased oaks in Britain and Spain. Gibbsiella-specific PCR primers were designed from the proline isomerase and the levansucrase genes. Acid secretion was investigated by screening for halo formation on calcium carbonate agar and the compound identified by NMR as acetic acid. Its production by Gibbsiella spp. strains was also determined in culture supernatants by GC/MS analysis after derivatization with pentafluorobenzyl bromide. Some strains were differentiated by the PFGE patterns of SpeI digests and by sequence analyses of the lsc and the ppiD genes, and the Chinese Gibbsiella strain was most divergent. The newly investigated bacteria as well as Gibbsiella querinecans, Gibbsiella dentisursi and Gibbsiella papilionis, isolated in Britain, Spain, Korea and Japan, are taxonomically related Enterobacteriaceae, tolerate and secrete acetic acid. We therefore propose to unify them in the species Gibbsiella acetica sp. nov. Copyright © 2015. Published by Elsevier GmbH.

  4. Rapid identification of lettuce seed germination mutants by bulked segregant analysis and whole genome sequencing.

    PubMed

    Huo, Heqiang; Henry, Isabelle M; Coppoolse, Eric R; Verhoef-Post, Miriam; Schut, Johan W; de Rooij, Han; Vogelaar, Aat; Joosen, Ronny V L; Woudenberg, Leo; Comai, Luca; Bradford, Kent J

    2016-11-01

    Lettuce (Lactuca sativa) seeds exhibit thermoinhibition, or failure to complete germination when imbibed at warm temperatures. Chemical mutagenesis was employed to develop lettuce lines that exhibit germination thermotolerance. Two independent thermotolerant lettuce seed mutant lines, TG01 and TG10, were generated through ethyl methanesulfonate mutagenesis. Genetic and physiological analyses indicated that these two mutations were allelic and recessive. To identify the causal gene(s), we applied bulked segregant analysis by whole genome sequencing. For each mutant, bulked DNA samples of segregating thermotolerant (mutant) seeds were sequenced and analyzed for homozygous single-nucleotide polymorphisms. Two independent candidate mutations were identified at different physical positions in the zeaxanthin epoxidase gene (ABSCISIC ACID DEFICIENT 1/ZEAXANTHIN EPOXIDASE, or ABA1/ZEP) in TG01 and TG10. The mutation in TG01 caused an amino acid replacement, whereas the mutation in TG10 resulted in alternative mRNA splicing. Endogenous abscisic acid contents were reduced in both mutants, and expression of the ABA1 gene from wild-type lettuce under its own promoter fully complemented the TG01 mutant. Conventional genetic mapping confirmed that the causal mutations were located near the ZEP/ABA1 gene, but the bulked segregant whole genome sequencing approach more efficiently identified the specific gene responsible for the phenotype. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  5. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    PubMed

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through

  6. The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription

    NASA Technical Reports Server (NTRS)

    Kaine, B. P.; Mehr, I. J.; Woese, C. R.

    1994-01-01

    Through random search, a gene from Thermococcus celer has been identified and sequenced that appears to encode a transcription-associated protein (110 amino acid residues). The sequence has clear homology to approximately the last half of an open reading frame reported previously for Sulfolobus acidocaldarius [Langer, D. & Zillig, W. (1993) Nucleic Acids Res. 21, 2251]. The protein translations of these two archaeal genes in turn are homologs of a small subunit found in eukaryotic RNA polymerase I (A12.2) and the counterpart of this from RNA polymerase II (B12.6). Homology is also seen with the eukaryotic transcription factor TFIIS, but it involves only the terminal 45 amino acids of the archaeal proteins. Evolutionary implications of these homologies are discussed.

  7. Novel Genetic Variants of Sporadic Atrial Septal Defect (ASD) in a Chinese Population Identified by Whole-Exome Sequencing (WES)

    PubMed Central

    Liu, Yong; Cao, Yu; Li, Yaxiong; Lei, Dongyun; Li, Lin; Hou, Zong Liu; Han, Shen; Meng, Mingyao; Shi, Jianlin; Zhang, Yayong; Wang, Yi; Niu, Zhaoyi; Xie, Yanhua; Xiao, Benshan; Wang, Yuanfei; Li, Xiao; Yang, Lirong

    2018-01-01

    Background Recently, mutations in several genes have been described to be associated with sporadic ASD, but some genetic variants remain to be identified. The aim of this study was to use whole-exome sequencing (WES) combined with bioinformatics analysis to identify novel genetic variants in cases of sporadic congenital ASD, followed by validation by Sanger sequencing. Material/Methods Five Han patients with secundum ASD were recruited, and their tissue samples were analyzed by WES, followed by verification by Sanger sequencing of tissue and blood samples. Further evaluation using blood samples included 452 additional patients with sporadic secundum ASD (212 male and 240 female patients) and 519 healthy subjects (252 male and 267 female subjects) for further verification by a multiplexed MassARRAY system. Bioinformatic analyses were performed to identify novel genetic variants associated with sporadic ASD. Results From five patients with sporadic ASD, a total of 181,762 genomic variants in 33 exon loci, validated by Sanger sequencing, were selected and underwent MassARRAY analysis in 452 patients with ASD and 519 healthy subjects. Three loci with high mutation frequencies, the 138665410 FOXL2 gene variant, the 23862952 MYH6 gene variant, and the 71098693 HYDIN gene variant were found to be significantly associated with sporadic ASD (P<0.05); variants in FOXL2 and MYH6 were found in patients with isolated, sporadic ASD (P<5×10−4). Conclusions This was the first study that demonstrated variants in FOXL2 and HYDIN associated with sporadic ASD, and supported the use of WES and bioinformatics analysis to identify disease-associated mutations. PMID:29505555

  8. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of arid-land tree-Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, total 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  11. Influence of the Amino Acid Sequence on Protein-Mineral Interactions in Soil

    NASA Astrophysics Data System (ADS)

    Chacon, S. S.; Reardon, P. N.; Purvine, S.; Lipton, M. S.; Washton, N.; Kleber, M.

    2017-12-01

    The intimate associations between protein and mineral surfaces have profound impacts on nutrient cycling in soil. Proteins are an important source of organic C and N, and a subset of proteins, extracellular enzymes (EE), can catalyze the depolymerization of soil organic matter (SOM). Our goal was to determine how variation in the amino acid sequence could influence a protein's susceptibility to become chemically altered by mineral surfaces to infer the fate of adsorbed EE function in soil. We hypothesized that (1) addition of charged amino acids would enhance the adsorption onto oppositely charged mineral surfaces (2) addition of aromatic amino acids would increase adsorption onto zero charged surfaces (3) Increase adsorption of modified proteins would enhance their susceptibility to alterations by redox active minerals. To test these hypotheses, we generated three engineered proxies of a model protein Gb1 (IEP 4.0, 6.2 kDA) by inserting either negatively charged, positively charged or aromatic amino acids in the second loop. These modified proteins were allowed to interact with functionally different mineral surfaces (goethite, montmorillonite, kaolinite and birnessite) at pH 5 and 7. We used LC-MS/MS and solution-state Heteronuclear Single Quantum Coherence Spectroscopy NMR to observe modifications on engineered proteins as a consequence to mineral interactions. Preliminary results indicate that addition of any amino acids to a protein increase its susceptibility to fragmentation and oxidation by redox active mineral surfaces, and alter adsorption to the other mineral surfaces. This suggest that not all mineral surfaces in soil may act as sorbents for EEs and chemical modification of their structure should also be considered as an explanation for decrease in EE activity. Fragmentation of proteins by minerals can bypass the need to produce proteases, but microbial acquisition of other nutrients that require enzymes such as cellulases, ligninases or phosphatases

  12. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).

  13. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    PubMed Central

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

    2016-01-01

    SUMMARY The extent to which low-frequency (minor allele frequency [MAF] between 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures and has been previously associated with common genetic variants1–8, and rare, population-specific, coding variants9. Here we identify novel non-coding genetic variants with large effects on BMD (ntotal = 53,236) and fracture (ntotal = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n=2,882 from UK10K), whole-exome sequencing (n= 3,549), deep imputation of genotyped samples using a combined UK10K/1000Genomes reference panel (n=26,534), and de-novo replication genotyping (n= 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD8 (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], Pmeta = 2×10−14), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2×10−11; ncases = 98,742 and ncontrols = 409,511). Using an En1Cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turn-over. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, Pmeta = 1×10−11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of

  14. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

    PubMed

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango.

  15. Diverse novel astroviruses identified in wild Himalayan marmots.

    PubMed

    Ao, Yuan-Yun; Yu, Jie-Mei; Li, Li-Li; Cao, Jing-Yuan; Deng, Hong-Yan; Xin, Yun-Yun; Liu, Meng-Meng; Lin, Lin; Lu, Shan; Xu, Jian-Guo; Duan, Zhao-Jun

    2017-04-01

    With advances in viral surveillance and next-generation sequencing, highly diverse novel astroviruses (AstVs) and different animal hosts had been discovered in recent years. However, the existence of AstVs in marmots had yet to be shown. Here, we identified two highly divergent strains of AstVs (tentatively named Qinghai Himalayanmarmot AstVs, HHMAstV1 and HHMAstV2), by viral metagenomic analysis in liver tissues isolated from wild Marmota himalayana in China. Overall, 12 of 99 (12.1 %) M. himalayana faecal samples were positive for the presence of genetically diverse AstVs, while only HHMAstV1 and HHMAstV2 were identified in 300 liver samples. The complete genomic sequences of HHMAstV1 and HHMAstV2 were 6681 and 6610 nt in length, respectively, with the typical genomic organization of AstVs. Analysis of the complete ORF 2 sequence showed that these novel AstVs are most closely related to the rabbit AstV, mamastrovirus 23 (with 31.0 and 48.0 % shared amino acid identity, respectively). Phylogenetic analysis of the amino acid sequences of ORF1a, ORF1b and ORF2 indicated that HHMAstV1 and HHMAstV2 form two distinct clusters among the mamastroviruses, and may share a common ancestor with the rabbit-specific mamastrovirus 23. These results suggest that HHMAstV1 and HHMAstV2 are two novel species of the genus Mamastrovirus in the Astroviridae. The remarkable diversity of these novel AstVs will contribute to a greater understanding of the evolution and ecology of AstVs, although additional studies will be needed to understand the clinical significance of these novel AstVs in marmots, as well as in humans.

  16. Completed Ensemble Empirical Mode Decomposition: a Robust Signal Processing Tool to Identify Sequence Strata

    NASA Astrophysics Data System (ADS)

    Purba, H.; Musu, J. T.; Diria, S. A.; Permono, W.; Sadjati, O.; Sopandi, I.; Ruzi, F.

    2018-03-01

    Well logging data provide many geological information and its trends resemble nonlinear or non-stationary signals. As long well log data recorded, there will be external factors can interfere or influence its signal resolution. A sensitive signal analysis is required to improve the accuracy of logging interpretation which it becomes an important thing to determine sequence stratigraphy. Complete Ensemble Empirical Mode Decomposition (CEEMD) is one of nonlinear and non-stationary signal analysis method which decomposes complex signal into a series of intrinsic mode function (IMF). Gamma Ray and Spontaneous Potential well log parameters decomposed into IMF-1 up to IMF-10 and each of its combination and correlation makes physical meaning identification. It identifies the stratigraphy and cycle sequence and provides an effective signal treatment method for sequence interface. This method was applied to BRK- 30 and BRK-13 well logging data. The result shows that the combination of IMF-5, IMF-6, and IMF-7 pattern represent short-term and middle-term while IMF-9 and IMF-10 represent the long-term sedimentation which describe distal front and delta front facies, and inter-distributary mouth bar facies, respectively. Thus, CEEMD clearly can determine the different sedimentary layer interface and better identification of the cycle of stratigraphic base level.

  17. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  18. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  19. Nucleic acid detection kits

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.

    2005-03-29

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.

  20. Cloning and nucleotide sequence of the Pseudomonas aeruginosa glucose-selective OprB porin gene and distribution of OprB within the family Pseudomonadaceae.

    PubMed

    Wylie, J L; Worobec, E A

    1994-03-01

    OprB is a glucose-selective porin known to be produced by Pseudomonas aeruginosa and Pseudomonas putida. We have cloned and sequenced the oprB gene of P. aeruginosa and obtained expression of OprB in Escherichia coli. The mature protein consists of 423 amino acid residues with a deduced molecular mass of 47597 Da. Several clusters of amino acid residues, potentially involved in the structure or function of the protein, were identified. An area of regional homology with E. coli LamB was also identified. Carbohydrate-inducible proteins, potentially homologous to OprB, were identified in several rRNA homology-group-I pseudomonads by sodium dodecyl sulfate/polyacrylamide gel electrophoresis analysis, Western immunoblotting and N-terminal amino acid sequencing. These species also contained DNA that hybridized to a P. aeruginosa oprB gene probe.

  1. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    PubMed Central

    2013-01-01

    Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509

  2. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    PubMed Central

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria, responsible for extreme acid sensitivity especially in Escherichia coli and Shigella flexneri. Here we found the molecular evidence that proved the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC coded acid sensitive antiporter showed many conserved residue patterns at regular intervals at the N-terminal region. It was observed that as the alignment approaches towards the C-terminal, the number of conserved residues decreases, indicating that the N-terminal region of this protein has much active role when compared to the carboxyl terminal. The motif, FHLVFFLLLGG, is well conserved within the entire gadC coded protein at the amino terminal. The motif is also partially conserved among other antiporters (which are not coded by gadC) but involved in acid sensitive/resistance mechanism. Phylogenetic cluster analysis proves the relationship of Escherichia coli and Shigella flexneri. The gadC coded proteins are converged as a clade and diverged from other antiporters belongs to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792

  3. Genotyping-by-Sequencing-Based Investigation of the Genetic Architecture Responsible for a ∼Sevenfold Increase in Soybean Seed Stearic Acid.

    PubMed

    Heim, Crystal B; Gillman, Jason D

    2017-01-05

    Soybean oil is highly unsaturated but oxidatively unstable, rendering it nonideal for food applications. Until recently, the majority of soybean oil underwent partial chemical hydrogenation, which produces trans fats as an unavoidable consequence. Dietary intake of trans fats and most saturated fats are conclusively linked to negative impacts on cholesterol levels and cardiovascular health. Two major soybean oil breeding targets are: (1) to reduce or eliminate the need for chemical hydrogenation, and (2) to replace the functional properties of partially hydrogenated soybean oil. One potential solution is the elevation of seed stearic acid, a saturated fat which has no negative impacts on cardiovascular health, from 3 to 4% in typical cultivars to > 20% of the seed oil. We performed QTL analysis of a population developed by crossing two mutant lines, one with a missense mutation affecting a stearoyl-acyl-carrier protein desaturase gene resulting in ∼11% seed stearic acid crossed to another mutant, A6, which has 24-28% seed stearic acid. Genotyping-by-sequencing (GBS)-based QTL mapping identified 21 minor and major effect QTL for six seed oil related traits and plant height. The inheritance of a large genomic deletion affecting chromosome 14 is the basis for largest effect QTL, resulting in ∼18% seed stearic acid. This deletion contains SACPD-C and another gene(s); loss of both genes boosts seed stearic acid levels to ≥ 18%. Unfortunately, this genomic deletion has been shown in previous studies to be inextricably correlated with reduced seed yield. Our results will help inform and guide ongoing breeding efforts to improve soybean oil oxidative stability. Copyright © 2017 Heim and Gillman.

  4. Identifying Students' Conceptions of Basic Principles in Sequence Stratigraphy

    ERIC Educational Resources Information Center

    Herrera, Juan S.; Riggs, Eric M.

    2013-01-01

    Sequence stratigraphy is a major research subject in the geosciences academia and the oil industry. However, the geoscience education literature addressing students' understanding of the basic concepts of sequence stratigraphy is relatively thin, and the topic has not been well explored. We conducted an assessment of 27 students' conceptions of…

  5. Spectrum of mutations in leiomyosarcomas identified by clinical targeted next-generation sequencing.

    PubMed

    Lee, Paul J; Yoo, Naomi S; Hagemann, Ian S; Pfeifer, John D; Cottrell, Catherine E; Abel, Haley J; Duncavage, Eric J

    2017-02-01

    Recurrent genomic mutations in uterine and non-uterine leiomyosarcomas have not been well established. Using a next generation sequencing (NGS) panel of common cancer-associated genes, 25 leiomyosarcomas arising from multiple sites were examined to explore genetic alterations, including single nucleotide variants (SNV), small insertions/deletions (indels), and copy number alterations (CNA). Sequencing showed 86 non-synonymous, coding region somatic variants within 151 gene targets in 21 cases, with a mean of 4.1 variants per case; 4 cases had no putative mutations in the panel of genes assayed. The most frequently altered genes were TP53 (36%), ATM and ATRX (16%), and EGFR and RB1 (12%). CNA were identified in 85% of cases, with the most frequent copy number losses observed in chromosomes 10 and 13 including PTEN and RB1; the most frequent gains were seen in chromosomes 7 and 17. Our data show that deletions in canonical cancer-related genes are common in leiomyosarcomas. Further, the spectrum of gene mutations observed shows that defects in DNA repair and chromosomal maintenance are central to the biology of leiomyosarcomas, and that activating mutations observed in other common cancer types are rare in leiomyosarcomas. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Whole-exome sequencing in obsessive-compulsive disorder identifies rare mutations in immunological and neurodevelopmental pathways

    PubMed Central

    Cappi, C; Brentani, H; Lima, L; Sanders, S J; Zai, G; Diniz, B J; Reis, V N S; Hounie, A G; Conceição do Rosário, M; Mariani, D; Requena, G L; Puga, R; Souza-Duran, F L; Shavitt, R G; Pauls, D L; Miguel, E C; Fernandez, T V

    2016-01-01

    Studies of rare genetic variation have identified molecular pathways conferring risk for developmental neuropsychiatric disorders. To date, no published whole-exome sequencing studies have been reported in obsessive-compulsive disorder (OCD). We sequenced all the genome coding regions in 20 sporadic OCD cases and their unaffected parents to identify rare de novo (DN) single-nucleotide variants (SNVs). The primary aim of this pilot study was to determine whether DN variation contributes to OCD risk. To this aim, we evaluated whether there is an elevated rate of DN mutations in OCD, which would justify this approach toward gene discovery in larger studies of the disorder. Furthermore, to explore functional molecular correlations among genes with nonsynonymous DN SNVs in OCD probands, a protein–protein interaction (PPI) network was generated based on databases of direct molecular interactions. We applied Degree-Aware Disease Gene Prioritization (DADA) to rank the PPI network genes based on their relatedness to a set of OCD candidate genes from two OCD genome-wide association studies (Stewart et al., 2013; Mattheisen et al., 2014). In addition, we performed a pathway analysis with genes from the PPI network. The rate of DN SNVs in OCD was 2.51 × 10−8 per base per generation, significantly higher than a previous estimated rate in unaffected subjects using the same sequencing platform and analytic pipeline. Several genes harboring DN SNVs in OCD were highly interconnected in the PPI network and ranked high in the DADA analysis. Nearly all the DN SNVs in this study are in genes expressed in the human brain, and a pathway analysis revealed enrichment in immunological and central nervous system functioning and development. The results of this pilot study indicate that further investigation of DN variation in larger OCD cohorts is warranted to identify specific risk genes and to confirm our preliminary finding with regard to PPI network enrichment for particular

  7. Method of increasing conversion of a fatty acid to its corresponding dicarboxylic acid

    DOEpatents

    Craft, David L.; Wilson, C. Ron; Eirich, Dudley; Zhang, Yeyan

    2004-09-14

    A nucleic acid sequence including a CYP promoter operably linked to nucleic acid encoding a heterologous protein is provided to increase transcription of the nucleic acid. Expression vectors and host cells containing the nucleic acid sequence are also provided. The methods and compositions described herein are especially useful in the production of polycarboxylic acids by yeast cells.

  8. Exome Sequencing Identified a Recessive RDH12 Mutation in a Family with Severe Early-Onset Retinitis Pigmentosa

    PubMed Central

    Gong, Bo; Wei, Bo; Huang, Lulin; Hao, Jilong; Li, Xiulan; Yang, Yin; Zhou, Yu; Hao, Fang; Cui, Zhihua; Zhang, Dingding; Wang, Le

    2015-01-01

    Retinitis pigmentosa (RP) is the most important hereditary retinal disease caused by progressive degeneration of the photoreceptor cells. This study is to identify gene mutations responsible for autosomal recessive retinitis pigmentosa (arRP) in a Chinese family using next-generation sequencing technology. A Chinese family with 7 members including two individuals affected with severe early-onset RP was studied. All patients underwent a complete ophthalmic examination. Exome sequencing was performed on a single RP patient (the proband of this family) and direct Sanger sequencing on other family members and normal controls was followed to confirm the causal mutations. A homozygous mutation c.437Tidentified as being related to the phenotype of this arRP family. This homozygous mutation was detected in the two affected patients, but not present in other family members and 600 normal controls. Another three normal members in the family were found to carry this heterozygous missense mutation. Our results emphasize the importance of c.437T

  9. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Purification, characterization and sequence analysis of Omp50,a new porin isolated from Campylobacter jejuni.

    PubMed Central

    Bolla, J M; Dé, E; Dorez, A; Pagès, J M

    2000-01-01

    A novel pore-forming protein identified in Campylobacter was purified by ion-exchange chromatography and named Omp50 according to both its molecular mass and its outer membrane localization. We observed a pore-forming ability of Omp50 after re-incorporation into artificial membranes. The protein induced cation-selective channels with major conductance values of 50-60 pS in 1 M NaCl. N-terminal sequencing allowed us to identify the predicted coding sequence Cj1170c from the Campylobacter jejuni genome database as the corresponding gene in the NCTC 11168 genome sequence. The gene, designated omp50, consists of a 1425 bp open reading frame encoding a deduced 453-amino acid protein with a calculated pI of 5.81 and a molecular mass of 51169.2 Da. The protein possessed a 20-amino acid leader sequence. No significant similarity was found between Omp50 and porin protein sequences already determined. Moreover, the protein showed only weak sequence identity with the major outer-membrane protein (MOMP) of Campylobacter, correlating with the absence of antigenic cross-reactivity between these two proteins. Omp50 is expressed in C. jejuni and Campylobacter lari but not in Campylobacter coli. The gene, however, was detected in all three species by PCR. According to its conformation and functional properties, the protein would belong to the family of outer-membrane monomeric porins. PMID:11104668

  11. Evaluation of microbial community in hydrothermal field by direct DNA sequencing

    NASA Astrophysics Data System (ADS)

    Kawarabayasi, Y.; Maruyama, A.

    2002-12-01

    Many extremophiles have been discovered from terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from the environments, i.e., over 99% of total microbes remains undiscovered. Based on experiences of entire microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we started to find out unique microbes/genes in hydrothermal fields through direct sequencing of environmental DNA fragments. At first, shotgun plasmid libraries were directly constructed with the DNA molecules prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). A gene amplification (PCR) technique was not used for preventing mutation in the process. The nucleotide sequences of 285 clones indicated that no sequence had identical data in public databases. Among 27 clones determined entire sequences, no ORF was identified on 14 clones like intron in Eukaryote. On four clones, tetra-nucleotide-long multiple tandem repetitive sequences were identified. This type of sequence was identified in some familiar disease in human. The result indicates that living/dead materials with eukaryotic features may exist in this low temperature field. Secondly, shotgun plasmid libraries were constructed from the environmental DNA prepared from Beppu hot springs. In randomly-selected 143 clones used for sequencing, no known sequence was identified. Unlike the clones in S-EPR library, clear ORFs were identified on all nine clones determined the entire sequence. It was found that one clone, H4052, contained the complete Aspartyl-tRNA synthetase. Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the

  12. Integrating Transcriptome and Genome Re-Sequencing Data to Identify Key Genes and Mutations Affecting Chicken Eggshell Qualities

    PubMed Central

    Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as reveled by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068

  13. High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients.

    PubMed

    Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya

    2015-08-01

    Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during the template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations have been absolute quantitated. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. Criteria for confirming sequence periodicity identified by Fourier transform analysis: application to GCR2, a candidate plant GPCR?

    PubMed

    Illingworth, Christopher J R; Parkes, Kevin E; Snell, Christopher R; Mullineaux, Philip M; Reynolds, Christopher A

    2008-03-01

    Methods to determine periodicity in protein sequences are useful for inferring function. Fourier transformation is one approach but care is required to ensure the periodicity is genuine. Here we have shown that empirically-derived statistical tables can be used as a measure of significance. Genuine protein sequences data rather than randomly generated sequences were used as the statistical backdrop. The method has been applied to G-protein coupled receptor (GPCR) sequences, by Fourier transformation of hydrophobicity values, codon frequencies and the extent of over-representation of codon pairs; the latter being related to translational step times. Genuine periodicity was observed in the hydrophobicity whereas the apparent periodicity (as inferred from previously reported measures) in the translation step times was not validated statistically. GCR2 has recently been proposed as the plant GPCR receptor for the hormone abscisic acid. It has homology to the Lanthionine synthetase C-like family of proteins, an observation confirmed by fold recognition. Application of the Fourier transform algorithm to the GCR2 family revealed strongly predicted seven fold periodicity in hydrophobicity, suggesting why GCR2 has been reported to be a GPCR, despite negative indications in most transmembrane prediction algorithms. The underlying multiple sequence alignment, also required for the Fourier transform analysis of periodicity, indicated that the hydrophobic regions around the 7 GXXG motifs commence near the C-terminal end of each of the 7 inner helices of the alpha-toroid and continue to the N-terminal region of the helix. The results clearly explain why GCR2 has been understandably but erroneously predicted to be a GPCR.

  15. Quick identification of acetic acid bacteria based on nucleotide sequences of the 16S-23S rDNA internal transcribed spacer region and of the PQQ-dependent alcohol dehydrogenase gene.

    PubMed

    Trcek, Janja

    2005-10-01

    Acetic acid bacteria (AAB) are well known for oxidizing different ethanol-containing substrates into various types of vinegar. They are also used for production of some biotechnologically important products, such as sorbose and gluconic acids. However, their presence is not always appreciated since certain species also spoil wine, juice, beer and fruits. To be able to follow AAB in all these processes, the species involved must be identified accurately and quickly. Because of inaccuracy and very time-consuming phenotypic analysis of AAB, the application of molecular methods is necessary. Since the pairwise comparison among the 16S rRNA gene sequences of AAB shows very high similarity (up to 99.9%) other DNA-targets should be used. Our previous studies showed that the restriction analysis of 16S-23S rDNA internal transcribed spacer region is a suitable approach for quick affiliation of an acetic acid bacterium to a distinct group of restriction types and also for quick identification of a potentially novel species of acetic acid bacterium (Trcek & Teuber 2002; Trcek 2002). However, with the exception of two conserved genes, encoding tRNAIle and tRNAAla, the sequences of 16S-23S rDNA are highly divergent among AAB species. For this reason we analyzed in this study a gene encoding PQQ-dependent ADH as a possible DNA-target. First we confirmed the expression of subunit I of PQQ-dependent ADH (AdhA) also in Asaia, the only genus of AAB which exhibits little or no ADH-activity. Further we analyzed the partial sequences of adhA among some representative species of the genera Acetobacter, Gluconobacter and Gluconacetobacter. The conserved and variable regions in these sequences made possible the construction of A. acetispecific oligonucleotide the specificity of which was confirmed in PCR-reaction using 45 well-defined strains of AAB as DNA-templates. The primer was also successfully used in direct identification of A. aceti from home made cider vinegar as well as for

  16. Amino acid sequences of peptides from a tryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.; Robson, A.

    1967-01-01

    1. A tryptic digest of the protein fraction U.S.3 from oxidized wool has been separated into 32 peptide fractions by cation-exchange resin chromatography. 2. Most of these fractions have been resolved into their component peptides by a combination of the techniques of cation-exchange resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid compositions of 58 of the peptides in the digest present in the largest amounts have been determined. 4. The amino acid sequences of 38 of these have been completely elucidated and those of six others partially derived. 5. These findings indicate that the parent protein in wool from which the protein fraction U.S.3 is derived has a minimum molecular weight of 74000. 6. The structures of wool proteins are discussed in the light of the peptide sequences determined, and, in particular, of those sequences in fraction U.S.3 that could not be elucidated. PMID:16742497

  17. In vivo insertion pool sequencing identifies virulence factors in a complex fungal–host interaction

    PubMed Central

    Uhse, Simon; Pflug, Florian G.; Stirnberg, Alexandra; Ehrlinger, Klaus; von Haeseler, Arndt

    2018-01-01

    Large-scale insertional mutagenesis screens can be powerful genome-wide tools if they are streamlined with efficient downstream analysis, which is a serious bottleneck in complex biological systems. A major impediment to the success of next-generation sequencing (NGS)-based screens for virulence factors is that the genetic material of pathogens is often underrepresented within the eukaryotic host, making detection extremely challenging. We therefore established insertion Pool-Sequencing (iPool-Seq) on maize infected with the biotrophic fungus U. maydis. iPool-Seq features tagmentation, unique molecular barcodes, and affinity purification of pathogen insertion mutant DNA from in vivo-infected tissues. In a proof of concept using iPool-Seq, we identified 28 virulence factors, including 23 that were previously uncharacterized, from an initial pool of 195 candidate effector mutants. Because of its sensitivity and quantitative nature, iPool-Seq can be applied to any insertional mutagenesis library and is especially suitable for genetically complex setups like pooled infections of eukaryotic hosts. PMID:29684023

  18. Isolation, sequencing and expression of RED, a novel human gene encoding an acidic-basic dipeptide repeat.

    PubMed

    Assier, E; Bouzinba-Segard, H; Stolzenberg, M C; Stephens, R; Bardos, J; Freemont, P; Charron, D; Trowsdale, J; Rich, T

    1999-04-16

    A novel human gene RED, and the murine homologue, MuRED, were cloned. These genes were named after the extensive stretch of alternating arginine (R) and glutamic acid (E) or aspartic acid (D) residues that they contain. We term this the 'RED' repeat. The genes of both species were expressed in a wide range of tissues and we have mapped the human gene to chromosome 5q22-24. MuRED and RED shared 98% sequence identity at the amino acid level. The open reading frame of both genes encodes a 557 amino acid protein. RED fused to a fluorescent tag was expressed in nuclei of transfected cells and localised to nuclear dots. Co-localisation studies showed that these nuclear dots did not contain either PML or Coilin, which are commonly found in the POD or coiled body nuclear compartments. Deletion of the amino terminal 265 amino acids resulted in a failure to sort efficiently to the nucleus, though nuclear dots were formed. Deletion of a further 50 amino acids from the amino terminus generates a protein that can sort to the nucleus but is unable to generate nuclear dots. Neither construct localised to the nucleolus. The characteristics of RED and its nuclear localisation implicate it as a regulatory protein, possibly involved in transcription.

  19. Discovery of a novel iflavirus sequence in the eastern paralysis tick Ixodes holocyclus.

    PubMed

    O'Brien, Caitlin A; Hall-Mendelin, Sonja; Hobson-Peters, Jody; Deliyannis, Georgia; Allen, Andy; Lew-Tabor, Ala; Rodriguez-Valle, Manuel; Barker, Dayana; Barker, Stephen C; Hall, Roy A

    2018-05-11

    Ixodes holocyclus, the eastern paralysis tick, is a significant parasite in Australia in terms of animal and human health. However, very little is known about its virome. In this study, next-generation sequencing of I. holocyclus salivary glands yielded a full-length genome sequence which phylogenetically groups with viruses classified in the Iflaviridae family and shares 45% amino acid similarity with its closest relative Bole hyalomma asiaticum virus 1. The sequence of this virus, provisionally named Ixodes holocyclus iflavirus (IhIV) has been identified in tick populations from northern New South Wales and Queensland, Australia and represents the first virus sequence reported from I. holocyclus.

  20. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. A novel NOTCH3 mutation identified in patients with oral cancer by whole exome sequencing.

    PubMed

    Yi, Yanjun; Tian, Zhuowei; Ju, Houyu; Ren, Guoxin; Hu, Jingzhou

    2017-06-01

    Oral cancer is a serious disease caused by environmental factors and/or susceptible genes. In the present study, in order to identify useful genetic biomarkers for cancer prediction and prevention, and for personalized treatment, we detected somatic mutations in 5 pairs of oral cancer tissues and blood samples using whole exome sequencing (WES). Finally, we confirmed a novel nonsense single-nucleotide polymorphism (SNP; chr19:15288426A>C) in the NOTCH3 gene with sanger sequencing, which resulted in a N1438T mutation in the protein sequence. Using multiple in silico analyses, this variant was found to mildly damaging effects on the NOTCH3 gene, which was supported by the results from analyses using PANTHER, SNAP and SNPs&GO. However, further analysis using Mutation Taster revealed that this SNP had a probability of 0.9997 to be 'disease causing'. In addition, we performed 3D structure simulation analysis and the results suggested that this variant had little effect on the solubility and hydrophobicity of the protein and thus on its function; however, it decreased the stability of the protein by increasing the total energy following minimization (-1,051.39 kcal/mol for the mutant and -1,229.84 kcal/mol for the native) and decreasing one stabilizing residue of the protein. Less stability of the N1438T mutant was also supported by analysis using I-Mutant with a DDG value of -1.67. Overall, the present study identified and confirmed a novel mutation in the NOTCH3 gene, which may decrease the stability of NOTCH3, and may thus prove to be helpful in cancer prognosis.

  2. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels

    PubMed Central

    Jasim, Anfal A.; Al-Bustan, Suzanne A.; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3′ UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism. PMID:29686695

  3. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels.

    PubMed

    Jasim, Anfal A; Al-Bustan, Suzanne A; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 ( APOA 5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3' UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism.

  4. Mutant fatty acid desaturase and methods for directed mutagenesis

    DOEpatents

    Shanklin, John [Shoreham, NY; Whittle, Edward J [Greenport, NY

    2008-01-29

    The present invention relates to methods for producing fatty acid desaturase mutants having a substantially increased activity towards substrates with fewer than 18 carbon atom chains relative to an unmutagenized precursor desaturase having an 18 carbon chain length specificity, the sequences encoding the desaturases and to the desaturases that are produced by the methods. The present invention further relates to a method for altering a function of a protein, including a fatty acid desaturase, through directed mutagenesis involving identifying candidate amino acid residues, producing a library of mutants of the protein by simultaneously randomizing all amino acid candidates, and selecting for mutants which exhibit the desired alteration of function. Candidate amino acids are identified by a combination of methods. Enzymatic, binding, structural and other functions of proteins can be altered by the method.

  5. Characterization of relative abundance of lactic acid bacteria species in French organic sourdough by cultural, qPCR and MiSeq high-throughput sequencing methods.

    PubMed

    Michel, Elisa; Monfort, Clarisse; Deffrasnes, Marion; Guezenec, Stéphane; Lhomme, Emilie; Barret, Matthieu; Sicard, Delphine; Dousset, Xavier; Onno, Bernard

    2016-12-19

    In order to contribute to the description of sourdough LAB composition, MiSeq sequencing and qPCR methods were performed in association with cultural methods. A panel of 16 French organic bakers and farmer-bakers were selected for this work. The lactic acid bacteria (LAB) diversity of their organic sourdoughs was investigated quantitatively and qualitatively combining (i) Lactobacillus sanfranciscensis-specific qPCR, (ii) global sequencing with MiSeq Illumina technology and (iii) molecular isolates identification. In addition, LAB and yeast enumeration, pH, Total Titratable Acidity, organic acids and bread specific volume were analyzed. Microbial and physico-chemical data were statistically treated by Principal Component Analysis (PCA) and Hierarchical Ascendant Classification (HAC). Total yeast counts were 6 log 10 to 7.6 log 10 CFU/g while LAB counts varied from 7.2 log 10 to 9.6 log 10 CFU/g. Values obtained by L. sanfranciscensis-specific qPCR were estimated between 7.2 and 10.3 log 10 CFU/g, except for one sample at 4.4 log 10 CFU/g. HAC and PCA clustered the sixteen sourdoughs into three classes described by their variables but without links to bakers' practices. L. sanfranciscensis was the dominant species in 13 of the 16 sourdoughs analyzed by Next Generation Sequencing (NGS), by the culture dependent method this species was dominant only in only 10 samples. Based on isolates identification, LAB diversity was higher for 7 sourdoughs with the recovery of L. curvatus, L. brevis, L. heilongjiangensis, L. xiangfangensis, L. koreensis, L. pontis, Weissella sp. and Pediococcus pentosaceus, as the most representative species. L. koreensis, L. heilongjiangensis and L. xiangfangensis were identified in traditional Asian food and here for the first time as dominant in organic sourdough. This study highlighted that L. sanfranciscensis was not the major species in 6/16 sourdough samples and that a relatively high LAB diversity can be observed in French organic

  6. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    PubMed Central

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T.; Rosenqvist Lund, Birthe S.; Ameh, James A.; Ambali, Abdul G.; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M.; Hendriksen, Rene S.

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections. PMID:27228329

  7. Lyophilisation and concentration of chitosan/siRNA polyplexes: Influence of buffer composition, oligonucleotide sequence, and hyaluronic acid coating.

    PubMed

    Veilleux, Daniel; Gopalakrishna Panicker, Rajesh Krishnan; Chevrier, Anik; Biniecki, Kristof; Lavertu, Marc; Buschmann, Michael D

    2018-02-15

    Chitosan (CS)/siRNA polyplexes have great therapeutic potential for treating multiple diseases by gene silencing. However, clinical application of this technology requires the development of concentrated, hemocompatible, pH neutral formulations for safe and efficient administration. In this study we evaluate physicochemical properties of chitosan polyplexes in various buffers at increasing ionic strengths, to identify conditions for freeze-drying and rehydration at higher doses of uncoated or hyaluronic acid (HA)-coated polyplexes while maintaining physiological compatibility. Optimized formulations are used to evaluate the impact of the siRNA/oligonucleotide sequence on polyplex physicochemical properties, and to measure their in vitro silencing efficiency, cytotoxicity, and hemocompatibility. Specific oligonucleotide sequences influence polyplex physical properties at low N:P ratios, as well as their stability during freeze-drying. Nanoparticles display greater stability for oligodeoxynucleotides ODN vs siRNA; AT-rich vs GC-rich; and overhangs vs blunt ends. Using this knowledge, various CS/siRNA polyplexes are prepared with and without HA coating, freeze-dried and rehydrated at increased concentrations using reduced rehydration volumes. These polyplexes are non-cytotoxic and preserve silencing activity even after rehydration to 20-fold their initial concentration, while HA-coated polyplexes at pH∼7 also displayed increased hemocompatibility. These concentrated formulations represent a critical step towards clinical development of chitosan-based oligonucleotide intravenous delivery systems. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Analytical Framework for Identifying and Differentiating Recent Hitchhiking and Severe Bottleneck Effects from Multi-Locus DNA Sequence Data

    DOE PAGES

    Sargsyan, Ori

    2012-05-25

    Hitchhiking and severe bottleneck effects have impact on the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This study develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction withmore » constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50000 or greater in contrast to 10000, and the estimates of the recent homogenization events are agree with the “Out of Africa” hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. Finally, the results show that significant discrepancies can exist between the estimates.« less

  9. New FeFe-hydrogenase genes identified in a metagenomic fosmid library from a municipal wastewater treatment plant as revealed by high-throughput sequencing.

    PubMed

    Tomazetto, Geizecler; Wibberg, Daniel; Schlüter, Andreas; Oliveira, Valéria M

    2015-01-01

    A fosmid metagenomic library was constructed with total community DNA obtained from a municipal wastewater treatment plant (MWWTP), with the aim of identifying new FeFe-hydrogenase genes encoding the enzymes most important for hydrogen metabolism. The dataset generated by pyrosequencing of a fosmid library was mined to identify environmental gene tags (EGTs) assigned to FeFe-hydrogenase. The majority of EGTs representing FeFe-hydrogenase genes were affiliated with the class Clostridia, suggesting that this group is the main hydrogen producer in the MWWTP analyzed. Based on assembled sequences, three FeFe-hydrogenase genes were predicted based on detection of the L2 motif (MPCxxKxxE) in the encoded gene product, confirming true FeFe-hydrogenase sequences. These sequences were used to design specific primers to detect fosmids encoding FeFe-hydrogenase genes predicted from the dataset. Three identified fosmids were completely sequenced. The cloned genomic fragments within these fosmids are closely related to members of the Spirochaetaceae, Bacteroidales and Firmicutes, and their FeFe-hydrogenase sequences are characterized by the structure type M3, which is common to clostridial enzymes. FeFe-hydrogenase sequences found in this study represent hitherto undetected sequences, indicating the high genetic diversity regarding these enzymes in MWWTP. Results suggest that MWWTP have to be considered as reservoirs for new FeFe-hydrogenase genes. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  10. Recoding method that removes inhibitory sequences and improves HIV gene expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rabadan, Raul; Krasnitz, Michael; Robins, Harlan

    The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining in viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions formore » affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.« less

  11. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Jinghua; Eng, J.; Yalow, R.S.

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled park insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulinmore » and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.« less

  12. VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

    PubMed

    Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

    2015-07-07

    Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.

  13. A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data.

    PubMed

    Pandey, Gaurav; Pandey, Om P; Rogers, Angela J; Ahsen, Mehmet E; Hoffman, Gabriel E; Raby, Benjamin A; Weiss, Scott T; Schadt, Eric E; Bunyavanich, Supinda

    2018-06-11

    Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes interpreted via an L2-regularized logistic regression classification model. This classifier performed with strong predictive value and sensitivity across eight test sets, including (1) a test set of independent asthmatic and control subjects profiled by RNA sequencing (positive and negative predictive values of 1.00 and 0.96, respectively; AUC of 0.994), (2) two independent case-control cohorts of asthma profiled by microarray, and (3) five cohorts with other respiratory conditions (allergic rhinitis, upper respiratory infection, cystic fibrosis, smoking), where the classifier had a low to zero misclassification rate. Following validation in large, prospective cohorts, this classifier could be developed into a nasal biomarker of asthma.

  14. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  15. Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T

    1990-07-06

    The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.

  16. Characterization of 47 MHC class I sequences in Filipino cynomolgus macaques

    PubMed Central

    Campbell, Kevin J.; Detmer, Ann M.; Karl, Julie A.; Wiseman, Roger W.; Blasky, Alex J.; Hughes, Austin L.; Bimber, Benjamin N.; O’Connor, Shelby L.; O’Connor, David H.

    2009-01-01

    Cynomolgus macaques (Macaca fascicularis) provide increasingly common models for infectious disease research. Several geographically distinct populations of these macaques from Southeast Asia and the Indian Ocean island of Mauritius are available for pathogenesis studies. Though host genetics may profoundly impact results of such studies, similarities and differences between populations are often overlooked. In this study we identified 47 full-length MHC class I nucleotide sequences in 16 cynomolgus macaques of Filipino origin. The majority of MHC class I sequences characterized (39 of 47) were unique to this regional population. However, we discovered eight sequences with perfect identity and six sequences with close similarity to previously defined MHC class I sequences from other macaque populations. We identified two ancestral MHC haplotypes that appear to be shared between Filipino and Mauritian cynomolgus macaques, notably a Mafa-B haplotype that has previously been shown to protect Mauritian cynomolgus macaques against challenge with a simian/human immunodeficiency virus, SHIV89.6P. We also identified a Filipino cynomolgus macaque MHC class I sequence for which the predicted protein sequence differs from Mamu-B*17 by a single amino acid. This is important because Mamu-B*17 is strongly associated with protection against simian immunodeficiency virus (SIV) challenge in Indian rhesus macaques. These findings have implications for the evolutionary history of Filipino cynomolgus macaques as well as for the use of this model in SIV/SHIV research protocols. PMID:19107381

  17. Molecular profiling of appendiceal epithelial tumors using massively parallel sequencing to identify somatic mutations.

    PubMed

    Liu, Xiaoying; Mody, Kabir; de Abreu, Francine B; Pipas, J Marc; Peterson, Jason D; Gallagher, Torrey L; Suriawinata, Arief A; Ripple, Gregory H; Hourdequin, Kathryn C; Smith, Kerrington D; Barth, Richard J; Colacchio, Thomas A; Tsapakos, Michael J; Zaki, Bassem I; Gardner, Timothy B; Gordon, Stuart R; Amos, Christopher I; Wells, Wendy A; Tsongalis, Gregory J

    2014-07-01

    Some epithelial neoplasms of the appendix, including low-grade appendiceal mucinous neoplasm and adenocarcinoma, can result in pseudomyxoma peritonei (PMP). Little is known about the mutational spectra of these tumor types and whether mutations may be of clinical significance with respect to therapeutic selection. In this study, we identified somatic mutations using the Ion Torrent AmpliSeq Cancer Hotspot Panel v2. Specimens consisted of 3 nonneoplastic retention cysts/mucocele, 15 low-grade mucinous neoplasms (LAMNs), 8 low-grade/well-differentiated mucinous adenocarcinomas with pseudomyxoma peritonei, and 12 adenocarcinomas with/without goblet cell/signet ring cell features. Barcoded libraries were prepared from up to 10 ng of extracted DNA and multiplexed on single 318 chips for sequencing. Data analysis was performed using Golden Helix SVS. Variants that remained after the analysis pipeline were individually interrogated using the Integrative Genomics Viewer. A single Janus kinase 3 (JAK3) mutation was detected in the mucocele group. Eight mutations were identified in the V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and GNAS complex locus (GNAS) genes among LAMN samples. Additional gene mutations were identified in the AKT1 (v-akt murine thymoma viral oncogene homolog 1), APC (adenomatous polyposis coli), JAK3, MET (met proto-oncogene), phosphatidylinositol-4,5-bisphosphate 3-kinase (PIK3CA), RB1 (retinoblastoma 1), STK11 (serine/threonine kinase 11), and tumor protein p53 (TP53) genes. Among the PMPs, 6 mutations were detected in the KRAS gene and also in the GNAS, TP53, and RB1 genes. Appendiceal cancers showed mutations in the APC, ATM (ataxia telangiectasia mutated), KRAS, IDH1 [isocitrate dehydrogenase 1 (NADP+)], NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog], PIK3CA, SMAD4 (SMAD family member 4), and TP53 genes. Our results suggest molecular heterogeneity among epithelial tumors of the appendix. Next generation sequencing efforts

  18. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    PubMed

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  19. Exome Sequencing Identifies Truncating Mutations in Human SERPINF1 in Autosomal-Recessive Osteogenesis Imperfecta

    PubMed Central

    Becker, Jutta; Semler, Oliver; Gilissen, Christian; Li, Yun; Bolz, Hanno Jörn; Giunta, Cecilia; Bergmann, Carsten; Rohrbach, Marianne; Koerber, Friederike; Zimmermann, Katharina; de Vries, Petra; Wirth, Brunhilde; Schoenau, Eckhard; Wollnik, Bernd; Veltman, Joris A.; Hoischen, Alexander; Netzer, Christian

    2011-01-01

    Osteogenesis imperfecta (OI) is a heterogeneous genetic disorder characterized by bone fragility and susceptibility to fractures after minimal trauma. After mutations in all known OI genes had been excluded by Sanger sequencing, we applied next-generation sequencing to analyze the exome of a single individual who has a severe form of the disease and whose parents are second cousins. A total of 26,922 variations from the human reference genome sequence were subjected to several filtering steps. In addition, we extracted the genotypes of all dbSNP130-annotated SNPs from the exome sequencing data and used these 299,494 genotypes as markers for the genome-wide identification of homozygous regions. A single homozygous truncating mutation, affecting SERPINF1 on chromosome 17p13.3, that was embedded into a homozygous stretch of 2.99 Mb remained. The mutation was also homozygous in the affected brother of the index patient. Subsequently, we identified homozygosity for two different truncating SERPINF1 mutations in two unrelated patients with OI and parental consanguinity. All four individuals with SERPINF1 mutations have severe OI. Fractures of long bones and severe vertebral compression fractures with resulting deformities were observed as early as the first year of life in these individuals. Collagen analyses with cultured dermal fibroblasts displayed no evidence for impaired collagen folding, posttranslational modification, or secretion. SERPINF1 encodes pigment epithelium-derived factor (PEDF), a secreted glycoprotein of the serpin superfamily. PEDF is a multifunctional protein and one of the strongest inhibitors of angiogenesis currently known in humans. Our data provide genetic evidence for PEDF involvement in human bone homeostasis. PMID:21353196

  20. iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties

    PubMed Central

    Feng, Peng-Mian; Ding, Chen; Zuo, Yong-Chun; Chou, Kuo-Chen

    2012-01-01

    Nucleosome positioning has important roles in key cellular processes. Although intensive efforts have been made in this area, the rules defining nucleosome positioning is still elusive and debated. In this study, we carried out a systematic comparison among the profiles of twelve DNA physicochemical features between the nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth studying nucleosomes. Meanwhile, a new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to a 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. It was observed by a cross-validation test on a benchmark dataset that the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. As a web-server, iNuc-PhysChem is freely accessible to the public at http://lin.uestc.edu.cn/server/iNuc-PhysChem. For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented just for the integrity in developing the predictor. Meanwhile, for those who prefer to run predictions in their own computers, the predictor's code can be easily downloaded from the web-server. It is anticipated that iNuc-PhysChem may become a useful high throughput tool for both basic research and drug design. PMID:23144709

  1. Phenotypic chemical screening using a zebrafish neural crest EMT reporter identifies retinoic acid as an inhibitor of epithelial morphogenesis

    PubMed Central

    Jimenez, Laura; Wang, Jindong; Morrison, Monique A.; Whatcott, Clifford; Soh, Katherine K.; Warner, Steven; Bearss, David; Jette, Cicely A.; Stewart, Rodney A.

    2016-01-01

    ABSTRACT The epithelial-to-mesenchymal transition (EMT) is a highly conserved morphogenetic program essential for embryogenesis, regeneration and cancer metastasis. In cancer cells, EMT also triggers cellular reprogramming and chemoresistance, which underlie disease relapse and decreased survival. Hence, identifying compounds that block EMT is essential to prevent or eradicate disseminated tumor cells. Here, we establish a whole-animal-based EMT reporter in zebrafish for rapid drug screening, called Tg(snai1b:GFP), which labels epithelial cells undergoing EMT to produce sox10-positive neural crest (NC) cells. Time-lapse and lineage analysis of Tg(snai1b:GFP) embryos reveal that cranial NC cells delaminate from two regions: an early population delaminates adjacent to the neural plate, whereas a later population delaminates from within the dorsal neural tube. Treating Tg(snai1b:GFP) embryos with candidate small-molecule EMT-inhibiting compounds identified TP-0903, a multi-kinase inhibitor that blocked cranial NC cell delamination in both the lateral and medial populations. RNA sequencing (RNA-Seq) analysis and chemical rescue experiments show that TP-0903 acts through stimulating retinoic acid (RA) biosynthesis and RA-dependent transcription. These studies identify TP-0903 as a new therapeutic for activating RA in vivo and raise the possibility that RA-dependent inhibition of EMT contributes to its prior success in eliminating disseminated cancer cells. PMID:26794130

  2. Next-generation sequencing identifies major DNA methylation changes during progression of Ph+ chronic myeloid leukemia

    PubMed Central

    Heller, G; Topakian, T; Altenberger, C; Cerny-Reiterer, S; Herndlhofer, S; Ziegler, B; Datlinger, P; Byrgazov, K; Bock, C; Mannhalter, C; Hörmann, G; Sperr, W R; Lion, T; Zielinski, C C; Valent, P; Zöchbauer-Müller, S

    2016-01-01

    Little is known about the impact of DNA methylation on the evolution/progression of Ph+ chronic myeloid leukemia (CML). We investigated the methylome of CML patients in chronic phase (CP-CML), accelerated phase (AP-CML) and blast crisis (BC-CML) as well as in controls by reduced representation bisulfite sequencing. Although only ~600 differentially methylated CpG sites were identified in samples obtained from CP-CML patients compared with controls, ~6500 differentially methylated CpG sites were found in samples from BC-CML patients. In the majority of affected CpG sites, methylation was increased. In CP-CML patients who progressed to AP-CML/BC-CML, we identified up to 897 genes that were methylated at the time of progression but not at the time of diagnosis. Using RNA-sequencing, we observed downregulated expression of many of these genes in BC-CML compared with CP-CML samples. Several of them are well-known tumor-suppressor genes or regulators of cell proliferation, and gene re-expression was observed by the use of epigenetic active drugs. Together, our results demonstrate that CpG site methylation clearly increases during CML progression and that it may provide a useful basis for revealing new targets of therapy in advanced CML. PMID:27211271

  3. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D`Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequences of each of the PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum,more » Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.« less

  4. The complete plastid genome sequence of Eustrephus latifolius (Asparagaceae: Lomandroideae).

    PubMed

    Kim, Hyoung Tae; Kim, Jung Sung; Kim, Joo-Hwan

    2016-01-01

    The complete chloroplast (cp) genome sequence of Eustrephus latifolius was firstly determined in subfamily Lomandriodeae of family Asparagaceae. It was 159,736 bp and contained a large single copy region (82,403 bp) and a small single copy region (13,607 bp) which were separated by two inverted repeat regions (31,863 bp). In total, 132 genes were identified and they were consisted of 83 coding genes, 8 rRNA genes, 38 tRNA genes, 3 pseudogenes. rpl23 and clpP were pseudogenes due to sequence deletions. Among 23 genes containing introns, rps12 and ycf3 contained two introns and the rest had just one intron. The intact ycf68 was identified within an intron of trnI-GAU. The amino acid sequence was almost identical with Phoenix dactylifera in Aracales. Ycf1 of E. latifolius was completely located in IR. It was similar to cp genome structure of Lemna minor, Spirodela polyrhiza, Wolffiella lingulata, Wolffia australiana in Alismatales.

  5. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing

    PubMed Central

    Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641

  6. Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

    PubMed Central

    Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

    2005-01-01

    Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134

  7. Whole genome sequencing of Oryza sativa L. cv. Seeragasamba identifies a new fragrance allele in rice

    PubMed Central

    Bindusree, Ganigara; Natarajan, Purushothaman; Kalva, Sukesh

    2017-01-01

    Fragrance of rice is an important trait that confers a large economic benefit to the farmers who cultivate aromatic rice varieties. Several aromatic rice varieties have limited geographic distribution, and are endowed with variety-specific unique fragrances. BADH2 was identified as a fragrance gene in 2005, and it is essential to identify the fragrance alleles from diverse geographical locations and genetic backgrounds. Seeragasamba is a short-grain aromatic rice variety of the indica type, which is cultivated in a limited area in India. Whole genome sequencing of this variety identified a new badh2 allele (badh2-p) with an 8 bp insertion in the promoter region of the BADH2 gene. When the whole genome sequences of 76 aromatic varieties in the 3000 rice genome project were analyzed, the badh2-p allele was present in 13 varieties (approximately 17%) of both indica and japonica types. In addition, the badh2-p allele was present in 17 varieties that already had the loss-of-function allele, badh2-E7. Taken together, the frequency of badh2-p allele (approximately 40%) was found to be greater than that of the badh2-E7 allele (approximately 34%) among the aromatic rice varieties. Therefore, it is suggested to include badh2-p as a predominant allele when screening for fragrance alleles in aromatic rice varieties. PMID:29190814

  8. Molecular sled sequences are common in mammalian proteins.

    PubMed

    Xiong, Kan; Blainey, Paul C

    2016-03-18

    Recent work revealed a new class of molecular machines called molecular sleds, which are small basic molecules that bind and slide along DNA with the ability to carry cargo along DNA. Here, we performed biochemical and single-molecule flow stretching assays to investigate the basis of sliding activity in molecular sleds. In particular, we identified the functional core of pVIc, the first molecular sled characterized; peptide functional groups that control sliding activity; and propose a model for the sliding activity of molecular sleds. We also observed widespread DNA binding and sliding activity among basic polypeptide sequences that implicate mammalian nuclear localization sequences and many cell penetrating peptides as molecular sleds. These basic protein motifs exhibit weak but physiologically relevant sequence-nonspecific DNA affinity. Our findings indicate that many mammalian proteins contain molecular sled sequences and suggest the possibility that substantial undiscovered sliding activity exists among nuclear mammalian proteins. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Usefulness of dual-energy computed tomography with and without dedicated software in identifying uric acid kidney stones.

    PubMed

    Salvador, R; Luque, M P; Ciudin, A; Paño, B; Buñesch, L; Sebastia, C; Nicolau, C

    2016-01-01

    To prospectively evaluate the usefulness of dual-energy computed tomography (DECT) with and without dedicated software in identifying uric acid kidney stones in vivo. We studied 65 kidney stones in 63 patients. All stones were analyzed in vivo by DECT and ex vivo by spectrophotometry. We evaluated the diagnostic performance in identifying uric acid stones with DECT by analyzing the radiologic densities with dedicated software and without using it (through manual measurements) as well as by analyzing the attenuation ratios of the stones in both energies with and without the dedicated software. The six uric acid stones included were correctly identified by evaluating the attenuation ratios with a cutoff of 1.21, both with the dedicated software and without it, yielding perfect diagnostic performance without false positives or false negatives. The study of the attenuations of the stones obtained the following values on the receiver operating characteristic curves in the classification of the uric acid stones: 0.92 for the measurements done with the software and 0.89 for the manual measurements; a cutoff of 538 HU yielded 84% (42/50) diagnostic accuracy for the software and 83.1% (54/65) for the manual measurements. DECT enabled the uric acid stones to be identified correctly through the calculation of the ratio of the attenuations in the two energies. The results obtained with the dedicated software were similar to those obtained manually. Copyright © 2015 SERAM. Published by Elsevier España, S.L.U. All rights reserved.

  10. Exome sequencing of a large family identifies potential candidate genes contributing risk to bipolar disorder.

    PubMed

    Zhang, Tianxiao; Hou, Liping; Chen, David T; McMahon, Francis J; Wang, Jen-Chyong; Rice, John P

    2018-03-01

    Bipolar disorder is a mental illness with lifetime prevalence of about 1%. Previous genetic studies have identified multiple chromosomal linkage regions and candidate genes that might be associated with bipolar disorder. The present study aimed to identify potential susceptibility variants for bipolar disorder using 6 related case samples from a four-generation family. A combination of exome sequencing and linkage analysis was performed to identify potential susceptibility variants for bipolar disorder. Our study identified a list of five potential candidate genes for bipolar disorder. Among these five genes, GRID1(Glutamate Receptor Delta-1 Subunit), which was previously reported to be associated with several psychiatric disorders and brain related traits, is particularly interesting. Variants with functional significance in this gene were identified from two cousins in our bipolar disorder pedigree. Our findings suggest a potential role for these genes and the related rare variants in the onset and development of bipolar disorder in this one family. Additional research is needed to replicate these findings and evaluate their patho-biological significance. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Partial characterization of the lettuce infectious yellows virus genomic RNAs, identification of the coat protein gene and comparison of its amino acid sequence with those of other filamentous RNA plant viruses.

    PubMed

    Klaassen, V A; Boeshore, M; Dolja, V V; Falk, B W

    1994-07-01

    Purified virions of lettuce infectious yellows virus (LIYV), a tentative member of the closterovirus group, contained two RNAs of approximately 8500 and 7300 nucleotides (RNAs 1 and 2 respectively) and a single coat protein species with M(r) of approximately 28,000. LIYV-infected plants contained multiple dsRNAs. The two largest were the correct size for the replicative forms of LIYV virion RNAs 1 and 2. To assess the relationships between LIYV RNAs 1 and 2, cDNAs corresponding to the virion RNAs were cloned. Northern blot hybridization analysis showed no detectable sequence homology between these RNAs. A partial amino acid sequence obtained from purified LIYV coat protein was found to align in the most upstream of four complete open reading frames (ORFs) identified in a LIYV RNA 2 cDNA clone. The identity of this ORF was confirmed as the LIYV coat protein gene by immunological analysis of the gene product expressed in vitro and in Escherichia coli. Computer analysis of the LIYV coat protein amino acid sequence indicated that it belongs to a large family of proteins forming filamentous capsids of RNA plant viruses. The LIYV coat protein appears to be most closely related to the coat proteins of two closteroviruses, beet yellows virus and citrus tristeza virus.

  12. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef).

    PubMed

    Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun

    2014-07-09

    Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.

  13. Greater than the sum of its parts: single-nucleus sequencing identifies convergent evolution of independent EGFR mutants in GBM.

    PubMed

    Gini, Beatrice; Mischel, Paul S

    2014-08-01

    Single-cell sequencing approaches are needed to characterize the genomic diversity of complex tumors, shedding light on their evolutionary paths and potentially suggesting more effective therapies. In this issue of Cancer Discovery, Francis and colleagues develop a novel integrative approach to identify distinct tumor subpopulations based on joint detection of clonal and subclonal events from bulk tumor and single-nucleus whole-genome sequencing, allowing them to infer a subclonal architecture. Surprisingly, the authors identify convergent evolution of multiple, mutually exclusive, independent EGFR gain-of-function variants in a single tumor. This study demonstrates the value of integrative single-cell genomics and highlights the biologic primacy of EGFR as an actionable target in glioblastoma. ©2014 American Association for Cancer Research.

  14. Exome Sequencing Identifies Mitochondrial Alanyl-tRNA Synthetase Mutations in Infantile Mitochondrial Cardiomyopathy

    PubMed Central

    Götz, Alexandra; Tyynismaa, Henna; Euro, Liliya; Ellonen, Pekka; Hyötyläinen, Tuulia; Ojala, Tiina; Hämäläinen, Riikka H.; Tommiska, Johanna; Raivio, Taneli; Oresic, Matej; Karikoski, Riitta; Tammela, Outi; Simola, Kalle O.J.; Paetau, Anders; Tyni, Tiina; Suomalainen, Anu

    2011-01-01

    Infantile cardiomyopathies are devastating fatal disorders of the neonatal period or the first year of life. Mitochondrial dysfunction is a common cause of this group of diseases, but the underlying gene defects have been characterized in only a minority of cases, because tissue specificity of the manifestation hampers functional cloning and the heterogeneity of causative factors hinders collection of informative family materials. We sequenced the exome of a patient who died at the age of 10 months of hypertrophic mitochondrial cardiomyopathy with combined cardiac respiratory chain complex I and IV deficiency. Rigorous data analysis allowed us to identify a homozygous missense mutation in AARS2, which we showed to encode the mitochondrial alanyl-tRNA synthetase (mtAlaRS). Two siblings from another family, both of whom died perinatally of hypertrophic cardiomyopathy, had the same mutation, compound heterozygous with another missense mutation. Protein structure modeling of mtAlaRS suggested that one of the mutations affected a unique tRNA recognition site in the editing domain, leading to incorrect tRNA aminoacylation, whereas the second mutation severely disturbed the catalytic function, preventing tRNA aminoacylation. We show here that mutations in AARS2 cause perinatal or infantile cardiomyopathy with near-total combined mitochondrial respiratory chain deficiency in the heart. Our results indicate that exome sequencing is a powerful tool for identifying mutations in single patients and allows recognition of the genetic background in single-gene disorders of variable clinical manifestation and tissue-specific disease. Furthermore, we show that mitochondrial disorders extend to prenatal life and are an important cause of early infantile cardiac failure. PMID:21549344

  15. Next-generation sequencing identifies a novel compound heterozygous mutation in MYO7A in a Chinese patient with Usher Syndrome 1B.

    PubMed

    Wei, Xiaoming; Sun, Yan; Xie, Jiansheng; Shi, Quan; Qu, Ning; Yang, Guanghui; Cai, Jun; Yang, Yi; Liang, Yu; Wang, Wei; Yi, Xin

    2012-11-20

    Targeted enrichment and next-generation sequencing (NGS) have been employed for detection of genetic diseases. The purpose of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection of hereditary hearing loss, and identify inherited mutations involved in human deafness accurately and economically. To make genetic diagnosis of hereditary hearing loss simple and timesaving, we designed a 0.60 MB array-based chip containing 69 nuclear genes and mitochondrial genome responsible for human deafness and conducted NGS toward ten patients with five known mutations and a Chinese family with hearing loss (never genetically investigated). Ten patients with five known mutations were sequenced using next-generation sequencing to validate the sensitivity of the method. We identified four known mutations in two nuclear deafness causing genes (GJB2 and SLC26A4), one in mitochondrial DNA. We then performed this method to analyze the variants in a Chinese family with hearing loss and identified compound heterozygosity for two novel mutations in gene MYO7A. The compound heterozygosity identified in gene MYO7A causes Usher Syndrome 1B with severe phenotypes. The results support that the combination of enrichment of targeted genes and next-generation sequencing is a valuable molecular diagnostic tool for hereditary deafness and suitable for clinical application. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety ‘Amrapali’ (Mangifera indica L.)

    PubMed Central

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called “king of fruits” due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties ‘Neelam’, ‘Dashehari’ and their hybrid ‘Amrapali’ using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango. PMID:27736892

  17. Microbial Degradation of Chlorogenic Acid by a Sphingomonas sp. Strain.

    PubMed

    Ma, Yuping; Wang, Xiaoyu; Nie, Xueling; Zhang, Zhan; Yang, Zongcan; Nie, Cong; Tang, Hongzhi

    2016-08-01

    In order to elucidate the metabolism of chlorogenic acid by environmental microbes, a strain of Sphingomonas sp. isolated from tobacco leaves was cultured under various conditions, and chlorogenic acid degradation and its metabolites were investigated. The strain converting chlorogenic acid was newly isolated and identified as a Sphingomonas sp. strain by 16S rRNA sequencing. The optimal conditions for growth and chlorogenic acid degradation were 37 °C and pH 7.0 with supplementation of 1.5 g/l (NH4)2SO4 as the nitrogen source and 2 g/l chlorogenic acid as the sole carbon source. The maximum chlorogenic acid tolerating capability for the strain was 5 g/l. The main metabolites were identified as caffeic acid, shikimic acid, and 3,4-dihydroxybenzoic acid based on gas chromatography-mass spectrometry analysis. The analysis reveals the biotransformation mechanism of chlorogenic acid in microbial cells isolated from the environment.

  18. Co-expression analysis identifies CRC and AP1 the regulator of Arabidopsis fatty acid biosynthesis.

    PubMed

    Han, Xinxin; Yin, Linlin; Xue, Hongwei

    2012-07-01

    Fatty acids (FAs) play crucial rules in signal transduction and plant development, however, the regulation of FA metabolism is still poorly understood. To study the relevant regulatory network, fifty-eight FA biosynthesis genes including de novo synthases, desaturases and elongases were selected as "guide genes" to construct the co-expression network. Calculation of the correlation between all Arabidopsis thaliana (L.) genes with each guide gene by Arabidopsis co-expression dating mining tools (ACT) identifies 797 candidate FA-correlated genes. Gene ontology (GO) analysis of these co-expressed genes showed they are tightly correlated to photosynthesis and carbohydrate metabolism, and function in many processes. Interestingly, 63 transcription factors (TFs) were identified as candidate FA biosynthesis regulators and 8 TF families are enriched. Two TF genes, CRC and AP1, both correlating with 8 FA guide genes, were further characterized. Analyses of the ap1 and crc mutant showed the altered total FA composition of mature seeds. The contents of palmitoleic acid, stearic acid, arachidic acid and eicosadienoic acid are decreased, whereas that of oleic acid is increased in ap1 and crc seeds, which is consistent with the qRT-PCR analysis revealing the suppressed expression of the corresponding guide genes. In addition, yeast one-hybrid analysis and electrophoretic mobility shift assay (EMSA) revealed that CRC can bind to the promoter regions of KCS7 and KCS15, indicating that CRC may directly regulate FA biosynthesis. © 2012 Institute of Botany, Chinese Academy of Sciences.

  19. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    PubMed

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.

  20. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.